Code with Claude SF: Managed Agents and the Build-vs-Buy Call

Anthropic ran Code with Claude San Francisco on May 6, 2026. Five things shipped: Managed Agents (the harness becomes the product), Dreaming (between-session memory consolidation), Outcomes (self-grading rubrics), multi-agent orchestration, and Claude Finance with 10 pre-built agents. The honest read is that Anthropic has decided the harness, not the model, is the product. For Indian teams that have spent the last 18 months building their own agent harness on top of the API, this is a build-vs-buy fork. Here is the matrix we are using with clients this week to decide.

## TL;DR: When does Anthropic Managed Agents make sense?

Buy Managed Agents if you're a team under 8 engineers AND your use case is well-bounded (customer support, finance ops, document workflows) AND you'll run agents long enough that runtime ($0.08/session-hour) is cheaper than DIY infrastructure.

Build your own harness if you have ≥10 engineers, multi-cloud requirements, sub-second latency targets, or you need vendor-portable abstractions (run the same harness on Anthropic, OpenAI, and Gemini).

- $0.08: per session-hour runtime fee
- 10: pre-built Claude Finance agents
- 5h × 2: Pro/Max rate limit doubled
- 220K+: SpaceX Colossus 1 GPUs partnership

## Why this matters now (May 2026)

Three product moves landed in the same week. First, Anthropic Managed Agents bills on three axes: tokens (same as before), session runtime ($0.08/hour, billed to the millisecond), and tool-triggered costs (web search at $10/1,000 queries). That is a new billing dimension for any team that was on pure-token pricing. Second, Dreaming (hippocampal memory consolidation between sessions) makes long-running agents materially more capable, but only if you stay inside Anthropic's harness. Third, OpenAI countered the same week with their open-source Agents SDK supporting seven sandbox providers, with no first-party runtime fee. The market just split into "buy the integrated stack" vs "build the portable stack."

## What Managed Agents actually is

Managed Agents is a beta API where you define an agent (tools, prompts, guardrails), and Anthropic runs the execution environment: long-running sessions, sandboxed code execution, scoped permissions, end-to-end tracing, and MCP-based tool connections. You write the agent spec; they run the loop.

- 🏗️ Managed runtime: Anthropic runs the agent loop, not you. No EC2, no Lambda, no container orchestration. Trade-off: you don't control the runtime layer.
- 💤 Dreaming (memory consolidation): Between sessions, the agent reviews past work, pulls patterns, writes new memory entries. Like a brain replaying the day during sleep. Materially better on repeating jobs.
- 🎯 Outcomes (self-grading rubric): A separate evaluator agent scores the worker agent's output against a written rubric and tells it what to fix. Closed-loop quality improvement.
- 🔀 Multi-agent orchestration: A lead agent fans work out to specialist subagents running in parallel. Useful for research, multi-document analysis, code refactors across many files.

## The build-vs-buy decision matrix

We use this with clients after a 60-minute discovery call. Pick the row that matches your team size, then check whether the use case is bounded.

| Team size | Bounded use case | Multi-vendor needed | Verdict |
|---|---|---|---|
| Solo founder / 1-3 eng | Yes | No | Buy Managed Agents |
| Solo founder / 1-3 eng | No (custom) | No | Buy + customize via MCP |
| 4-8 engineers | Yes | No | Buy Managed Agents |
| 4-8 engineers | Yes | Yes (also GPT) | Build portable harness |
| 9-25 engineers | Mixed | Likely | Build (with MCP for portability) |
| 25+ engineers | No | Yes | Build (multi-cloud, multi-model) |
| Any size (finance/treasury) | Yes | No | Buy Claude Finance (pre-built) |
| Any size (<5 sec latency SLA) | Any | Any | Build (managed adds latency) |
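For teams that want to run the matrix over several candidate workloads, here is a minimal Python sketch of the same lookup. The `Team` fields and the `verdict` function are hypothetical names, not part of any Anthropic API. Three things are assumptions on our part: the two "any size" rows take precedence over the team-size rows, a team of exactly 25 falls in the 9-25 band, and combinations the matrix does not list return an explicit "not covered" result instead of a guess.

```python
from dataclasses import dataclass

NOT_COVERED = "Not covered by the matrix (needs a discovery call)"


@dataclass
class Team:
    engineers: int                   # engineers who would touch the agent layer
    bounded_use_case: bool
    multi_vendor: bool
    finance_workload: bool = False   # the "finance/treasury" row
    tight_latency_sla: bool = False  # the "<5 sec latency SLA" row


def verdict(team: Team) -> str:
    # Assumption: the "any size" rows override the team-size rows.
    if team.tight_latency_sla:
        return "Build (managed adds latency)"
    if team.finance_workload and team.bounded_use_case and not team.multi_vendor:
        return "Buy Claude Finance (pre-built)"

    if team.engineers > 25:
        return "Build (multi-cloud, multi-model)"
    if team.engineers >= 9:
        return "Build (with MCP for portability)"
    if team.engineers >= 4:
        if not team.bounded_use_case:
            return NOT_COVERED  # the matrix has no 4-8 / unbounded row
        return "Build portable harness" if team.multi_vendor else "Buy Managed Agents"

    # Solo founder / 1-3 engineers
    if team.multi_vendor:
        return NOT_COVERED  # the matrix has no 1-3 / multi-vendor row
    return "Buy Managed Agents" if team.bounded_use_case else "Buy + customize via MCP"


if __name__ == "__main__":
    print(verdict(Team(engineers=4, bounded_use_case=True, multi_vendor=False)))
```

Returning an explicit "not covered" value keeps the sketch honest about the gaps in the table; those are the cases where the discovery call does the real work.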

## The cost math: when does $0.08/session-hour actually matter

A "session" in Managed Agents is the time a workflow is alive, from invocation to completion. If your agent runs for 12 minutes per session, that is $0.016 (~₹1.35) per session in runtime, on top of token costs. Multiply by your daily session count.

For our Hyderabad SaaS client running ~3,000 sessions/day averaging 8 minutes each: 400 session-hours/day × $0.08 = $32/day = ~₹2,720/day = ~₹81,600/month in runtime fees alone, on top of ~₹4.2L/month in token costs. For comparison, their self-built harness on AWS ECS with the same workload costs ~₹14,000/month in compute + storage.

Managed Agents is ~₹67,000/month more expensive for this workload, but it removes ~30 hours/month of engineering time on container ops, retries, and monitoring. At their engineer cost (~₹3,500/hour fully loaded), that is ₹1.05L/month of recovered engineering time.

Net: Managed Agents costs ₹67K more in infrastructure but saves ₹1.05L in engineering time. The buy decision wins by ~₹38K/month for this client.

Different clients flip the other way at different scales. The crossover point in our experience sits between 9 and 12 engineers. Below that, buy wins. Above that, build wins because per-engineer ops savings stop compounding.

## The 3 hidden gotchas before you buy

Things we have learned across 5 client deployments on Managed Agents beta.

Gotcha 1: Session billing is granular but unpredictable. A "session" can stretch from 90 seconds to 6 hours depending on what the agent does. Budget projections need P95 session length, not average. We've seen 30% overruns on initial estimates because finance teams plan on averages.

Gotcha 2: Tool costs stack fast. Web search at $10/1,000 queries adds up when an agent does 8–15 searches per session. A 1,000-session/day workload with 10 searches each is $100/day in web search alone, or ~₹2.5L/month, in addition to token + runtime.

Gotcha 3: Vendor lock-in is real. The Dreaming memory consolidation, the Outcomes rubric loop, the multi-agent orchestration: these are Anthropic-specific. You cannot easily port them to GPT-5.5 or Gemini. If you decide in 18 months to multi-vendor, you'll rewrite the agent layer. Build your portable abstractions early or accept the lock-in honestly.
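The runtime and web-search arithmetic from the cost section and Gotcha 2 can be reproduced with a short Python sketch, which makes it easy to plug in your own volumes. The function names are ours. The exchange rate (₹85 per USD) and the 30-day month are assumptions inferred from the article's own figures ($32/day ≈ ₹2,720/day ≈ ₹81,600/month), not quoted rates.

```python
RUNTIME_USD_PER_SESSION_HOUR = 0.08  # from the article
WEB_SEARCH_USD_PER_1000 = 10.0       # from the article
INR_PER_USD = 85.0                   # assumption: implied by $32/day ~ Rs 2,720/day
DAYS_PER_MONTH = 30                  # assumption: implied by Rs 2,720/day ~ Rs 81,600/month


def monthly_runtime_inr(sessions_per_day: float, avg_minutes: float) -> float:
    """Monthly session-runtime fee in INR (excludes tokens and tools)."""
    session_hours_per_day = sessions_per_day * avg_minutes / 60
    usd_per_day = session_hours_per_day * RUNTIME_USD_PER_SESSION_HOUR
    return usd_per_day * INR_PER_USD * DAYS_PER_MONTH


def monthly_web_search_inr(sessions_per_day: float, searches_per_session: float) -> float:
    """Monthly web-search tool cost in INR."""
    queries_per_day = sessions_per_day * searches_per_session
    usd_per_day = queries_per_day / 1000 * WEB_SEARCH_USD_PER_1000
    return usd_per_day * INR_PER_USD * DAYS_PER_MONTH


def buy_advantage_inr(managed_runtime: float, diy_infra: float,
                      eng_hours_saved: float, eng_rate: float) -> float:
    """Positive means buying wins: engineering time recovered minus extra infra spend."""
    return eng_hours_saved * eng_rate - (managed_runtime - diy_infra)


if __name__ == "__main__":
    runtime = monthly_runtime_inr(sessions_per_day=3000, avg_minutes=8)
    advantage = buy_advantage_inr(runtime, diy_infra=14_000,
                                  eng_hours_saved=30, eng_rate=3_500)
    search = monthly_web_search_inr(sessions_per_day=1000, searches_per_session=10)
    print(f"Runtime fee:   Rs {runtime:,.0f}/month")
    print(f"Buy advantage: Rs {advantage:,.0f}/month")
    print(f"Web search:    Rs {search:,.0f}/month")
```

With the Hyderabad inputs this gives a runtime fee of about ₹81,600/month and a buy advantage of about ₹37,400/month, which the article rounds to ~₹38K. The web-search example comes to about ₹2.55L/month, in line with the ~₹2.5L quoted in Gotcha 2.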

The OpenAI counter: OpenAI's Agents SDK supports seven sandbox providers, has no first-party runtime fee, and remains open source. Landing in the same week as Code with Claude SF, it is the explicit "build" alternative. For multi-vendor strategies, OpenAI's SDK plus your own runtime is the path. Per The New Stack's analysis: "Anthropic, OpenAI, Google and Microsoft agree the harness is the product. They disagree on the price."

## When to buy Claude Finance specifically

Claude Finance ships with 10 pre-built agents: accounts payable, accounts receivable, financial close, treasury, expense review, and others. For an Indian SMB or mid-market company running a small finance team, this is genuinely valuable. The agents come with audit logging, dual-control workflows, and compliance hooks that you would otherwise build yourself.

When does it make sense? If you process ≥1,000 invoices/month, run a finance team of 3–8 people, and use Tally or Zoho Books, the pre-built agents save weeks of integration work. The cost runs ~₹85,000–₹2.4L/month per agent depending on volume, but replaces 0.5–1 full-time analyst per agent.

When does it NOT make sense? If your accounting stack is unusual (Marg, Busy, or custom ERPs common in Indian manufacturing), the pre-built integrations don't help. If your volume is <500 invoices/month, the FTE math doesn't work.

## The 8-step path to a buy/build decision

This is the workflow we run with clients in week one of an engagement.

1. Count engineers who would touch the agent layer

   Not total team size, but engineers who would actually build/maintain/operate the agent harness. Most teams overcount here.

2. Estimate sessions/day and avg session duration

   Pull from logs if you have an existing system. For greenfield, estimate from user volume × interactions/user/day. Use P95 duration for cost math, not mean.

3. Project tool-use costs separately

   Web search, code execution, file system, custom tools. Each has its own billing. Project monthly cost per tool type.

4. Check multi-vendor requirement honestly

   Are you actually multi-vendor today, or do you say you are? If GPT-5.5 is <15% of total spend, you're not really multi-vendor; single-vendor is OK.

5. Build the 6-month cost projection both ways

   Managed Agents: tokens + runtime + tools. DIY: infra + engineer time amortized. Don't forget reliability cost: count P95 SLO breach impact.

6. Run a 2-week pilot on whichever path you favor

   Pick one workflow. Build it both ways if you have time. Most teams pilot Managed Agents because it's faster to ship. That bias is real.

7. Measure on quality, latency, cost, engineer time

   All four. Most decisions go wrong because someone optimized for just one (usually cost).

8. Commit to the choice for at least 6 months

   Switching is expensive. If you choose Managed Agents, commit. If you build, commit. Decision-flipping kills more agent projects than wrong-initial-choice.
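Step 2's advice to use P95 duration can be sketched in Python against a sample of session durations pulled from logs. The sample below is made up for illustration, and the helper names are hypothetical. The sketch uses the standard library's `statistics.quantiles` (Python 3.8+) and the $0.08/session-hour rate from this article. It only shows how far apart the mean-based and P95-based runtime projections can sit when session lengths are long-tailed.

```python
import statistics

RUNTIME_USD_PER_SESSION_HOUR = 0.08  # from the article


def duration_stats(minutes: list[float]) -> tuple[float, float]:
    """Return (mean, P95) of session durations in minutes."""
    mean = statistics.fmean(minutes)
    # n=20 yields 19 cut points at 5% steps; the last one is the 95th percentile.
    p95 = statistics.quantiles(minutes, n=20)[-1]
    return mean, p95


def monthly_runtime_usd(sessions_per_day: float, minutes_per_session: float,
                        days: int = 30) -> float:
    """Runtime fee only; tokens and tool costs are projected separately (step 3)."""
    session_hours = sessions_per_day * minutes_per_session / 60
    return session_hours * RUNTIME_USD_PER_SESSION_HOUR * days


if __name__ == "__main__":
    # Hypothetical, long-tailed sample: most sessions are short, a few run for hours.
    sample = [2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 10, 12, 15, 20, 45, 180]
    mean_min, p95_min = duration_stats(sample)
    for label, minutes in (("mean", mean_min), ("P95", p95_min)):
        cost = monthly_runtime_usd(sessions_per_day=3000, minutes_per_session=minutes)
        print(f"{label:>4}: {minutes:6.1f} min/session -> ${cost:,.0f}/month runtime")
```

Treat the P95 figure as a conservative ceiling for the budget conversation; the gap between the two lines is a rough guide to how much cushion the first month needs.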

## When NOT to use Managed Agents (3 honest cases)

Sub-second latency budgets. Managed Agents adds ~200–800ms of orchestration overhead vs a tightly tuned self-hosted harness. For voice agents or real-time UX, this is a deal-breaker. Build.

Heavy custom tooling outside MCP. If you depend on dozens of internal tools that don't have MCP servers and never will, you'll spend more time writing MCP adapters than you'd save on managed runtime. Build.

Data sovereignty / on-premise requirement. Some BFSI and government clients in India require all execution inside their own VPC. Managed Agents executes in Anthropic's cloud. Hard no for those workloads. Build.

## Real example: a Pune CFO's call we just took

A 110-person Pune SaaS company processes ~1,800 customer-success conversations a month. They wanted an agent to draft follow-up emails, log activities in their CRM, and flag at-risk accounts. 4-engineer team. No multi-vendor today.

Decision: buy Managed Agents. Reasons: (a) team too small to operate a harness, (b) use case bounded (three clear workflows), (c) Anthropic's Dreaming feature was genuinely useful for "remember the customer's history" use case, (d) cost projection ₹2.4L/month vs ₹3.8L/month all-in for DIY (engineer time dominated).

3-week pilot validated the call. Now in production at ~₹2.2L/month, ~92% draft quality, ~30 hrs/month of CSM time freed. Worth it. Different client at 25-engineer size with multi-vendor strategy = different decision.

## The checklist before signing the Managed Agents contract

- You have under 10 engineers AND a bounded use case
- You don't need sub-second latency
- You're not committed to multi-vendor (single-vendor is the honest answer)
- Your data residency requirements allow Anthropic's cloud execution
- You've projected costs including session runtime AND tool-use AND tokens
- You've allocated 15–25% budget cushion for first-month overruns
- You have a fallback plan if the beta has reliability issues
- Your finance team understands "session-hour" billing (not per-API-call)

## FAQ

### What is the actual price of Managed Agents in May 2026?

Standard model token rates (Opus 4.7: $5/M input, $25/M output), plus $0.08/session-hour runtime, plus tool costs (web search $10/1,000 queries). Custom tools vary. Budget 15-30% above raw token cost for managed overhead.

### Can I use Managed Agents with GPT-5.5?

No. Managed Agents is Anthropic-specific. For multi-vendor, use OpenAI's Agents SDK with your own runtime (or use both side-by-side for different workloads).

### What's Dreaming actually doing?

A scheduled process between agent sessions that reviews past sessions, pulls patterns, writes new memory entries. Materially improves repeating-job quality. Only useful if you run the same agent repeatedly on similar tasks.

### Is the Outcomes self-grading rubric worth it?

For high-stakes workflows (finance, compliance, customer-facing), yes: the additional eval cost (~30% more tokens per task) catches quality regressions. For exploratory workflows, no: it adds latency without proportional value.

### Does Managed Agents support MCP?

Yes. MCP is the connection layer for tools. You define your tools as MCP servers; Managed Agents calls them. This is the only portable piece: your MCP servers work the same way if you later migrate to a DIY harness.

### What's the SLA?

Anthropic publishes 99.9% for the API. Managed Agents is beta as of May 6, 2026; no formal SLA yet. Build retry logic, expect occasional regressions.

### How does this compare to LangChain or CrewAI?

LangChain/CrewAI are open-source frameworks you self-host. Managed Agents is a fully hosted product. If you're already on LangChain and it works, switching to Managed Agents trades portability for managed-ops convenience. Most LangChain teams should evaluate but not rush to switch.
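The pricing components quoted in the first FAQ answer can be combined into a per-session estimate. This is a rough Python sketch using only the rates stated in this article; the function name and the example token counts are hypothetical, and custom-tool pricing is left out because the article gives no figure for it.

```python
INPUT_USD_PER_M_TOKENS = 5.0          # Opus 4.7 input rate quoted in the article
OUTPUT_USD_PER_M_TOKENS = 25.0        # Opus 4.7 output rate quoted in the article
RUNTIME_USD_PER_SESSION_HOUR = 0.08   # from the article
WEB_SEARCH_USD_PER_1000 = 10.0        # from the article


def session_cost_usd(input_tokens: int, output_tokens: int,
                     minutes: float, web_searches: int = 0) -> dict[str, float]:
    """Split one session's cost across the three billing axes."""
    tokens = (input_tokens / 1_000_000 * INPUT_USD_PER_M_TOKENS
              + output_tokens / 1_000_000 * OUTPUT_USD_PER_M_TOKENS)
    runtime = minutes / 60 * RUNTIME_USD_PER_SESSION_HOUR
    tools = web_searches / 1000 * WEB_SEARCH_USD_PER_1000
    return {
        "tokens": tokens,
        "runtime": runtime,
        "tools": tools,
        "total": tokens + runtime + tools,
        # Managed overhead expressed as a share of raw token cost.
        "overhead_ratio": (runtime + tools) / tokens if tokens else float("inf"),
    }


if __name__ == "__main__":
    # Hypothetical session: 200K input tokens, 20K output tokens, 12 minutes, 10 searches.
    cost = session_cost_usd(200_000, 20_000, minutes=12, web_searches=10)
    for axis, value in cost.items():
        print(f"{axis:>14}: {value:.3f}")
```

Comparing `overhead_ratio` across your real workloads is one way to check whether the 15-30% budgeting rule of thumb holds for you, since search-heavy or long-idle sessions will shift it.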

Want Help Picking the Right Agent Platform?

We run a 60-minute build-vs-buy workshop with your team — your workloads, your cost projections, your multi-vendor situation. Output: a written recommendation (buy / build / hybrid) with 6-month cost model in INR. Typical engagement: ₹85,000 fixed for the workshop + report. Suitable if you have an active agent project, are deciding between Anthropic Managed Agents, OpenAI Agents SDK, and self-built harness, and want a vendor-neutral read.


Tags: Anthropic, Managed Agents, AI Agents, Code with Claude, Build vs Buy, Agent Harness, MCP


Hrishikesh Baidya

CTO at Softechinfra specializing in Python, system architecture, and building secure, scalable software solutions.
