What is the difference between Claude Managed Agents and Claude Code?

Claude Code is the developer CLI you drive interactively from your own machine. Claude Managed Agents is the API-and-Console surface for agents that run on Anthropic's infrastructure, on a schedule or on demand, without a human watching each step — with platform features like Outcomes grading, multiagent orchestration, dreaming, and webhooks layered on top.

What is Multiagent Orchestration in Claude?

Multiagent Orchestration lets a lead agent break a job into pieces and delegate each to a specialist with its own model, prompt, and tools, with specialists running in parallel on a shared filesystem and contributing back to the lead agent's context. The whole flow is traceable in the Claude Console. Anthropic cites Netflix's platform team building a log-analysis agent over hundreds of builds, and Spiral pairing orchestration with Outcomes. It is in public beta.

What is Dreaming in Claude Managed Agents?

Dreaming is a scheduled process that reviews an agent's past sessions and memory, extracts patterns, and curates memory so the agent improves between runs — surfacing recurring mistakes, convergent workflows, and shared team preferences, and either updating memory automatically or holding changes for review. It is in research preview; Anthropic cites a Harvey pilot where completion rates went up roughly six times in their tests, which is a sourced vendor anecdote rather than a published benchmark.

How should a team start building on Claude Managed Agents?

Start with one workflow and the one feature that matches its bottleneck: Outcomes if quality is the problem, orchestration if scope is, sandboxes if data can't leave. A practical cadence is a 2-week build sprint to wire that single workflow with a real rubric or a clean orchestration boundary, then a 2-week test-and-iterate cycle against live work before adding the next piece. Skip 60-to-90-day pilots; for this kind of build they only delay feedback that a four-week build-and-test loop already gives you.

Claude Anthropic Claude Managed Agents AI Agents Multiagent Orchestration Agent Evaluation MCP

How Claude Managed Agents Actually Work: Dreaming, Outcomes, Multiagent Orchestration, and Webhooks (2026)

Q: How do Outcomes work in Claude Managed Agents?

With Outcomes you write a rubric describing what success looks like, and a separate grader evaluates the agent's output against that rubric in its own context window, isolated from the agent's reasoning. When the work falls short, the grader pinpoints what to change and the agent revises and tries again until it clears the bar. Anthropic reports the loop improved task success by up to 10 points over a standard prompting loop in internal testing, with +8.4% on docx and +10.1% on pptx tasks. It is in public beta.

Q: Are Claude Managed Agents secure enough for regulated data?

Self-hosted sandboxes (public beta) and MCP tunnels (research preview) are designed for exactly that case. The agent loop stays on Anthropic's infrastructure while tool execution runs in your own environment or a managed provider like Cloudflare, Daytona, Modal, or Vercel, so files and repositories don't leave your perimeter. MCP tunnels let agents reach private MCP servers over a single outbound connection with no inbound firewall rules and end-to-end encryption. Whether that meets your bar still depends on your own review.

Q: Did Anthropic double Claude Code's rate limits?

Yes. At Code w/ Claude, Anthropic doubled Claude Code's five-hour rate limits for Pro, Max, Team, and seat-based Enterprise plans, removed the peak-hours reduction for Pro and Max, and raised API limits for Claude Opus. The free plan was not upgraded. Reporting put the Tier 1 API increase at roughly fifteen times maximum input tokens per minute and nine times output.

Marco Lobo

25 May 2026 · Updated 25 May 2026 · 9 min read

TL;DR

Claude Managed Agents is Anthropic's hosted layer for running agents that improve over time. At Code w/ Claude in San Francisco (May 2026) it gained four mechanics: Dreaming, Outcomes, Multiagent Orchestration, and Webhooks.

The one that changes how you build is Outcomes: a separate grader evaluates the agent's work in its own context window, isolated from the agent's reasoning, and sends it back to revise until a rubric is met. Anthropic reports up to a 10-point task-success lift over a plain prompting loop.

The features are the easy part. Writing a rubric that captures "good," drawing clean orchestration boundaries, and wiring webhooks, sandboxes, and tunnels into systems you already run is the work a launch leaves to you.

Anthropic spent its San Francisco developer conference making one argument: the distance between an idea and working software is narrowing, and the teams getting leverage are the ones designing for that rather than reacting to it. The most concrete evidence was a set of new capabilities for Claude Managed Agents — the platform layer that runs agents on Anthropic's infrastructure rather than on your laptop.

Four mechanics landed together. Each is useful on its own. Read together, they describe a single shift: agents that grade their own work, split it across specialists, remember what worked, and tell you when they are done.

What are Claude Managed Agents?

Claude Managed Agents is Anthropic's hosted platform for running self-improving agents, where Anthropic operates the agent loop — orchestration, context management, and error recovery — while you define the task, the tools, and the criteria for success. It is distinct from Claude Code, the developer CLI: Managed Agents is the API-and-Console surface for agents that run server-side, on a schedule or on demand, without a human watching every step.

The four features announced on 6 May 2026 break down like this:

Feature	What it does	Stage (May 2026)	Reach for it when
Dreaming	A scheduled job reviews past sessions and memory, extracts patterns, and curates what the agent remembers	Research preview	Many similar runs repeat the same mistakes or rediscover the same workflow
Outcomes	A separate grader scores output against your rubric in its own context window and loops the agent until it passes	Public beta	Quality is hard to hit in one pass but easy to describe in a rubric
Multiagent Orchestration	A lead agent splits work and delegates to specialists, each with its own model, prompt, and tools, in parallel on a shared filesystem	Public beta	The job is too big or too varied for one context window
Webhooks	Define an outcome, let the agent run, and get notified when it finishes	Public beta	Tasks run long enough that babysitting them is the real cost

How does Dreaming work, and when should you use it?

Dreaming is a scheduled process that reviews an agent's past sessions and memory stores, extracts patterns, and curates memory so the agent improves between runs. Anthropic describes it as surfacing what a single session cannot see on its own: recurring mistakes, workflows that agents converge on, and preferences shared across a team. It can update memory automatically or hold changes for review before they land.

Dreaming is in research preview. Anthropic cites the legal-AI company Harvey, whose completion rates "went up ~6x in their tests" during the pilot. Treat that the way you would any vendor pilot number reported through the vendor's own blog: a sourced anecdote from one team's tests, not a published benchmark with a baseline and a methodology. The durable part is the mechanism — periodic, structured memory curation instead of an ever-growing, ever-noisier memory file.
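To make "structured memory curation" concrete, here is a minimal sketch of the idea, not Anthropic's implementation. It assumes each past session can be reduced to a list of short lessons; `Session`, `Memory`, `dream`, and the `min_sessions` threshold are all hypothetical names chosen for illustration. A lesson is promoted only when it recurs across sessions, and it is either applied or held for review.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Session:
    """One past agent run, reduced to the lessons it recorded (hypothetical shape)."""
    session_id: str
    lessons: list[str]


@dataclass
class Memory:
    """Curated memory, plus proposed changes held back for human review."""
    entries: set[str] = field(default_factory=set)
    pending_review: set[str] = field(default_factory=set)


def normalise(lesson: str) -> str:
    # Collapse case and whitespace so the same lesson matches across sessions.
    return " ".join(lesson.lower().split())


def dream(sessions: list[Session], memory: Memory,
          min_sessions: int = 2, auto_apply: bool = False) -> Memory:
    """Promote lessons that recur in at least `min_sessions` sessions."""
    seen_in = Counter()
    for session in sessions:
        # Count each lesson once per session, so one noisy run cannot promote itself.
        for lesson in {normalise(item) for item in session.lessons}:
            seen_in[lesson] += 1

    recurring = {lesson for lesson, count in seen_in.items() if count >= min_sessions}
    new_lessons = recurring - memory.entries
    if auto_apply:
        memory.entries |= new_lessons          # update memory automatically
    else:
        memory.pending_review |= new_lessons   # hold changes for review
    return memory
```

One-off notes never reach memory in this sketch, which is the property that keeps a memory file from thickening into noise.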

Reach for Dreaming when you run many similar agent sessions and the bottleneck is drift: the agent keeps relearning the same lessons, or its memory has thickened into noise. Leave it off when sessions are one-offs with no pattern worth distilling.

How do Outcomes work, and why does a separate grader matter?

Outcomes lets you write a rubric describing what success looks like, then runs a separate grader that evaluates the agent's output against that rubric in its own context window — "so it isn't influenced by the agent's reasoning," in Anthropic's words. When the work falls short, the grader pinpoints what to change and the agent takes another pass, repeating until it clears the bar.

The detail that matters is the separation. An agent grading its own work is the same model that just produced it, primed by its own reasoning to believe it is finished. A grader in a clean context window carries no such prior. Splitting generation from evaluation is one of the more reliable patterns for making agents trustworthy, and Outcomes turns it into a platform primitive instead of something every serious team rebuilds by hand.
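A minimal sketch of that separation, under stated assumptions: `generate` and `grade` are hypothetical stand-ins for two model calls, not Anthropic's API, and the `max_passes` cap is an addition of this sketch. What it shows is what crosses the boundary: the grader receives only the rubric and the draft, and the generator receives only the grader's feedback.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Verdict:
    passed: bool
    feedback: str  # what the grader says should change; empty when passed


@dataclass(frozen=True)
class Result:
    draft: str
    passes: int
    passed: bool


def run_until_passing(
    task: str,
    rubric: str,
    generate: Callable[[str, str], str],   # hypothetical: (task, feedback) -> draft
    grade: Callable[[str, str], Verdict],  # hypothetical: (rubric, draft) -> verdict
    max_passes: int = 4,                   # assumed safety cap, not from the source
) -> Result:
    feedback = ""
    draft = ""
    for attempt in range(1, max_passes + 1):
        draft = generate(task, feedback)
        # Clean context: the grader sees the rubric and the draft, nothing else.
        verdict = grade(rubric, draft)
        if verdict.passed:
            return Result(draft, attempt, True)
        feedback = verdict.feedback
    # The cap was reached without clearing the bar; return the last draft, marked failed.
    return Result(draft, max_passes, False)
```

Keeping the generator's reasoning out of the `grade` signature is the design choice: the grader cannot be talked into a pass by an argument it never sees.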

Anthropic reports the lift from internal testing: task success improved by up to 10 points over a standard prompting loop, with the largest gains on the hardest problems, and specific results of +8.4% on docx and +10.1% on pptx tasks. Those are Anthropic's own numbers, so weight them as directional — but the direction matches what eval-driven teams already see when they stop trusting a single pass.

Outcomes is in public beta. Reach for it when "good" is hard to produce in one shot but easy to describe. Skip it when you cannot write the rubric, because a vague rubric produces a grader that waves everything through.
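The gap between a vague rubric and a usable one is easier to see side by side. This sketch assumes a rubric can be reduced to named, checkable criteria. In practice the grader is a model reading prose, so the Python predicates below are deterministic stand-ins, and every name here is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    name: str
    check: Callable[[str], bool]  # stand-in for a grader's judgement on one criterion


def failed_criteria(draft: str, rubric: list[Criterion]) -> list[str]:
    """Return the names of the criteria the draft does not meet."""
    return [criterion.name for criterion in rubric if not criterion.check(draft)]


# A vague rubric: nothing in it can fail, so every draft passes.
VAGUE = [Criterion("is good", lambda draft: True)]

# A specific rubric: each line names something a grader can point at.
SPECIFIC = [
    Criterion("starts with a Summary heading", lambda d: d.startswith("Summary")),
    Criterion("is under 200 words", lambda d: len(d.split()) < 200),
    Criterion("contains no TODO markers", lambda d: "TODO" not in d),
]
```

Run against the draft `"TODO: write the report"`, the vague rubric reports no failures, while the specific one names two criteria the agent can act on.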

How does Multiagent Orchestration work?

Multiagent Orchestration lets a lead agent break a job into pieces and delegate each one to a specialist with its own model, prompt, and tools, with the specialists working in parallel on a shared filesystem and contributing back to the lead agent's context. Persistent events let the lead agent check in mid-workflow, and the whole flow is traceable in the Claude Console — which agent did what, and why.

Anthropic's worked example is an investigation: a lead agent runs the case while subagents fan out across deploy history, error logs, metrics, and support tickets at the same time. It names Netflix's platform team, which built a log-analysis agent processing hundreds of builds, and Spiral, which pairs orchestration with Outcomes to enforce writing quality.
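The fan-out in that investigation example can be sketched with a thread pool. This illustrates the delegation pattern only, not the Managed Agents API: `Specialist`, `investigate`, and the `run` callable are hypothetical, and the `model` field holds a placeholder label, not a real model ID.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Specialist:
    name: str
    model: str                 # placeholder label, not a real model ID
    prompt: str
    run: Callable[[str], str]  # hypothetical stand-in for the specialist's agent loop


def investigate(incident: str, specialists: list[Specialist]) -> dict[str, str]:
    """Send the incident to every specialist in parallel and gather findings by name."""
    # max(1, ...) keeps the pool valid even when the specialist list is empty.
    with ThreadPoolExecutor(max_workers=max(1, len(specialists))) as pool:
        futures = {
            s.name: pool.submit(s.run, f"{s.prompt}\n\nIncident: {incident}")
            for s in specialists
        }
        # The lead agent's context receives one finding per specialist.
        return {name: future.result() for name, future in futures.items()}
```

Each specialist gets its own brief and returns a single finding, so the lead's context grows by one entry per specialist, not by every intermediate step.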

It is in public beta. Reach for it when the work is genuinely too large or too varied for one context window — parallel research, multi-source investigation, jobs where different sub-tasks want different models. The trap is using it when one well-scoped agent would do: every boundary you draw between agents is a place where context can drop, so orchestration pays off only when the parallelism is real.

What do Webhooks add?

Webhooks let you define an outcome, start the agent, and receive a notification when the run finishes — so long-running work does not need a human watching it. Paired with Outcomes, it closes a loop: define the target result, let the agent grade-and-revise its way there, and get pinged when it clears the bar. It is in public beta. The pattern it unlocks is fire-and-verify — agents that own a task end to end and report back, rather than ones you sit beside.
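On the receiving side, fire-and-verify comes down to a small handler. The source does not describe the notification payload, so the `id`, `run_id`, and `status` fields below are assumptions made for this sketch, as is the duplicate check, which is a general webhook precaution and not a documented requirement.

```python
import json


def handle_notification(raw_body: bytes, seen_ids: set[str]) -> str:
    """Decide what to do with one run-finished notification (hypothetical payload)."""
    event = json.loads(raw_body)
    event_id = event["id"]      # assumed field: unique per notification
    run_id = event["run_id"]    # assumed field: which agent run finished

    # Ignore a delivery that has already been processed.
    if event_id in seen_ids:
        return "duplicate"
    seen_ids.add(event_id)

    if event.get("status") == "completed":   # assumed status value
        return f"fetch results for run {run_id}"
    return f"flag run {run_id} for review"
```

Any web framework can call a function like this from its request handler; keeping the decision logic free of framework code makes it easy to test.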

Where do self-hosted sandboxes and MCP tunnels fit?

Two weeks later, at the London event on 19 May 2026, Anthropic shipped the features that make Managed Agents viable for teams that cannot send data out: self-hosted sandboxes and MCP tunnels. They share one design idea — the managed split. The agent loop that handles orchestration, context management, and error recovery stays on Anthropic's infrastructure, while the parts that touch your data move into your environment.

With self-hosted sandboxes (public beta), tool execution runs on infrastructure you control — your own, or a managed provider like Cloudflare, Daytona, Modal, or Vercel — so files and repositories never leave your perimeter, and you set the compute and the runtime image. With MCP tunnels (research preview), agents reach MCP servers inside your private network without exposing them to the internet: a lightweight gateway you deploy makes a single outbound connection, with no inbound firewall rules, no public endpoints, and traffic encrypted end to end. Tunnels work in Managed Agents and the Messages API, and organisation admins manage them from the Console.

Capability	What runs where	Stage	The point
Self-hosted sandboxes	Tool execution in your environment or a managed provider; the agent loop stays with Anthropic	Public beta	Files and repositories don't leave your perimeter
MCP tunnels	Private MCP servers reached over one outbound connection; no inbound rules, encrypted end to end	Research preview	Internal databases and APIs become agent tools without public exposure

For a regulated team, this is the difference between an interesting demo and something you can actually deploy.

What changed with the doubled Claude Code rate limits?

Anthropic doubled Claude Code's five-hour rate limits for Pro, Max, Team, and seat-based Enterprise plans, removed the peak-hours reduction for Pro and Max, and raised API limits for Claude Opus; the free plan was unchanged. Reporting put the Tier 1 API increase at roughly fifteen times the maximum input tokens per minute and nine times the output. For agent builders, the headroom is the story: orchestrated, graded, long-running agents burn far more tokens than a chat session, and the previous limits made serious server-side agents hit a wall.

When should you reach for each feature?

Match the feature to the bottleneck, not to the announcement. Outcomes when quality is the problem and you can describe what good looks like. Multiagent Orchestration when scope is the problem and the work truly parallelises. Dreaming when repetition is the problem and the agent keeps relearning. Webhooks when run length is the problem and nobody should be watching the run. Sandboxes and tunnels when trust boundaries are the problem and data cannot leave.

Most production agents end up using two or three together — an orchestrated job whose specialists are graded by Outcomes, running in your own sandbox, pinging a webhook when done. Wiring those together against systems you already run, and writing rubrics a grader can actually act on, is the work a launch leaves to you.
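The matching rule above fits in a lookup table. The bottleneck labels are this sketch's own shorthand for the cases described in this section; the feature names come from the article.

```python
# Bottleneck labels are shorthand invented for this sketch.
FEATURE_FOR_BOTTLENECK = {
    "quality": "Outcomes",
    "scope": "Multiagent Orchestration",
    "repetition": "Dreaming",
    "long_runs": "Webhooks",
    "trust_boundaries": "Self-hosted sandboxes and MCP tunnels",
}


def pick_features(bottlenecks: list[str]) -> list[str]:
    """Map each named bottleneck to its feature, keeping order and skipping repeats."""
    picked: list[str] = []
    for bottleneck in bottlenecks:
        feature = FEATURE_FOR_BOTTLENECK[bottleneck]  # raises KeyError on an unknown label
        if feature not in picked:
            picked.append(feature)
    return picked
```

For the combined job described above, `pick_features(["scope", "quality", "trust_boundaries", "long_runs"])` returns orchestration, Outcomes, sandboxes and tunnels, and webhooks, in that order.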

Related reading

If you are turning these capabilities into something your team actually runs, these AI Heroes pieces are the natural companion set:

Claude skills: why your best prompts keep failing - the architecture layer that turns judgment into reusable agent execution.
AI agent workflow automation - how recurring work becomes an agent workflow instead of an ad hoc chat.
The long-running agent harness on the Claude Agent SDK - the evaluator-gated loop behind agents you can leave running.
AI institutional knowledge - why durable memory matters once more than one person relies on an agent.
Inside Anthropic's finance team - what a managed, human-reviewed agent workflow looks like in practice.
Where to start with Claude Code in a large repo - the decision layer that runs before the build.



Marco Lobo

Founder, AI Heroes

I build AI companies and the systems inside them. At AI Heroes, we give businesses the functional capacity to grow without the headcount growth normally demands — sales that follows up, marketing that runs, content that ships, ops that handles itself. We audit where you're leaving growth on the table, build the team that captures it, and hand it over completely.

I've built at scale before. Leading product and GTM at SlideSpeak AI (1M+ monthly users, profitable, bootstrapped). CPO at Disperse — the AI construction platform that went from 3 to 200+ people on $35M raised. I also co-founded LOBOMAR, a luxury fashion label featured in Elle, Cosmopolitan, and the LA Times, with shows at the London Design Museum, Wereldmuseum, and Amsterdam Fashion Week.


