Run Claude agents for hours — one agent, or a whole fleet — without any of them silently dying.
A two-pulse control system for Claude Code with an objective evaluator agent, recovery steering, computer use, Opus-orchestrates-Codex delegation, fleet-mode parallelism, and Discord as the operator UX. Built on Anthropic's published research into long-running agent harnesses.
Enter your email below to get the GitHub repository link plus a one-page quickstart cheatsheet.
TL;DR
What this is, in plain language.
AI agents that run for hours have a quiet failure mode: the loop only knows to continue when a turn ends. If a turn never ends — because something hung, a subagent froze, or the operator walked away — the loop has no way to notice the silence. You think the agent is still working. It isn't.
This harness adds a second clock outside the agent. Every 15 minutes, it reads a heartbeat the agent is supposed to be writing. If the heartbeat goes stale, it posts a stall alert and writes recovery guidance the agent will read on its next turn.
Works in two modes. Drive a single Claude agent on a single long goal. Or have one Claude Opus orchestrate a fleet of Claude and Codex subagents working in parallel on isolated worktrees — and watch every one of them with the same heartbeat, the same evaluator, and the same kill switch.
The whole thing is free, open source, Apache-2.0, and lives in one GitHub repo. It runs on top of Claude Code. It is built on Anthropic's own published research into long-running agent harnesses.
The problem
Why long-running agents fail silently.
Claude Code, like most agent loops, is event-driven. It only checks whether the work is done at turn boundaries — moments when the model finishes speaking. That works fine when turns happen every minute or two.
On a four-hour goal, the assumption breaks. If one turn never ends, the loop is dead, but from the outside it looks identical to a loop that is still thinking. You only find out the next morning that the agent stopped at 11pm and the eight hours you planned to leverage are gone.
The fix is structural: you need a watchdog that lives outside the agent and reads a heartbeat the agent writes from inside. The harness gives you that, with sensible defaults already wired in.
How it works
How does the long-running agent harness actually work?
Two clocks. The inner clock runs at the speed of the agent. The outer clock runs at the speed of the wall.
| Pulse | When it runs | What it checks | How it reacts |
|---|---|---|---|
| Inner pulse | At every turn boundary inside Claude Code. | Reads test-results.json. Checks if every criterion is true. | Blocks the turn and continues the loop if any criterion is false. Writes a fresh heartbeat timestamp. |
| Outer pulse | Every 15 minutes on a real clock, outside the agent. | Reads the heartbeat timestamp. | If the heartbeat is older than 20 minutes, the agent is stalled. Post a stall alert. Append recovery guidance to STEER.md so the agent reads it on the next turn. |
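The two pulses in the table can be sketched in a few lines. This is a minimal illustration, not the harness's actual code: the heartbeat file format (a Unix timestamp in a text file), the function names, and the wording of the recovery note are all assumptions. Only the 20-minute threshold and the STEER.md append come from the table above. Posting the stall alert is left out.

```python
import time
from pathlib import Path
from typing import Optional

STALE_AFTER_SECONDS = 20 * 60  # staleness threshold from the table above


def write_heartbeat(path: Path, now: Optional[float] = None) -> None:
    """Inner pulse side: record the current Unix time (assumed file format)."""
    stamp = time.time() if now is None else now
    path.write_text(f"{stamp:.0f}\n")


def heartbeat_age(path: Path, now: Optional[float] = None) -> float:
    """Seconds since the last heartbeat; infinite if it is missing or unreadable."""
    current = time.time() if now is None else now
    try:
        return current - float(path.read_text().strip())
    except (FileNotFoundError, ValueError):
        return float("inf")


def outer_pulse(heartbeat: Path, steer: Path, now: Optional[float] = None) -> bool:
    """Outer pulse side: return True and append recovery guidance on a stall."""
    age = heartbeat_age(heartbeat, now)
    if age <= STALE_AFTER_SECONDS:
        return False
    # Hypothetical guidance text; the agent is expected to read it next turn.
    with steer.open("a") as fh:
        fh.write(
            f"\nStall detected: no heartbeat for {age / 60:.0f} minutes. "
            "Re-read the goal and resume from the last verified step.\n"
        )
    return True
```

In this sketch a scheduler outside the agent (cron, for example) would call `outer_pulse` every 15 minutes. Because the check runs in a separate process, it still fires when the agent's own loop has hung.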
There is one more piece that makes the harness honest: every success criterion starts as false, and the only way it can flip to true is fresh evidence read through Claude Code's Read tool and validated by a hook. The agent cannot claim victory on vibes. This is the Default-FAIL contract.
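To make the Default-FAIL idea concrete, here is a small sketch of a results reader that treats everything as false unless it is explicitly true. The layout of test-results.json (a flat JSON object mapping criterion names to booleans) and the function names are assumptions for illustration. The evidence check performed through the Read tool and the hook is not modeled here.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple


def load_results(path: Path, expected: List[str]) -> Dict[str, bool]:
    """Read criteria; anything missing, unreadable, or not exactly true is False."""
    try:
        raw = json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        raw = {}
    if not isinstance(raw, dict):
        raw = {}
    return {name: raw.get(name) is True for name in expected}


def inner_pulse(path: Path, expected: List[str]) -> Tuple[bool, List[str]]:
    """Return (done, failing). done requires at least one criterion, all True."""
    results = load_results(path, expected)
    failing = [name for name, ok in results.items() if not ok]
    return (bool(expected) and not failing, failing)
```

The design choice worth noting is the strict `is True` comparison: a truthy string such as "yes" or a missing key counts as a failure, so the only path to "done" is an explicit boolean for every expected criterion.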
Built on
Built on Anthropic's research, not on someone's clever idea.
The two-pulse pattern, the generator/evaluator loop, the hooks that enforce the Default-FAIL contract — all of it comes from research and reference implementations Anthropic has published. The AI Heroes harness packages them, adds a stall detector, pins the Codex executor, and ships it as one installable plugin.
Anthropic Engineering • November 2025
Effective harnesses for long-running agents
The original framing of the two-pulse pattern, the heartbeat, and why event-driven loops cannot catch their own silent failures.
Read source
Anthropic Engineering • March 2026
Harness design for long-running application development
The generator/evaluator loop, scope policies for honest termination, and how to design a harness that knows when to stop.
Read source
Anthropic on GitHub • Reference implementation
anthropics/cwc-long-running-agents
Reference hooks — track-read, verify-gate, kill-switch, steer, commit-on-stop — and the evaluator agent. The harness extends these directly.
Read source
On top of those primitives, the AI Heroes harness adds seven things the public version doesn't have, from the outer 15-minute stall detector all the way to full fleet orchestration. Here is the list.
Further reading on the AI Heroes blog: Harness Design for Long-Running AI Applications — Inside Anthropic's Generator-Evaluator Pattern.
What we add
What AI Heroes adds on top of Anthropic's primitives.
Anthropic published the foundations. We spent months turning them into something you can actually leave running — one agent or a whole fleet — and trust. Seven additions do the heavy lifting.
A 15-minute heartbeat watchdog
A second process wakes every 15 minutes and checks the agent is still alive. If it has gone silently dead, you find out in minutes — not the next morning.
An objective reviewer agent
A separate evaluator agent inspects the actual work and refuses to call it done until it genuinely meets the goal. The working agent cannot mark its own homework.
A recovery step for long sessions
Over hours, agents drift and forget the point. A recovery step re-grounds the agent in the original goal whenever it resumes, so it doesn't lose the plot.
Computer use, not just browser use
Beyond the standard browser skill, the agent can drive the computer directly to test its own work and close the loop — clicking through what it built to confirm the goal is actually met.
Claude Opus runs the show, Codex executes
A Claude Opus orchestrator owns the judgement — planning, review, accountability — and delegates the heavy execution to Codex agents. The right model for each job.
Fleet mode: one orchestrator, many agents
Run a single agent on one goal, or have one Opus drive a whole fleet of Claude and Codex subagents in parallel — each in its own isolated worktree, all watched by the same heartbeat.
Discord as the operator UX
Talk to your agents like teammates from Discord — start goals, steer mid-run, watch progress — instead of staring at a terminal you can't take your eyes off.
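The fleet-mode item above relies on each subagent working in its own isolated worktree. A rough sketch of how one worktree per lane could be prepared with `git worktree add` follows. The directory layout, the `lane/` branch prefix, and the function name are hypothetical; the harness may organise its worktrees differently.

```python
from pathlib import Path
from typing import List


def worktree_commands(repo: Path, lanes: List[str]) -> List[List[str]]:
    """Build one `git worktree add` command per lane; nothing is executed here."""
    commands = []
    for lane in lanes:
        # Assumed layout: sibling directories named <repo>-<lane>.
        path = repo.parent / f"{repo.name}-{lane}"
        commands.append(
            ["git", "-C", str(repo), "worktree", "add", "-b", f"lane/{lane}", str(path)]
        )
    return commands
```

Each returned command can be passed to `subprocess.run`. Because every lane gets its own branch and its own directory, parallel agents do not edit the same checkout.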
Safeguards
What stops a long-running agent from going off the rails?
Letting an agent run for hours only works if you trust it to stop. The harness ships with four safeguards on by default.
Default-FAIL contract
Every result starts false. Only fresh evidence read through the Read tool and validated by the verify-gate hook can flip it to true. The agent cannot finish by saying "done" — it has to show its work.
Anti-runaway cap
After eight consecutive turns in which the loop was blocked because the goal was still unmet, the harness lets the next turn through even if criteria are still false. This prevents an infinite loop. The cap resets when you steer.
Kill switch
Drop an AGENT_STOP file into the workspace and the harness stops the loop cleanly at the next turn. No process killing, no orphaned state.
Mid-run steering
Write into STEER.md while the agent is running. The next turn reads it, adjusts course, and resets the anti-runaway block counter. You stay in the loop without restarting it.
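One way to picture how the four safeguards interact is a single decision made at each turn boundary. The sketch below is an assumption about ordering and state, not the harness's real hook code: `LoopState`, `decide`, and the use of the STEER.md modification time to detect new steering are all hypothetical. The file names AGENT_STOP and STEER.md and the cap of eight come from the text above.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict

MAX_CONSECUTIVE_BLOCKS = 8  # anti-runaway cap described above


@dataclass
class LoopState:
    consecutive_blocks: int = 0
    steer_seen_mtime: float = 0.0


def decide(workspace: Path, criteria: Dict[str, bool], state: LoopState) -> str:
    """Return "stop" or "continue" for the turn that just ended."""
    # Kill switch: an AGENT_STOP file always wins.
    if (workspace / "AGENT_STOP").exists():
        return "stop"

    # Steering: a new or edited STEER.md resets the block counter.
    steer = workspace / "STEER.md"
    if steer.exists():
        mtime = steer.stat().st_mtime
        if mtime > state.steer_seen_mtime:
            state.steer_seen_mtime = mtime
            state.consecutive_blocks = 0

    # Default-FAIL: only a non-empty, all-true set of criteria counts as done.
    if criteria and all(value is True for value in criteria.values()):
        state.consecutive_blocks = 0
        return "stop"

    # Anti-runaway cap: after eight consecutive blocks, stop blocking.
    if state.consecutive_blocks >= MAX_CONSECUTIVE_BLOCKS:
        return "stop"

    state.consecutive_blocks += 1
    return "continue"
```

In this sketch the kill switch is checked first so that an operator's stop request is never overridden by steering or by the criteria, and the cap stays tripped until STEER.md changes.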
Best fit
What goals is this harness good for?
The harness only works when there is a real terminator — a programmatic gate that says "done" or "not done" without human judgement. If you cannot write that gate, the harness has nothing to wait for.
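A programmatic gate can be as simple as a command whose exit code answers the question. The following sketch treats exit code 0 as "done" and everything else, including a timeout or a missing executable, as "not done". The function name and the default timeout are assumptions chosen for illustration.

```python
import subprocess
from typing import List


def gate_passes(command: List[str], timeout_seconds: float = 600) -> bool:
    """Run a check command; exit code 0 means done, anything else means not done."""
    try:
        completed = subprocess.run(
            command, capture_output=True, timeout=timeout_seconds
        )
    except (subprocess.TimeoutExpired, OSError):
        # A hung or missing check is treated as a failure, never as success.
        return False
    return completed.returncode == 0
```

For a migration goal like the one in the table below, the call might be `gate_passes(["npm", "run", "build"])`. If no such command can be written for a goal, that is a sign the goal lacks a terminator.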
| Goal type | Example |
|---|---|
| Engineering with a test suite | Build three Next.js routes with Playwright coverage; all tests green. |
| Migrations gated on build and lint | Migrate to next/image everywhere; npm run build passes, zero lint warnings. |
| Content batches with an audit | Generate five GEO blog articles; each passes geo-article-audit with zero FAILs. |
| Multi-sprint product work | Ship a free tool route; Lighthouse score above 90, no console errors. |
| Parallel fleet work | Build four features at once, each in its own worktree gated on its own test suite — Opus integrates, the evaluator reviews, all four green before merge. |
Not a fit for
- Open-ended thinking exercises (no terminator).
- Single judgement calls (no loop).
- Subjective design refinement (no programmatic gate).
- One-shot research memos (no iteration).
Get the harness
Get the GitHub link.
Drop your email. We will send you the GitHub repository link, a one-page quickstart cheatsheet, and an occasional update when the harness gets meaningful upgrades.
No spam, ever. Your email is stored securely so we can send you updates about new use cases and workflows.
Install
How do you install the harness in Claude Code?
Five steps. About five minutes. Every state-changing step writes a timestamped backup so every change is reversible.
Step 1: Clone into Claude Code plugins
cd "$HOME/.claude/plugins" && \
git clone https://github.com/mlobo2012/ai-heroes-long-running-agent-harness.git discord-long-running-harness
Step 2: Enable for a launcher (dry-run, then --apply)
"$HOME/.claude/plugins/discord-long-running-harness/bin/enable-for-launcher.sh" --slug klaus
"$HOME/.claude/plugins/discord-long-running-harness/bin/enable-for-launcher.sh" --slug klaus --apply
Step 3: Pin your Codex executor model
cat > "$HOME/.claude/codex-current-model.env" <<'ENV'
CODEX_MODEL=gpt-5.5
ENV
Step 4: Verify the install
"$HOME/.claude/plugins/discord-long-running-harness/scripts/verify-install.sh" --scope core
Step 5: Bootstrap a workspace and register a goal
scripts/init-workspace.sh "$HOME/path/to/workspace"
"$HOME/.claude/plugins/discord-long-running-harness/scripts/register-goal.sh" --agent klaus --channel <channel_id> --workspace "$HOME/path/to/workspace" --launcher "$HOME/.claude/channels/discord/start-klaus.sh" "Your goal description here"
Paste the /goal command the register-goal script prints into your Claude Code session. The harness takes over from there.
FAQ
Frequently asked questions about the long-running agent harness
Yes. Apache-2.0, open source, hosted on GitHub. There is no paid tier and no licence to negotiate. AI Heroes makes money when teams want it set up and integrated for them, not from the code itself.
Today, yes. The harness is built around Claude Code's hooks system, the Read tool, and the turn-boundary events. The two-pulse pattern is portable in principle, but this implementation depends on Claude Code primitives.
Two clocks. The inner pulse runs at every turn boundary inside the agent and checks whether the work is done. The outer pulse runs every 15 minutes on a real wall clock and checks whether the inner pulse is still alive. If the inner pulse stops writing heartbeats, the outer pulse alerts and writes recovery guidance the agent will read next.
Yes. Beyond single-agent mode, one Claude Opus orchestrator can drive a fleet of Claude and Codex subagents working in parallel — each in its own isolated git worktree so they never collide. The same 15-minute heartbeat, the same objective evaluator agent, and the same kill switch watch every agent in the fleet at once. You run one long goal with one agent, or many lanes with a whole team, on the same control system.
A Claude Opus model runs the show — it owns the planning, the review, and the accountability for whether the goal is met. It delegates the heavy execution to Codex agents, which are fast at writing and changing code. Opus decides what good looks like and checks the result; Codex does the volume. You get senior judgement on top of high-throughput execution, instead of one model trying to do both jobs at once.
Every success criterion starts as false. The only way to flip a criterion to true is fresh evidence read through Claude Code's Read tool and validated by the verify-gate hook. This stops the agent from declaring a goal complete without showing its work. It is the single most important safeguard in the harness.
Goals with a real programmatic terminator. Engineering work with a test suite, migrations gated on build and lint, content batches with an automated audit, multi-sprint product work with a measurable success gate. If you cannot write a check that returns true or false without a human, the harness has nothing to wait for.
Yes. It extends Anthropic's primitives directly — the Effective harnesses for long-running agents post (November 2025), the Harness design for long-running application development post (March 2026), and the anthropics/cwc-long-running-agents reference repository. AI Heroes adds the outer 15-minute stall detector, a pinned Codex executor, and an optional OpenClaw supervisor on top.
Yes. The harness is free, but installing it cleanly, wiring it into your Claude Code launchers, choosing the right goals, and running a first end-to-end overnight session is work most teams would rather not do themselves. Book a call from this page and we will scope it.
Drop an AGENT_STOP file into the workspace. The harness stops at the next turn boundary, cleanly. No process killing, no orphaned state. Every other operator control — STEER.md for mid-run steering, the heartbeat audit trail, the session ledger — is documented in the repository README.
Want it set up properly the first time?
The harness is free. Configuring hooks, choosing goals that match the Default-FAIL contract, and supervising a first overnight run is not. AI Heroes can do it with you.