Free and open source • Apache-2.0 • Claude Code

Run Claude agents for hours — one agent, or a whole fleet — without any of them silently dying.

A two-pulse control system for Claude Code with an objective evaluator agent, recovery steering, computer use, Opus-orchestrates-Codex delegation, fleet-mode parallelism, and Discord as the operator UX. Built on Anthropic's published research into long-running agent harnesses.

Enter your email below to get the GitHub repository link plus a one-page quickstart cheatsheet.

TL;DR

What this is, in plain language.

AI agents that run for hours have a quiet failure mode: the loop only knows to continue when a turn ends. If a turn never ends — because something hung, a subagent froze, or the operator walked away — the loop has no way to notice the silence. You think the agent is still working. It isn't.

This harness adds a second clock outside the agent. Every 15 minutes, it reads a heartbeat the agent is supposed to be writing. If the heartbeat goes stale (older than 20 minutes), it posts a stall alert and writes recovery guidance the agent will read on its next turn.

Works in two modes. Drive a single Claude agent on a single long goal. Or have one Claude Opus orchestrate a fleet of Claude and Codex subagents working in parallel on isolated worktrees — and watch every one of them with the same heartbeat, the same evaluator, and the same kill switch.

The whole thing is free, open source, Apache-2.0, and lives in one GitHub repo. It runs on top of Claude Code. It is built on Anthropic's own published research into long-running agent harnesses.

The problem

Why long-running agents fail silently.

Claude Code, like most agent loops, is event-driven. It only checks whether the work is done at turn boundaries — moments when the model finishes speaking. That works fine when turns happen every minute or two.

On a four-hour goal, the assumption breaks. If one turn never ends, the loop is dead, but from the outside it looks identical to a loop that is still thinking. You find out only the next morning that the agent stopped at 11pm, and the hours you planned to use are gone.

The fix is structural: you need a watchdog that lives outside the agent and reads a heartbeat the agent writes from inside. The harness gives you that, with sensible defaults already wired in.

How it works

How does the long-running agent harness actually work?

Two clocks. The inner clock ticks at turn boundaries, so it runs at the speed of the agent. The outer clock runs on wall-clock time, whether or not the agent is making progress.

| Pulse | When it runs | What it checks | How it reacts |
| --- | --- | --- | --- |
| Inner pulse | At every turn boundary inside Claude Code. | Reads test-results.json. Checks if every criterion is true. | Blocks the turn and continues the loop if any criterion is false. Writes a fresh heartbeat timestamp. |
| Outer pulse | Every 15 minutes on a real clock, outside the agent. | Reads the heartbeat timestamp. | If the heartbeat is older than 20 minutes, the agent is stalled. Posts a stall alert and appends recovery guidance to STEER.md so the agent reads it on the next turn. |
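The outer pulse can be pictured as a small script run on a timer. The following is a minimal sketch, not the harness's actual code: the `heartbeat` file name, the function names, and the wording of the recovery note are all assumptions made for illustration. Only the 20-minute threshold and the STEER.md file come from the description above.

```shell
#!/bin/sh
# Sketch of the outer pulse. The "heartbeat" file name and all function
# names are illustrative assumptions, not the harness's real layout.

STALE_AFTER_SECONDS=$((20 * 60))   # heartbeat older than 20 minutes = stalled

# Inner side: called at each turn boundary to record "I am alive".
write_heartbeat() {
  # $1 = heartbeat file; stores the current Unix time in seconds
  date +%s > "$1"
}

# Outer side: succeeds (exit 0) when the heartbeat is missing or too old.
heartbeat_is_stale() {
  # $1 = heartbeat file, $2 = current Unix time in seconds
  [ -f "$1" ] || return 0
  hb_age=$(( $2 - $(cat "$1") ))
  [ "$hb_age" -gt "$STALE_AFTER_SECONDS" ]
}

# One outer tick: on a stall, append recovery guidance for the next turn.
# Printing "stalled" stands in for posting the stall alert.
outer_pulse() {
  # $1 = workspace directory, $2 = current Unix time in seconds
  if heartbeat_is_stale "$1/heartbeat" "$2"; then
    printf '%s\n' "Stall detected: re-read the goal and resume from the last passing check." >> "$1/STEER.md"
    echo "stalled"
  else
    echo "alive"
  fi
}
```

Under these assumptions, a scheduler would call `outer_pulse "$workspace" "$(date +%s)"` every 15 minutes. Passing the current time as an argument keeps the check easy to exercise with fixed timestamps.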

There is one more piece that makes the harness honest: every success criterion starts as false, and the only way it can flip to true is fresh evidence read through Claude Code's Read tool and validated by a hook. The agent cannot claim victory on vibes. This is the Default-FAIL contract.
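To make the Default-FAIL idea concrete, here is a rough sketch of a results file that starts with every criterion false, plus a gate that passes only when none remain false. The file layout (a flat JSON object with one `"name": true|false` pair per line) and both function names are assumptions for illustration; the real test-results.json schema is not described here. The hook-side validation of evidence is not modeled at all.

```shell
#!/bin/sh
# Sketch of a Default-FAIL gate. The flat one-pair-per-line JSON layout
# and the function names are assumptions, not the harness's real schema.

# Create a results file in which every named criterion starts as false.
init_results() {
  # $1 = results file, remaining arguments = criterion names
  results_file=$1
  shift
  {
    printf '{\n'
    first=1
    for name in "$@"; do
      [ "$first" -eq 1 ] || printf ',\n'
      printf '  "%s": false' "$name"
      first=0
    done
    printf '\n}\n'
  } > "$results_file"
}

# Succeeds only if the file exists, has at least one criterion marked
# true, and has none marked false. Anything else counts as "not done".
all_criteria_true() {
  # $1 = results file
  [ -f "$1" ] || return 1
  grep -Eq ':[[:space:]]*true' "$1" || return 1
  ! grep -Eq ':[[:space:]]*false' "$1"
}
```

The point of the sketch is the direction of the default: a missing file, an empty file, or a single remaining `false` all fail the gate, so the only way through is positive evidence for every criterion.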

Built on

Built on Anthropic's research, not on someone's clever idea.

The two-pulse pattern, the generator/evaluator loop, and the hooks that enforce the Default-FAIL contract all come from research and reference implementations Anthropic has published. The AI Heroes harness packages them, adds a stall detector, pins the Codex executor, and ships the result as one installable plugin.

On top of those primitives, the AI Heroes harness adds seven things the public version doesn't have, from the outer 15-minute stall detector to full fleet orchestration. The next section lists them.

Further reading on the AI Heroes blog: Harness Design for Long-Running AI Applications — Inside Anthropic's Generator-Evaluator Pattern.

What we add

What AI Heroes adds on top of Anthropic's primitives.

Anthropic published the foundations. We spent months turning them into something you can actually leave running — one agent or a whole fleet — and trust. Seven additions do the heavy lifting.

1. A 15-minute heartbeat watchdog

A second process wakes every 15 minutes and checks the agent is still alive. If it has gone silently dead, you find out in minutes, not the next morning.

2. An objective reviewer agent

A separate evaluator agent inspects the actual work and refuses to call it done until it genuinely meets the goal. The working agent cannot mark its own homework.

3. A recovery step for long sessions

Over hours, agents drift and forget the point. A recovery step re-grounds the agent in the original goal whenever it resumes, so it doesn't lose the plot.

4. Computer use, not just browser use

Beyond the standard browser skill, the agent can drive the computer directly to test its own work and close the loop, clicking through what it built to confirm the goal is actually met.

5. Claude Opus runs the show, Codex executes

A Claude Opus orchestrator owns the judgement (planning, review, accountability) and delegates the heavy execution to Codex agents. The right model for each job.

6. Fleet mode: one orchestrator, many agents

Run a single agent on one goal, or have one Opus drive a whole fleet of Claude and Codex subagents in parallel, each in its own isolated worktree, all watched by the same heartbeat.

7. Discord as the operator UX

Talk to your agents like teammates from Discord (start goals, steer mid-run, watch progress) instead of staring at a terminal you can't take your eyes off.

Safeguards

What stops a long-running agent from going off the rails?

Letting an agent run for hours only works if you trust it to stop. The harness ships with four safeguards on by default.

Default-FAIL contract

Every result starts false. Only fresh evidence read through the Read tool and validated by the verify-gate hook can flip it to true. The agent cannot finish by saying "done" — it has to show its work.

Anti-runaway cap

After eight consecutive turns that end with the goal still unmet, the harness stops blocking: the next turn is allowed through even if criteria are still false, so the loop can end instead of running forever. The cap resets when you steer.
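The cap amounts to a counter of consecutive blocked turns. The sketch below is one plausible reading of it, with hypothetical names throughout (`gate_turn`, `MAX_CONSECUTIVE_BLOCKS`). What the real harness does with the counter once the cap is reached is not stated, so the sketch simply leaves it unchanged.

```shell
#!/bin/sh
# Sketch of the anti-runaway cap. Names are illustrative assumptions.

MAX_CONSECUTIVE_BLOCKS=8

# Decide what to do at a turn boundary.
# Prints "<decision> <new_block_count>", where decision is allow or block.
gate_turn() {
  # $1 = "true" if every criterion passes, $2 = consecutive blocks so far
  if [ "$1" = "true" ]; then
    echo "allow 0"                # goal met: the loop can finish
  elif [ "$2" -ge "$MAX_CONSECUTIVE_BLOCKS" ]; then
    echo "allow $2"               # cap reached: stop blocking
  else
    echo "block $(( $2 + 1 ))"    # goal unmet: block and keep looping
  fi
}
```

Starting from a count of 0, eight unmet turns in a row produce eight `block` decisions and the ninth is allowed through. Steering would set the count back to 0 before the next call.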

Kill switch

Drop an AGENT_STOP file into the workspace and the harness stops the loop cleanly at the next turn. No process killing, no orphaned state.
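A file-based kill switch is cheap to check at each turn boundary. This is a minimal sketch under the assumption that only the file's existence matters; whether the harness reads any content from AGENT_STOP is not stated here, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch of a file-based kill switch check. Function names are
# illustrative; only the AGENT_STOP file name comes from the text.

# Succeeds (exit 0) when the operator has dropped the stop file.
stop_requested() {
  # $1 = workspace directory
  [ -e "$1/AGENT_STOP" ]
}

# What the loop should do at the next turn boundary.
next_action() {
  # $1 = workspace directory
  if stop_requested "$1"; then
    echo "stop"
  else
    echo "continue"
  fi
}
```

On the operator side this would be as simple as `touch "$HOME/path/to/workspace/AGENT_STOP"`. Because the check runs only at a turn boundary, the agent finishes its current turn before the loop ends.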

Mid-run steering

Write into STEER.md while the agent is running. The next turn reads it, adjusts course, and resets the anti-runaway block counter. You stay in the loop without restarting it.
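One way to picture steering is as an append-only notes file that the harness consumes at the turn boundary. The sketch below assumes hypothetical bookkeeping files (`.steer_seen` for lines already read, `.block_count` for the anti-runaway counter) and hypothetical function names. Only STEER.md itself and the counter reset come from the description above.

```shell
#!/bin/sh
# Sketch of mid-run steering. The .steer_seen and .block_count files
# and both function names are assumptions made for illustration.

# Operator side: append a timestamped note to STEER.md.
steer() {
  # $1 = workspace directory, $2 = guidance text
  printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$2" >> "$1/STEER.md"
}

# Harness side: if STEER.md has grown since the last look, print the new
# lines and reset the block counter. Fails (exit 1) if nothing is new.
apply_steering() {
  # $1 = workspace directory
  steer_file="$1/STEER.md"
  seen_file="$1/.steer_seen"
  [ -f "$steer_file" ] || return 1
  total=$(( $(wc -l < "$steer_file") ))
  seen=$(cat "$seen_file" 2>/dev/null || echo 0)
  [ "$total" -gt "$seen" ] || return 1
  tail -n "$(( total - seen ))" "$steer_file"
  echo "$total" > "$seen_file"
  echo 0 > "$1/.block_count"
}
```

Tracking a line count rather than a modification time keeps the sketch portable and means a note is delivered exactly once, even if several are appended between turns.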

Best fit

What goals is this harness good for?

The harness only works when there is a real terminator — a programmatic gate that says "done" or "not done" without human judgement. If you cannot write that gate, the harness has nothing to wait for.

| Goal type | Example |
| --- | --- |
| Engineering with a test suite | Build three Next.js routes with Playwright coverage; all tests green. |
| Migrations gated on build and lint | Migrate to next/image everywhere; npm run build passes, zero lint warnings. |
| Content batches with an audit | Generate five GEO blog articles; each passes geo-article-audit with zero FAILs. |
| Multi-sprint product work | Ship a free tool route; Lighthouse score above 90, no console errors. |
| Parallel fleet work | Build four features at once, each in its own worktree gated on its own test suite. Opus integrates, the evaluator reviews, and all four are green before merge. |
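The examples above share one shape: a command whose exit status decides "done" or "not done" with no human in the loop. A minimal sketch of that shape follows, using a hypothetical `run_gate` helper. How the harness actually records gate results is not shown here.

```shell
#!/bin/sh
# Sketch of a programmatic terminator. run_gate is a hypothetical helper:
# it runs the given command and reports the verdict from its exit status.

run_gate() {
  # "$@" = the gate command and its arguments
  if "$@" >/dev/null 2>&1; then
    echo "done"
  else
    echo "not done"
  fi
}
```

For the migration example, the gate could be `run_gate npm run build`. A goal with no command that can play this role is one of the cases listed next as not a fit.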

Not a fit for

  • Open-ended thinking exercises (no terminator).
  • Single judgement calls (no loop).
  • Subjective design refinement (no programmatic gate).
  • One-shot research memos (no iteration).

Get the harness

Get the GitHub link.

Drop your email. We will send you the GitHub repository link, a one-page quickstart cheatsheet, and an occasional update when the harness gets meaningful upgrades.

No spam, ever. Your email is stored securely so we can send you updates about new use cases and workflows.

Install

How do you install the harness in Claude Code?

Five steps. About five minutes. Every state-changing step writes a timestamped backup, so every change is reversible.

Step 1: Clone into Claude Code plugins

cd "$HOME/.claude/plugins" && \
  git clone https://github.com/mlobo2012/ai-heroes-long-running-agent-harness.git discord-long-running-harness

Step 2: Enable for a launcher (dry run first, then --apply)

"$HOME/.claude/plugins/discord-long-running-harness/bin/enable-for-launcher.sh" --slug klaus
"$HOME/.claude/plugins/discord-long-running-harness/bin/enable-for-launcher.sh" --slug klaus --apply

Step 3: Pin your Codex executor model

cat > "$HOME/.claude/codex-current-model.env" <<'ENV'
CODEX_MODEL=gpt-5.5
ENV

Step 4: Verify the install

"$HOME/.claude/plugins/discord-long-running-harness/scripts/verify-install.sh" --scope core

Step 5: Bootstrap a workspace and register a goal

"$HOME/.claude/plugins/discord-long-running-harness/scripts/init-workspace.sh" "$HOME/path/to/workspace"
"$HOME/.claude/plugins/discord-long-running-harness/scripts/register-goal.sh" --agent klaus --channel <channel_id> --workspace "$HOME/path/to/workspace" --launcher "$HOME/.claude/channels/discord/start-klaus.sh" "Your goal description here"

Paste the /goal command the register-goal script prints into your Claude Code session. The harness takes over from there.

FAQ

Frequently asked questions about the long-running agent harness

Is the harness free?

Yes. Apache-2.0, open source, hosted on GitHub. There is no paid tier and no licence to negotiate. AI Heroes makes money when teams want it set up and integrated for them, not from the code itself.

Does it only work with Claude Code?

Today, yes. The harness is built around Claude Code's hooks system, the Read tool, and the turn-boundary events. The two-pulse pattern is portable in principle, but this implementation depends on Claude Code primitives.

Want it set up properly the first time?

The harness is free. Configuring hooks, choosing goals that match the Default-FAIL contract, and supervising a first overnight run is not. AI Heroes can do it with you.
