What is the evidence gate?

A hook, hooks/evidence-gate.sh, that blocks any score, verdict, or decision that does not cite a real evidence record. It is how the system kills HiPPO: nothing is prioritised because a leader asked, only because a metric, ticket cluster, or transcript backs it.

What are the adversarial verifiers?

Four agents placed where a confident mistake is most expensive: Evidence Adversary, Pre-Mortem Red Team, Victory-Bias Auditor, and Consensus Checker. They argue against the confident call, name the cheapest test to kill a risk, and block success claims that missed the target.

What do I need to connect for live operation?

Your own systems through their MCP servers: Linear or Jira for the roadmap, Notion for knowledge, Amplitude or Pendo for metrics, Slack for broadcasts, and your research and support sources for signal. Without those connections the enforcement spine and demo cycle still run, but live writes and always-on pulls cannot know your real product world.

Claude Code or Claude Cowork?

Both ship the full enforcement spine. Use Claude Code if you want the marketplace command and the repo visible. Use Claude Cowork if you want the visual plugin upload path. The install section on this page covers both.

Is it free?

Yes. The plugin is free and open source on GitHub. You only pay for the third-party tools you choose to connect, such as your existing Linear, Notion, or Amplitude accounts.

Built on Anthropic's product-management plugin

Anthropic's plugin advises. Autonomous PM operates.

Is it safe to let it run on its own?

That is what the human-escalation boundary is for. lib/escalation.py classifies each action by reversibility, blast radius, confidence, and agent disagreement. You start read-only, widen the autonomy scope as you build trust, and the gates hold the line.

A 17-agent product team that senses, prioritises, de-risks, ships, and learns on your real systems (Linear, Notion, Slack, and Amplitude), with deterministic gates that decide what it can do on its own and what needs you.

Built on Anthropic's official product-management plugin. It keeps every skill you already know (write-spec, roadmap-update, stakeholder-update, synthesize-research, competitive-brief, metrics-review) and adds the operating layer: evidence gates, adversarial verifiers, drift detection, and a closed launch loop. See Anthropic's product-management plugin

Free and open source. Claude Code and Claude Cowork. No license, no seat fee.

The shift

Stop asking an assistant for advice. Start supervising an operating loop.

Same product moments. The difference is whether the work ends as a recommendation you still have to action, or as a gated action the system performs and records.

Anthropic's plugin advises

It returns a clear answer. You do the rest.

Drafts a reprioritised backlog and hands you the list.
Tells you which assumption looks weakest.
Writes a stakeholder update you copy into Slack.
Suggests you should probably check adoption later.

Autonomous PM operates

It performs the action, behind a gate, and leaves an evidence trail.

Gated: Moves the Linear cards itself when the change is reversible and cited.
Gated: Opens the investigation ticket and assigns the cheapest killer test.
Gated: Posts the update to the right channel with the evidence and the dissent attached.
Gated: Pre-registers the target now and pulls the actual at the decision date.

When an action is irreversible, high blast radius, or thin on evidence, it does not act. It stops and asks you.

Install

Two ways in. Same operating loop.

Install in Claude Code from the marketplace, or upload the Claude Cowork plugin. Both ship the full enforcement spine. You connect your own systems when you are ready to switch on live operation.

Install from the marketplace

Two commands in your terminal. The plugin, its agents, the hooks, and the enforcement spine install together.

claude plugin marketplace add mlobo2012/autonomous-pm-plugin
claude plugin install autonomous-pm

For local development, point Claude Code at the plugin directory with claude --plugin-dir /path/to/autonomous-pm-plugin.

Upload the Cowork plugin

Get the Claude Cowork plugin

Enter your work email and the Cowork plugin zip will open. Then upload it in Claude Cowork and connect your systems.

No spam, ever. Your email is stored securely so we can send you updates on new use cases and workflows.

Download the plugin zip. Enter your work email and the Cowork plugin zip opens.
Upload and install in Claude Cowork. Open Cowork, go to Customize, then create and upload plugins, upload the zip, and install Autonomous PM.
Allow local MCP servers. If Cowork prompts for local MCP server access, allow the evidence hooks and the connector bridges.
Connect your systems. Connect Linear or Jira, Notion, Amplitude or Pendo, and Slack with their own logins so the loop can read and act on your real product world.
Run the first operating cycle. Ask it to reprioritise the backlog from this week's signal, or to run the ship gate on a feature, and watch the gates decide what is safe to do alone.

The four surfaces

Four screens a product manager recognises on sight.

Rendered from the plugin's own logic, not a stock dashboard. Every score, verdict, and assumption carries a gate light: green Pass with cited evidence, amber Escalate when a human is needed, red Block when there is no evidence or the action is irreversible.

Pass · Escalate · Block

Pod 1, Evidence-Gated Prioritisation

Why a PM recognises this: it is your RICE backlog, except every row has to cite evidence or it gets blocked.

linear · backlog

Backlog, reprioritised from this week's signal

evidence-gate: on

| Initiative | Reach | Impact | Conf | Effort | RICE | Evidence | Gate |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SSO for Enterprise | 8 / 10 | High | 90% | L | 412 | 3 sales calls + 11 deals blocked | Pass |
| Activation checklist v2 | 9 / 10 | High | 85% | M | 358 | amplitude://funnel/activation -38% step 3 | Pass |
| AI summary in inbox | 7 / 10 | Med | 80% | M | 291 | 142 support tickets clustered | Pass |
| Bulk CSV export | 5 / 10 | Low | 45% | S | 120 | amplitude://retention flat | Escalate: reversible but low confidence |
| Redesign settings page | 4 / 10 | Low | 20% | L | blocked | (none), requested by CEO | Block: no cited evidence, HiPPO rejected |
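
The pass-or-block behaviour in this backlog can be sketched in a few lines. This is an illustrative sketch only, not the contents of hooks/evidence-gate.sh: the `BacklogRow` type, the `evidence_gate` function, and the rule that any non-empty citation counts as evidence are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class BacklogRow:
    """Hypothetical backlog row: an initiative plus the evidence it cites."""
    initiative: str
    evidence: list[str] = field(default_factory=list)  # cited record IDs or URIs


def evidence_gate(row: BacklogRow) -> str:
    """Return "Pass" when the row cites at least one evidence record, else "Block"."""
    cited = [e for e in row.evidence if e.strip()]  # ignore blank citations
    return "Pass" if cited else "Block"


rows = [
    BacklogRow("Activation checklist v2", ["amplitude://funnel/activation"]),
    BacklogRow("Redesign settings page"),  # requested by CEO, nothing cited
]
for row in rows:
    print(f"{row.initiative}: {evidence_gate(row)}")
```

The Escalate light depends on confidence and reversibility as well, so it is left out of this sketch.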

Pod 3, Pre-Mortem Ship-Gate

Why a PM recognises this: it is the spec review, except the red team argues the feature already failed before you build it.

spec · ship-gate

Ship gate, SSO for Enterprise

pre-mortem: armed

Pre-registered target

At least 30% of enterprise accounts enable SSO within 30 days of GA.

Load-bearing assumptions

weak: Enterprise buyers will self-serve SSO once it exists. (2 of 3 reference calls unsure)
weak: SAML alone clears the security review. (SCIM raised in 2 calls)
strong: Blocked deals reactivate on SSO availability. (11 deals tagged sso-blocker)

Pre-Mortem Red Team

Strongest case it flops: buyers want SCIM provisioning, not just SSO. 2 of 3 reference calls raised it.

Cheapest killer test: 5 customer-development calls before any build.

Hold. SHIP-GATE: hold until killer test passes.

Pod 2, Signal and Drift

Why a PM recognises this: it is the roadmap review, except it pushes the contradiction to you instead of waiting for the quarter.

drift · sentinel

Drift scan, roadmap assumptions vs last 14 days of signal

two-sided rule: enforced

An assumption is only marked contradicted when both the documented claim and the opposing signal are present.

| Assumption | Documented claim | Opposing signal | Status |
| --- | --- | --- | --- |
| Mobile is a secondary surface | Strategy doc Q2: mobile is read-only, desktop-first. | 61% of new activations started on mobile (amplitude). | Contradicted |
| Enterprise is the growth motion | Roadmap thesis: land enterprise, expand seats. | Enterprise pipeline up 22%, no opposing signal. | Holds |
| Onboarding is the activation lever | PRD: guided setup drives week-1 activation. | Setup completion steady at 74%, no divergence. | Holds |
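
The two-sided rule is simple enough to state as code. The sketch below is an assumption about its shape, not the plugin's implementation: `drift_status` is a hypothetical name, and mapping every case other than "both sides present" to Holds is a simplification made for the example.

```python
from typing import Optional


def drift_status(documented_claim: Optional[str], opposing_signal: Optional[str]) -> str:
    """Two-sided rule: "Contradicted" needs BOTH a documented claim and an opposing signal."""
    if documented_claim and opposing_signal:
        return "Contradicted"
    # Assumption for this sketch: anything short of both sides is reported as "Holds".
    return "Holds"


print(drift_status(
    "Strategy doc Q2: mobile is read-only, desktop-first.",
    "61% of new activations started on mobile (amplitude).",
))
print(drift_status("Roadmap thesis: land enterprise, expand seats.", None))
```

Requiring both sides keeps the sentinel from flagging a contradiction on a signal alone, when no written claim exists for it to contradict.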

Pod 4, Closed-Loop

Why a PM recognises this: it is the launch retro, except a separate auditor blocks any success claim that missed its target.

launch · adjudicator

Launch loop, did what we shipped get adopted?

victory-bias auditor: on

In-app onboarding tour

Kill

Predicted: +8% week-1 activation
Actual at decision date: +1.2%

Below the pre-registered target. Victory-Bias Auditor blocked the success write-up.

Saved views for power users

Double down

Predicted: +5% weekly retention
Actual at decision date: +6.4%

Cleared the target. Expansion bet opened with evidence.

Inline comments

Iterate

Predicted: +10% collaboration sessions
Actual at decision date: +4.1%

Partial signal. Adjudicator queued one more cycle.

The control model

It earns the right to act. Or it stops and asks you.

Three deterministic gates sit under every action. They are hooks and libraries, not a tone of voice. They decide what is safe to run alone.

Pass

Evidence gate

No score, verdict, or decision passes without a cited evidence record. hooks/evidence-gate.sh blocks the claim if the citation is missing.

Audit

Victory-Bias Auditor

No launch can be called a success below its pre-registered target. lib/launch.py refuses to call a launch validated when the actual missed the number.
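
A guard of this kind might look like the sketch below. It is not the code in lib/launch.py: `LaunchRecord`, `VictoryBiasError`, and `validate_launch` are hypothetical names, and the example treats the predicted lift shown on the launch cards as the pre-registered target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaunchRecord:
    """Hypothetical record written at ship time and completed at the decision date."""
    feature: str
    target: float  # pre-registered target, in the metric's own unit (here: % lift)
    actual: float  # measured value at the decision date


class VictoryBiasError(Exception):
    """Raised when a success claim is made below the pre-registered target."""


def validate_launch(record: LaunchRecord) -> str:
    """Return "validated" only when the actual meets or beats the target."""
    if record.actual < record.target:
        raise VictoryBiasError(
            f"{record.feature}: actual {record.actual} is below target {record.target}"
        )
    return "validated"


for rec in (
    LaunchRecord("Saved views for power users", target=5.0, actual=6.4),
    LaunchRecord("In-app onboarding tour", target=8.0, actual=1.2),
):
    try:
        print(rec.feature, "->", validate_launch(rec))
    except VictoryBiasError as err:
        print("blocked:", err)
```

Raising an exception rather than returning a soft warning is the design point: a success write-up cannot proceed unless the number was actually hit.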

Stop

Human-escalation boundary

lib/escalation.py classifies each action by reversibility, blast radius, confidence, and agent disagreement, then routes the rare strategic call to a human with a brief.
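
As a rough illustration of how four such inputs could combine, consider the sketch below. It is an assumption, not the logic in lib/escalation.py: the `Action` fields, the two-level blast radius scale, and the 0.7 confidence threshold are invented for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    ACT = "act"            # safe to run alone
    ESCALATE = "escalate"  # stop and ask a human


@dataclass(frozen=True)
class Action:
    """Hypothetical description of a proposed action."""
    name: str
    reversible: bool
    blast_radius: str      # "low" or "high" (assumed scale)
    confidence: float      # 0.0 to 1.0
    agents_disagree: bool


MIN_CONFIDENCE = 0.7  # assumed threshold, not taken from the plugin


def classify(action: Action, min_confidence: float = MIN_CONFIDENCE) -> Route:
    """Escalate if any single risk factor trips; act only when all are clear."""
    if not action.reversible:
        return Route.ESCALATE
    if action.blast_radius == "high":
        return Route.ESCALATE
    if action.confidence < min_confidence:
        return Route.ESCALATE
    if action.agents_disagree:
        return Route.ESCALATE
    return Route.ACT


move_card = Action("Move Linear card", True, "low", 0.9, False)
csv_export = Action("Prioritise Bulk CSV export", True, "low", 0.45, False)
print(classify(move_card).value)
print(classify(csv_export).value)
```

Because every check is a plain comparison, the same action always gets the same route, which is what makes the gate deterministic.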

The team

17 agents across 5 pods.

Each pod owns one part of the loop. The four marked agents are adversarial verifiers, placed where a confident mistake is most expensive.

Pod 1, Evidence-Gated Prioritisation

Evidence Librarian: Owns the Evidence Store, resolves every claim to cited records.
Prioritization Analyst: Proposes RICE and ICE scores, each component citation-backed.
⚔ Evidence Adversary: Must break the top items or certify no disconfirming evidence.

Pod 2, Signal and Drift

Signal Ingestor: Scheduled pulls, clusters raw signal into cited themes.
Drift Sentinel: Diffs live signal against standing strategy, flags divergence.
Synthesis Editor: Writes the digest and drift log, proposes doc edits, never commits them.

Pod 3, Pre-Mortem Ship-Gate

Spec Author: Drafts the PRD using the inherited write-spec skill.
Assumption Mapper: Extracts load-bearing assumptions, tags evidence strength.
⚔ Pre-Mortem Red Team: Argues the feature already failed, names the cheapest killer test.
Discovery Runner: Designs and launches the assumption test.

Pod 4, Closed-Loop

Launch Tracker: Pre-registers predicted metric, target, and decision date at ship.
Adoption Auditor: Pulls the actual outcome at the decision date, compares to target.
Outcome Adjudicator: Renders iterate, hold, double down, or kill.
⚔ Victory-Bias Auditor: Blocks success verdicts that did not clear the pre-registered target.

Pod 5, Alignment and Escalation

Alignment Broadcaster: Emits audience-tailored decision updates with evidence and dissent.
Escalation Router: Routes the rare strategic call to the right human with a brief.
⚔ Consensus Checker: Detects agent and commitment conflicts, blocks until reconciled.

⚔ = Adversarial verifier

Built on Anthropic

Every skill you know, plus the right to act.

Autonomous PM inherits the official product-management plugin. Out of the box each skill gives you advice. With the operating layer, each one performs the gated action and leaves an evidence trail.

See Anthropic's product-management plugin

| Skill | Out of the box (advice) | With Autonomous PM (operates, gated) |
| --- | --- | --- |
| write-spec | Turns an idea into a PRD with requirements, scope, and success metrics. | Maps the load-bearing assumptions, red-teams the spec, and holds the ship gate until the killer test passes. |
| roadmap-update | Helps refresh or reprioritise a roadmap in a familiar format. | Reprioritises from this week's cited signal and moves the Linear cards when the change is reversible. |
| stakeholder-update | Drafts a status update for executives or partners. | Posts the update to the right Slack channel with the evidence and the dissent attached. |
| synthesize-research | Turns interviews and tickets into themes and opportunities. | Keeps signal flowing always-on and clusters it into cited evidence the gates can use. |
| competitive-brief | Creates a competitor brief with positioning and implications. | Feeds the brief into drift detection so a competitor move can flag a contradicted assumption. |
| metrics-review | Reviews product metrics, trends, and follow-up actions. | Pre-registers the prediction, pulls the actual at the decision date, and adjudicates iterate, hold, double down, or kill. |
| brainstorm | Stress-tests ideas and explores the problem space. | Routes the best options into the evidence gate so nothing is prioritised on a hunch. |

What you are actually getting

What this is, honestly.

This is not zero-setup full autonomy. It is calibrated trust, which is the point.

Most of the autonomous PM category is agent washing. Height, the most hyped pure-play, shut down in 2025. The point was never autonomy for its own sake. It is the discipline underneath: cited evidence, deterministic gates, and a closed loop you can audit. Autonomous PM is the disciplined version, built on Anthropic's plugin rather than a replacement for your stack.

01
You install it
In Claude Code from the marketplace, or in Claude Cowork from the plugin zip. The enforcement spine and the demo cycle run on day one.
02
You connect your own systems
Linear or Jira, Notion, Amplitude or Pendo, Slack, and your research and support sources, each with its own login. Until you do, it cannot know your real product world.
03
You choose the autonomy scope
You decide how far it can act on its own. Start read-only, widen the scope as you trust it.
04
The gates decide what is safe
Reversible and well-evidenced actions can run alone. Irreversible, high blast radius, or thinly evidenced ones stop and ask you.

Fit

Who this is for, and who it is not.

Best for

PMs and founders whose work is spread across Linear or Jira, Notion, Slack, and Amplitude.
Teams that want priorities, specs, and launch calls grounded in evidence, not opinion.
Product leaders who want to supervise a loop and be pinged only for the rare strategic call.
Operators comfortable connecting their own MCP servers to switch on live operation.

Not for

Anyone expecting full autonomy out of the box with no setup and no connected systems.
Teams that want the agents to approve irreversible or high blast radius actions with no human.
Workflows with no product signal to cite, where every call is a leadership preference.

FAQ

What should a product team know before installing?

Anthropic's plugin gives you the PM skills and returns advice. Autonomous PM keeps those skills and adds an operating layer: a 17-agent team, an evidence substrate, adversarial verifiers, drift detection, a closed launch loop, and deterministic gates that decide when the system may act and when a human must approve.

It can act, behind a gate. When an action is reversible and cited, it can move Linear cards, open a ticket, or post a Slack update on its own. When an action is irreversible, high blast radius, or thin on evidence, it stops and asks you.

Sources and research

The evidence behind the enforcement spine.

The gates are not opinion. They are built on the product research that shows why most features go unused and why confident calls go wrong.

Pendo 2019 Feature Adoption Report

80% of features are rarely or never used. The build trap, measured.

Standish Group CHAOS

64% of software features are rarely to never used.

Kohavi et al. Online Experimentation at Microsoft

About one third of experiments improve the metric they target.

HiPPO (Kaushik and Kohavi)

The highest-paid person's opinion is what evidence-gated priorities exist to replace.

Kohavi, Microsoft online experimentation (KDD 2015 keynote)

Two of three product ideas fail to move the metric they target.

Ronny Kohavi, 1,000 Experiments Club

Netflix reports about 90% of ideas fail to beat control. Google reports about 96%.

Harvard study of 22,000 experiments (O'Reilly)

Only about 1 in 10 experiments produced a statistically significant win.

Cemri et al. Why Do Multi-Agent LLM Systems Fail?

The MAST taxonomy reports 41% to 86.7% failure rates. It is why the gates are deterministic, not vibes. arXiv:2503.13657.

Teresa Torres, Continuous Discovery Habits

Opportunity Solution Trees, the discovery discipline behind the ship gate.

Melissa Perri, Escaping the Build Trap

The feature factory failure mode the closed loop is built to break.

Custom build

Want this wired to your real Linear, Notion, Amplitude, and Slack?

The free plugin is the clean starting point. If your product org runs across Linear or Jira, Notion, Amplitude or Pendo, Slack, support tickets, sales calls, and rules that live in people's heads, Marco will build the operating loop around the way your team actually works.

01
Map the product stack
Linear or Jira, Notion, Amplitude or Pendo, Slack, support, and the places product work really happens.
02
Set the gates
Evidence rules, escalation thresholds, the autonomy scope you trust, and the product judgement the agents must respect.
03
Run it with your team
Reprioritisation, ship gates, drift scans, and launch loops inside your workflow, with humans approving the rare strategic call.

Install the loop. Connect your systems. Supervise, do not operate.