AI Engineering

8 articles

AI Engineering · AI-Native Engineering · Engineering Leadership

How to Run an AI-Native Engineering Org in 2026

Agentic coding doesn't remove the engineering bottleneck — it moves it from writing code to verifying it. Here's the 2026 operating model for an AI-native engineering organization: the processes to rewrite, how code review changes, and the metrics that prove it's working.

Marco Lobo · Jun 3, 2026 · 11 min read
AI Engineering · Webwright · Microsoft Research

What Are Terminal-Native Web Agents? Microsoft Webwright and the End of Click-by-Click Computer Use (2026)

The next reliable web agent will not just click better. Microsoft Webwright points at the real shift: terminal-native agents that turn repeated browser work into Playwright code, logs, screenshots, fresh reruns, and reusable tools.

Marco Lobo · May 27, 2026 · 13 min read
AI Engineering · Claude Code · Large Codebases

Where to Start With Claude Code in a Large Repo: A Decision Tree (2026)

You do not start a large Claude Code rollout by configuring everything. You start with the one mechanic your repo shape and your actual pain point demand — and ignore the rest until you hit them. This is the decision layer that runs before the build.

Marco Lobo · May 24, 2026 · 11 min read
AI Engineering · Agent Harness · Harness Debt

Harness Debt: Your AI Agent Scaffolding Is Quietly Fighting the Model (2026)

Your AI agent is probably worse than the model inside it — and the gap is your own scaffolding. An experimental harness scored over 2x Anthropic's standard one on the same model. The fix isn't a bigger framework; it's deleting the assumptions that went stale the day Claude Opus 4.6 shipped.

Marco Lobo · May 23, 2026 · 11 min read
AI Engineering · Claude Agent SDK · Anthropic

Harness Design for Long-Running AI Applications: Inside Anthropic's Generator-Evaluator Pattern (Claude Agent SDK, 2026)

On 24 March 2026, Anthropic Labs engineer Prithvi Rajasekaran published the most rigorous public account to date of how Anthropic designs harnesses for long-running AI applications: a GAN-inspired generator-evaluator pattern applied across two unusually different domains, frontend design (subjective, no binary verification) and full-stack coding (objective, machine-verifiable). The piece evolves the November 2025 Initializer + Coding Agent baseline into a three-agent planner + generator + evaluator architecture, with concrete cost-and-duration data ($200 / 6h on a retro game maker test, then $124 / 4h on a more ambitious DAW after the Opus 4.6 simplification pass). This article covers the pattern, the two failure modes it fixes (context anxiety and self-evaluation bias), how it compares to LangGraph / AutoGen / OpenAI Assistants v2 / Devin, when it doesn't fit, and the canonical principle every team operating a harness should adopt: stress-test every component against the current model.

Marco Lobo · May 22, 2026 · 13 min read
AI Engineering · Claude Code · HTML

Claude Code + HTML: The 2026 Implementation Guide to the Right Output Medium

Anthropic's own engineers have moved Claude Code outputs to HTML for almost everything. The implementation question is when HTML wins, when it doesn't, and how the handoff from Claude Design to Claude Code should actually look.

Marco Lobo · May 20, 2026 · 11 min read
AI Engineering · Claude Code · Large Codebases

Claude Code in Large Codebases: The 2026 Implementation Guide

Claude Code does not win large codebases by swallowing the repo. It wins when you build a navigation and governance layer around it.

Marco Lobo · May 19, 2026 · 11 min read
AI Engineering · Agent Memory · Retrieval

We Benchmarked Garry Tan's gbrain Against Our Own Agent Memory on 150 Real Questions (May 2026)

A 352-file, 150-question apples-to-apples retrieval benchmark between gbrain and our existing OpenClaw qmd setup. gbrain wins 8.3x more often on hard, cross-source, and discrimination questions — but the headline is messier than the marketing.

Marco Lobo · May 5, 2026 · 17 min read
