AI Engineering

8 articles

AI Engineering · AI-Native Engineering · Engineering Leadership

How to Run an AI-Native Engineering Org in 2026

Agentic coding doesn't remove the engineering bottleneck — it moves it from writing code to verifying it. Here's the 2026 operating model for an AI-native engineering org: the processes to rewrite, how code review changes, and the metrics that prove it's working.

Marco Lobo·3 Jun 2026·11 min read

AI Engineering · Webwright · Microsoft Research

What Are Terminal-Native Web Agents? Microsoft Webwright and the End of Click-by-Click Computer Use (2026)

The next reliable web agent will not just click better. Microsoft Webwright points at the real shift: terminal-native agents that turn repeated browser work into Playwright code, logs, screenshots, fresh reruns, and reusable tools.

Marco Lobo·27 May 2026·13 min read

AI Engineering · Claude Code · Large Codebases

Where to Start With Claude Code in a Large Repo: A Decision Tree (2026)

You do not start a large Claude Code rollout by configuring everything. You start with the one mechanic your repo shape and your actual pain point demand — and ignore the rest until you hit them. This is the decision layer that runs before the build.

Marco Lobo·24 May 2026·11 min read

AI Engineering · Agent Harness · Harness Debt

Harness Debt: Your AI Agent Scaffolding Is Quietly Fighting the Model (2026)

Your AI agent is probably worse than the model inside it — and the gap is your own scaffolding. An experimental harness scored over 2x Anthropic's standard one on the same model. The fix isn't a bigger framework; it's deleting the assumptions that went stale the day Claude Opus 4.6 shipped.

Marco Lobo·23 May 2026·11 min read

AI Engineering · Claude Agent SDK · Anthropic

Harness Design for Long-Running AI Applications: Inside Anthropic's Generator-Evaluator Pattern (Claude Agent SDK, 2026)

On 24 March 2026, Anthropic Labs engineer Prithvi Rajasekaran published the most rigorous public account to date of how Anthropic designs harnesses for long-running AI applications: a GAN-inspired generator-evaluator pattern applied across two very different domains, frontend design (subjective, with no binary verification) and full-stack coding (objective and machine-verifiable). The piece evolves the November 2025 Initializer + Coding Agent baseline into a three-agent planner + generator + evaluator architecture, with concrete cost and duration data: $200 and 6 hours on a retro game maker test, then $124 and 4 hours on a more ambitious DAW (digital audio workstation) after the Opus 4.6 simplification pass. This article covers how the pattern works, the two failure modes it fixes (context anxiety and self-evaluation bias), how it compares to LangGraph, AutoGen, OpenAI Assistants v2 and Devin, when it doesn't fit, and the principle every team operating a harness should adopt: stress-test every component against the current model.

Marco Lobo·22 May 2026·13 min read

AI Engineering · Claude Code · HTML

Claude Code + HTML: The 2026 Implementation Guide to the Right Output Medium

Anthropic's own engineers have moved Claude Code outputs to HTML for almost everything. The implementation question is when HTML wins, when it doesn't, and how the handoff from Claude Design to Claude Code should actually look.

Marco Lobo·20 May 2026·11 min read

AI Engineering · Claude Code · Large Codebases

Claude Code in Large Codebases: The 2026 Implementation Guide

Claude Code does not win large codebases by swallowing the repo. It wins when you build a navigation and governance layer around it.

Marco Lobo·19 May 2026·11 min read

AI Engineering · Agent Memory · Retrieval

We benchmarked Garry Tan's gbrain against our own agent memory on 150 real questions (May 2026)

A 352-file, 150-question apples-to-apples retrieval benchmark between gbrain and our existing OpenClaw qmd setup. gbrain wins 8.3x more often on hard, cross-source, and discrimination questions — but the headline is messier than the marketing.

Marco Lobo·5 May 2026·17 min read

Explore more topics

AI Tools (13) · AI Guides (11) · Claude Launch Analysis (9) · AI Solutions (8) · AI Search (6) · AI Automation (4) · Thought Leadership (3) · AI for Bridal Beauty (2) · Go-to-Market (1) · AI Agents (1) · AI in Finance (1) · Enterprise AI (1) · AI Sales Operations (1) · Case Study (1)
