Running five AI agents: a practical workflow that multiplies developer output
Anthropic engineer Boris Cherny revealed a simple, reproducible workflow: run multiple Claude agents in parallel, keep a single-file memory, use the smartest model available, and lean on slash commands and verification loops.
Key takeaways
- Run multiple agents in parallel to turn model latency into concurrent throughput.
- Use the smartest model you can afford to cut down human steering and corrections.
- Maintain a repository file (CLAUDE.md) to capture mistakes and rules for the agent.
- Automate git ops and verification so generated code is tested and production-ready.

When the architect of a high-performance coding agent lays out his terminal setup, it isn’t style porn — it’s an operations manual. Over the past week, Boris Cherny, creator and head of Claude Code at Anthropic, posted a practical breakdown of how he structures work with LLM-based agents. The thread rapidly became a reference point for engineers who want to move from autocomplete to orchestration.
What’s notable is not complexity but discipline: a handful of repeatable rules and a few engineering artifacts give a single developer the throughput of a small team. Below we unpack the specific elements Cherny uses and how they combine into a coherent execution system.
The fleet-commander approach: run agents in parallel
Cherny rejects the traditional linear inner loop. Instead of writing, testing, and repeating in a single stream, he operates multiple Claudes simultaneously and treats them like coordinated workers in a real-time strategy game.
"I run 5 Claudes in parallel in my terminal. I number my tabs 1-5, and use system notifications to know when a Claude needs input."
Concretely, his setup includes:
- Five parallel terminal agents (and several browser sessions of Claude), each assigned a different lifecycle role — testing, refactoring, documentation, and so on.
- System notifications (iTerm2) to alert the human operator when an agent needs direction.
- A simple switching mechanism (tab numbers and a "teleport" command) to hand off sessions between browser and local terminal.
This pattern converts wait time and model thinking time into concurrent productive work. Rather than sitting idle while a single model generates or tests, multiple agents progress different threads of the same project.
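The numbered-tab layout can be sketched in a few lines of shell, assuming tmux and the `claude` CLI are on the PATH. The session name, the role list, and the `LAUNCH_FLEET` guard are all illustrative choices, not details from Cherny's thread:

```shell
#!/usr/bin/env sh
# Sketch: one tmux window per numbered lifecycle role, so "tab 3" maps to
# a role at a glance. Set LAUNCH_FLEET=1 to actually create the session;
# by default the script only prints the plan.
SESSION="claude-fleet"
ROLES="feature tests refactor docs verify"

# Numbered tab label, e.g. window_name 3 docs -> "3-docs"
window_name() { printf '%s-%s' "$1" "$2"; }

if [ "${LAUNCH_FLEET:-0}" = "1" ]; then
  tmux new-session -d -s "$SESSION"
fi

i=1
for role in $ROLES; do
  if [ "${LAUNCH_FLEET:-0}" = "1" ]; then
    # Each window runs its own interactive Claude session under a numbered name.
    tmux new-window -t "$SESSION" -n "$(window_name "$i" "$role")" claude
  else
    echo "window $(window_name "$i" "$role") -> claude"
  fi
  i=$((i + 1))
done
```

Notification hooks (iTerm2 triggers or tmux `monitor-activity`) can then be layered on so the operator is pinged only when a window needs input.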
Pay compute to reduce human correction
Counterintuitively, Cherny prefers the heaviest model available: Opus 4.5. He uses it with the thinking option for all coding tasks.
"I use Opus 4.5 with thinking for everything. It's the best coding model I’ve ever used… since you have to steer it less and it's better at tool use, it is almost always faster than using a smaller model in the end."
The operational lesson is clear for engineering leaders: latency is less costly than human correction. The dominant bottleneck in model-assisted development is the time humans spend fixing mistakes. Spending extra compute to get a smarter, slower model reduces steering overhead and, in practice, accelerates delivery.
Institutional memory: one file, continuous learning
LLMs do not carry persistent, company-specific context across sessions by default. Cherny’s team solves this with an explicit artifact checked into the repository: CLAUDE.md. When an agent errs, the corrective rule is added to that file.
The mechanics are simple and powerful:
- Every observed mistake is written down as a remediation rule in CLAUDE.md.
- Reviewers who fix pull requests add the rule, turning a human correction into an agent instruction.
- Over time the repository becomes a living spec of what the agent should and should not do.
By converting mistakes into machine-readable constraints, the team reduces repeated errors and accelerates onboarding of future sessions.
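A minimal CLAUDE.md along these lines might look like the following; the structure matches the pattern described above, but the specific rules are illustrative, not taken from Cherny's thread:

```markdown
# CLAUDE.md

## Project conventions
- Run the linter before proposing a commit.
- Never edit generated files under build/; change the source templates instead.

## Corrections from past sessions (do not repeat)
- Do not regenerate the dependency lockfile unless the task is a dependency change.
- UI copy lives in the strings file, not inline in components; update it there.
```

Because the file lives in the repo, every new session starts with the accumulated rules, and every PR review is a chance to append another one.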
Automate the bureaucracy: slash commands and subagents
To remove repetitive, low-value toil, Cherny checks slash commands and small subagents into the codebase. These primitives encapsulate common operations so a single keystroke triggers multi-step flows.
- Slash commands — project-level shortcuts like /commit-push-pr that handle git operations, commit messaging, and opening PRs without manual typing.
- Subagents — specialized personas that run after the main change: a code-simplifier to tidy architecture and a verify-app agent to run end-to-end checks before merge.
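Concretely, a Claude Code slash command is a prompt file checked into the repository under .claude/commands/; a hypothetical commit-push-pr command might look like this (the file contents and wording are an assumption for illustration, not from the thread):

```markdown
<!-- .claude/commands/commit-push-pr.md — hypothetical example -->
Stage the current changes, write a concise commit message describing them,
push the branch, and open a pull request with the commit message as the
PR title. Do not amend existing commits.
```

Invoking /commit-push-pr in a session then expands to this prompt, turning a multi-step git ritual into one keystroke.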
These patterns keep the human focused on design and review instead of orchestration choreography.
Verification loops: the real quality multiplier
Arguably the central capability in Cherny’s approach is giving agents the ability to verify their work. The agent is not a one-shot writer; it is also a tester.
"Claude tests every single change I land to claude.ai/code using the Claude Chrome extension. It opens a browser, tests the UI, and iterates until the code works and the UX feels good."
Cherny reports that having the model execute tests, run bash commands, and automate browser interactions improves final quality by a measurable factor — quoted as "2-3x" in the thread. The implication for teams is that verification closes the loop between generation and correctness, and makes AI-generated code production-ready rather than draft-quality.
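The generate-test-iterate loop can be reduced to a small shell skeleton. This is a sketch, not Cherny's tooling: the check command is a placeholder for whatever suite (unit tests, browser automation) gates your merges:

```shell
#!/usr/bin/env sh
# Sketch of a bounded verification loop: rerun the check until it passes
# or attempts run out. In the agent workflow, the gap between attempts is
# where the model reads the failure output and edits the code.
retry_until_pass() {
  check=$1   # command to run, e.g. "npm test && npx playwright test"
  max=$2     # maximum attempts before giving up
  n=1
  while ! sh -c "$check"; do
    if [ "$n" -ge "$max" ]; then
      echo "verification failed after $max attempts" >&2
      return 1
    fi
    n=$((n + 1))
  done
  echo "verification passed on attempt $n"
}
```

For example, `retry_until_pass "make test" 3` reruns `make test` up to three times before reporting failure; wiring this into a pre-merge hook is what turns generation into a closed loop.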
What this workflow signals about product and GTM
The community response — from observers like Jeff Tang and Kyle McNease to Anthropic leadership — frames this as more than a neat trick. It’s an operating model: coordinate multiple capable agents, give them institutional rules, automate the grunt work, and require them to prove their changes.
Cherny’s thread also reinforces a strategic point Anthropic has emphasized elsewhere: superior orchestration of models can offset massive infrastructure plays. Claude Code’s reported $1 billion in ARR underscores that this isn’t purely academic; the approach has commercial traction.
What This Means For You
If you run engineering teams or ship product, apply these pragmatic steps immediately. They don’t require exotic tooling — they require discipline and a small set of repo-level artifacts.
- Run multiple agents in parallel for different lifecycle tasks. Start with 3 if 5 feels excessive; allocate one for testing and one for verification.
- Choose the smartest model you can afford for complex tasks. Expect higher compute costs but lower review time.
- Create a single repository file (e.g., CLAUDE.md) to log agent mistakes and corrective rules; add it to your PR review checklist.
- Encode repeatable operations as slash commands and small subagents checked into the repo — commit-push-pr should be an automation, not a ritual.
- Instrument verification: require agents to run test suites and UI automations before a change is merged.
These steps change where your team spends time: from manual repetition and correction to rule design, review, and higher-level problem solving.
Key Takeaways
- Parallel agents convert model latency into concurrent throughput — think fleet command, not single-threaded editing.
- Paying for a larger, slower model can reduce human correction time and speed overall delivery.
- One repository file for agent rules (CLAUDE.md) turns repeated mistakes into permanent institutional knowledge.
- Slash commands, subagents, and verification loops automate bureaucracy and ensure generated code is tested before merge.