Stop Prompting, Start Designing Autonomous Agent Workflows

Anthropic’s Boris Cherny has stopped writing prompts. The creator and head of Claude Code — Anthropic’s terminal-based agentic coding tool — told interviewers in June 2026 that his job had changed: “I don’t prompt Claude anymore. I have loops running that prompt Claude and figuring out what to do. My job is to write loops.” That statement, echoed almost simultaneously by OpenAI engineer Peter Steinberger and then named and structured by Google engineer Addy Osmani in a widely shared essay, crystallized a paradigm shift that developers had been building toward for the better part of two years. The shift has a name — loop engineering — and it comes with a warning its most enthusiastic proponents have been careful to include: token costs in autonomous agent loops compound faster than almost any developer expects, and an unattended loop without a verifier is a machine that ships bugs with high confidence.

Understanding why this shift happened requires understanding something architectural about every large language model that does not show up in product announcements: LLMs are stateless. They forget everything between sessions. The agent you prompt today has no memory of the task you prompted yesterday. Every persistent piece of context — every project rule, every prior decision, every mid-task intermediate result — must live outside the model, in a file on disk, in a git repository, or in a structured memory document. That constraint is not unique to Claude Code. It is a property of every transformer-based model from every lab. Loop engineering is the systems design response to that constraint: instead of holding the context in your head and manually re-prompting the agent each turn, you build a small system that holds the context externally, decides what to prompt, dispatches the agent, and checks whether the work is done. The leverage shifts from the quality of a single prompt to the design of the system that generates and checks prompts.

Why Prompting One Turn at a Time No Longer Scales

For the first two years of AI coding agents, the standard interaction model was simple: write a prompt, add context, read the output, write the next prompt. The developer held the tool the entire time, one turn after another. That model has an upper bound. For a one-shot task — write a function, fix a specific bug — it works well enough. For anything that requires more than a few steps, adapts to feedback, or benefits from running while the engineer is doing something else, the manual turn-by-turn model collapses under its own overhead.

The New Stack’s June 2026 coverage of the loop engineering discussion described the progression in developer tooling over the preceding 18 months: prompt engineering gave way to context engineering (ensuring the right information reached the model), which gave way to harness engineering (designing the environment a single agent runs inside), which gave way to loop engineering — the harness, running on a timer, spawning helpers, and feeding itself. The critical technical distinction that separates a harness from a loop: a cron job runs a fixed script. An agent loop runs a model that reads current state and decides its own next action.
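The cron-versus-loop distinction can be made concrete with a toy comparison. This is only a sketch: the state keys, action names, and `toy_policy` function are hypothetical stand-ins for what a real model would read and decide.

```python
from typing import Callable, Dict

State = Dict[str, int]


def fixed_script(state: State) -> str:
    """Cron-style job: ignores state and always performs the same action."""
    return "run_nightly_build"


def toy_policy(state: State) -> str:
    """Stand-in for the model: inspects current state and picks the next action."""
    if state.get("failing_tests", 0) > 0:
        return "fix_failing_tests"
    if state.get("open_pull_requests", 0) > 0:
        return "triage_pull_requests"
    return "idle"


def loop_step(state: State, decide: Callable[[State], str] = toy_policy) -> str:
    """Loop-style step: the action depends on what the decider sees in the state."""
    return decide(state)
```

Given the same state, `fixed_script` returns the same action every time, while `loop_step` returns whatever the decider picks for that state. In a real loop the decider is the model, not a hand-written function.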

How the Agentic Loop Works: Five Stages and a Sixth That Changes Everything

The architecture underlying every agentic loop traces back to the ReAct framework, introduced by researchers at Princeton and Google in 2022. ReAct interleaved reasoning and action in a repeating cycle: the model thinks about what it needs, takes an action, observes the result, and thinks again. That cycle — demonstrated to outperform single-pass models on standard task benchmarks — became the foundation for every modern AI coding agent. The paper established the pattern that Claude Code, OpenAI Codex, and every serious agentic tool now implements.

In Claude Code, the cycle runs as follows. The agent receives a prompt with the conversation history and available tool definitions. It evaluates the current state and decides what to do next. If that decision requires a tool call — reading a file, running a test, editing code — it issues the call. The result comes back into the context, and the cycle begins again. This continues until the model determines the task is complete.
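The cycle described above can be sketched in a few lines. This is an illustration of the general think, act, observe pattern, not Claude Code's actual implementation: `Decision`, `scripted_model`, and the two tools are hypothetical, and the "model" is a scripted function so the example runs without any API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Decision:
    tool: Optional[str] = None  # tool to call, or None when the task is finished
    argument: str = ""
    answer: str = ""


def run_cycle(
    prompt: str,
    decide: Callable[[List[str]], Decision],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 10,
) -> Tuple[str, List[str]]:
    """Repeat decide -> act -> observe until the decider reports completion."""
    history = [f"user: {prompt}"]
    for _ in range(max_steps):
        decision = decide(history)
        if decision.tool is None:
            return decision.answer, history
        result = tools[decision.tool](decision.argument)
        # The observation is appended, so the next decision sees it.
        history.append(f"tool {decision.tool}({decision.argument}) -> {result}")
    return "stopped: step limit reached", history


def scripted_model(history: List[str]) -> Decision:
    """Hypothetical stand-in for the model: read a file, run tests, then finish."""
    if len(history) == 1:
        return Decision(tool="read_file", argument="auth.py")
    if len(history) == 2:
        return Decision(tool="run_tests", argument="test/auth")
    return Decision(answer="done")


TOOLS = {
    "read_file": lambda path: f"<contents of {path}>",
    "run_tests": lambda path: "2 passed",
}
```

Calling `run_cycle("fix auth", scripted_model, TOOLS)` walks through two tool calls and then stops. Note that `history` grows on every step, which is the same growth that drives the token costs discussed later.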

The six building blocks Addy Osmani identified in his June 7, 2026 “Loop Engineering” essay — which map almost exactly onto Claude Code’s current command set — are:

Automations: scheduled triggers that start the loop on their own — on a timer, a git event, or a continuous integration signal — so the developer is not the one pushing the button. In Claude Code, /schedule and /loop handle this. /loop runs on a specified interval; without an interval, it self-paces based on output.

Worktrees: isolation so parallel agents do not overwrite each other’s work. Two agents editing the same file produces the same collision as two engineers committing to the same lines without talking. Claude Code’s --worktree flag and isolation: worktree subagent setting each spawn a fresh git checkout that cleans itself up when the agent finishes.

Skills: saved instruction sets that freeze project knowledge so the agent does not re-learn the same context every session. Written as a folder with a SKILL.md file, skills are reusable across sessions and across team members — institutional memory that does not depend on a single developer’s prompt habits.

Connectors: Model Context Protocol-based plugins that give the loop access to real tools: GitHub, Slack, Linear, external application programming interfaces. Without connectors, the loop sees only what is in the local filesystem.

Sub-agents: the maker-checker separation. One sub-agent writes the code. A separate sub-agent runs the tests, reads the lint output, and reports what failed. The agent that wrote the code is not the one grading its own work.

Memory: external state that persists across sessions. The most common implementation is CLAUDE.md, a project-level markdown file that Claude Code reads automatically at the start of every session. When an agent makes a repeated mistake, the correct response — as Cherny described — is to have the agent write the lesson into CLAUDE.md so the correction propagates to every future session rather than staying private to one chat.
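The memory building block is the easiest of the six to sketch, because it is ordinary file I/O. The example below assumes lessons are stored as one markdown bullet per line. That layout and the `record_lesson` helper are illustrative choices, not a format Claude Code requires.

```python
from pathlib import Path


def record_lesson(memory_file: Path, lesson: str) -> bool:
    """Append a lesson as a bullet unless it is already recorded.

    Returns True when the file was changed, False when the lesson was a duplicate.
    """
    existing = memory_file.read_text(encoding="utf-8") if memory_file.exists() else ""
    bullet = f"- {lesson.strip()}"
    if bullet in existing.splitlines():
        return False
    with memory_file.open("a", encoding="utf-8") as handle:
        # Keep each lesson on its own line even if the file lacked a final newline.
        if existing and not existing.endswith("\n"):
            handle.write("\n")
        handle.write(bullet + "\n")
    return True
```

Because the lesson lives on disk rather than in a chat transcript, the next session, or a teammate's session, starts with it already in place.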

The /goal Command: What a Separate Verifier Model Does

The specific mechanism that separates a goal-conditioned loop from a simple repeating prompt is the /goal command, which Claude Code added in version 2.1.139 during the week of May 11, 2026. When a developer sets a goal — “all tests in test/auth pass and lint is clean” — Claude Code does not ask the same model that wrote the code to decide whether that condition is met. It uses a separate, faster model specifically tasked with checking the completion condition after each turn. The agent that built the work and the agent evaluating whether the work is done are different model instances.

Addy Osmani’s essay was explicit about why this matters: point a loop at something open-ended and it either produces something valuable or it quietly becomes a very expensive machine for generating bad code at high speed. The generator — the model writing code — has become extremely capable. The verifier — the part that decides whether the output meets a real standard — is where almost every poorly designed loop fails. A verifier that accepts vague success criteria does not fail loudly; it confidently ships work that the next developer has to untangle.
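The maker and verifier split can be sketched as two separate callables joined by a loop. This is a toy model of the idea, not how /goal is implemented: `toy_maker` and `toy_checker` are hypothetical, and a real checker would run the project's test suite in a sandbox instead of calling `exec` on generated code.

```python
from typing import Callable, Dict, List, Tuple

Maker = Callable[[List[str]], str]
Checker = Callable[[str], Tuple[bool, str]]


def run_goal_loop(make: Maker, check: Checker, max_turns: int = 5) -> Tuple[bool, int, str]:
    """Alternate maker and checker until the checker accepts or turns run out."""
    feedback: List[str] = []
    work = ""
    for turn in range(1, max_turns + 1):
        work = make(feedback)
        done, reason = check(work)
        if done:
            return True, turn, work
        feedback.append(reason)  # the maker sees why the last attempt failed
    return False, max_turns, work


def toy_maker(feedback: List[str]) -> str:
    """Hypothetical generator: ships a bug first, fixes it once it gets feedback."""
    if not feedback:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


def toy_checker(work: str) -> Tuple[bool, str]:
    """Hypothetical verifier: a mechanical check the maker cannot talk its way past."""
    namespace: Dict[str, object] = {}
    exec(work, namespace)  # toy only; never exec untrusted code outside a sandbox
    ok = namespace["add"](2, 3) == 5
    return ok, "" if ok else "add(2, 3) did not return 5"
```

With these stand-ins the loop fails on turn one and succeeds on turn two. The important property is that `toy_checker` encodes a concrete pass/fail test and never asks the maker whether it thinks it is finished.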

What Claude Code Dynamic Workflows Add to the Architecture

On May 28, 2026, Anthropic launched Dynamic Workflows in research preview alongside Claude Opus 4.8. The architectural change in Dynamic Workflows is not more agents; it is moving the orchestration plan out of the model’s context window entirely. In previous multi-agent patterns, subagents were dispatched by Claude turn by turn, with every intermediate result accumulating in the shared context window. That accumulation was the binding constraint on autonomous long-running tasks: the context window has a fixed size, and the intermediate results of a migration across hundreds of thousands of lines of code could not fit.

In Dynamic Workflows, Claude writes a JavaScript orchestration script for the task at hand. A background runtime executes that script. The orchestration logic — loops, branching, agent-count decisions, verification passes — lives in script variables rather than in the model’s working memory. Each subagent gets a clean, focused context window. According to the official Claude Code documentation, a workflow run can include up to 1,000 total agents with 16 running concurrently. Salesforce has reported completing a migration that previously would have taken 231 days in 13 days using the feature.
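The shape of that orchestration, with many agents in total and a fixed number in flight, is a standard bounded-concurrency pattern. The sketch below shows it in Python with `asyncio`, although the product generates JavaScript. `run_workflow` and its worker argument are hypothetical, and the two limits simply mirror the documented figures quoted above.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_workflow(
    tasks: List[str],
    worker: Callable[[str], Awaitable[str]],
    max_concurrent: int = 16,
    max_agents: int = 1000,
) -> List[str]:
    """Run one worker per task, never more than max_concurrent at a time."""
    if len(tasks) > max_agents:
        raise ValueError(f"{len(tasks)} tasks exceeds the {max_agents}-agent limit")
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run_one(task: str) -> str:
        async with semaphore:  # waits here while max_concurrent workers are active
            return await worker(task)

    # gather returns results in the same order as the input tasks.
    return await asyncio.gather(*(run_one(task) for task in tasks))
```

The orchestration state (the task list, the semaphore, the collected results) lives in ordinary program variables, and each worker only ever sees its own task string. That is the property the paragraph above describes: the plan sits outside any single agent's context.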

The Token Cost of Autonomous Loops: What the Discourse Underweights

Here is what the viral loop engineering discussion largely left out: agent loops do not cost the same as prompts. Every tool call in an agentic loop adds context that is re-sent to the model on every subsequent call. By iteration 20 in a loop with file reads, the cumulative input can exceed 50,000 tokens per call. At Claude Opus 4.8’s current pricing of $5 per million input tokens, a single late-loop step costs roughly $0.25. A loop running 200 iterations on an open-ended task can cost $80 or more — compared to a well-scoped single-prompt version of the same task costing under a dollar. An analysis of 30 production engineering teams found that one developer hit $4,200 in API fees over a single weekend during an autonomous refactoring run.
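The arithmetic behind those figures is easy to reproduce. The sketch below uses a deliberately crude model: context grows by a fixed 2,500 tokens per step (an assumed rate, chosen so that step 20 reaches the 50,000 tokens quoted above), only input tokens are billed, and prompt caching is ignored. The function names are illustrative.

```python
from typing import Optional


def call_cost(input_tokens: int, price_per_million: float = 5.0) -> float:
    """Dollar cost of sending input_tokens once at the given input price."""
    return input_tokens * price_per_million / 1_000_000


def loop_cost(
    iterations: int,
    tokens_added_per_step: int,
    context_cap: Optional[int] = None,
    price_per_million: float = 5.0,
) -> float:
    """Total input cost when every step re-sends all context accumulated so far."""
    total_tokens = 0
    for step in range(1, iterations + 1):
        context = step * tokens_added_per_step
        if context_cap is not None:
            context = min(context, context_cap)  # e.g. compaction holds context flat
        total_tokens += context
    return call_cost(total_tokens, price_per_million)


single_step = call_cost(50_000)                           # 0.25
uncapped = loop_cost(200, 2_500)                          # 251.25
capped = loop_cost(200, 2_500, context_cap=50_000)        # 47.625
```

Under these assumptions a 200-iteration run costs about $48 if context is held at 50,000 tokens and about $251 if it grows unchecked, which brackets the $80 figure. The per-step price is linear, but the total grows roughly with the square of the iteration count because each step re-sends everything before it.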

The enterprise reports point the same way. Uber reportedly burned through its entire 2026 AI budget for Claude Code in four months, with per-engineer API costs ranging between $500 and $2,000 monthly and usage rates reaching 95% by April. Microsoft’s Experiences and Devices division, which is responsible for Windows, Microsoft 365, and Surface, ended most Claude Code licenses in June 2026 after token-based billing consumed the annual AI budget ahead of schedule. Anthropic’s own published enterprise figures place average costs at $150 to $250 per developer per month at scale, before any optimization.

Addy Osmani was measured in his original essay: “It’s still early. I’m skeptical, and you absolutely have to be careful about token costs.” The three failure modes he and practitioners flag consistently are: a weak verifier that ships low-quality work with confidence; comprehension debt, where code ships faster than the team can understand it; and cognitive surrender — accepting whatever the loop returns without judgment. A well-designed loop multiplies a good engineer. It multiplies a bad decision at the same speed, with less of the engineer watching.

Starting June 15, 2026, Anthropic formalized the economics: automated workloads through the Agent SDK, claude -p scripts, and Claude Code in GitHub Actions now bill against a separate monthly credit pool — $20 for Pro subscribers, up to $200 for Max — rather than drawing from the same subscription pool as interactive use. When that credit runs out, automated requests stop.

Building a Production-Safe Loop: Four Decisions Before the First Iteration

The difference between a loop that compounds value and one that compounds a billing problem comes down to four decisions made before the first iteration runs.

Define a verifiable success condition. Not “fix the bugs” — “all tests in /tests/unit/ pass with exit code 0 and no new files created outside /src/.” If the condition cannot be expressed in a way a separate evaluator model can check mechanically, the task is not ready for autonomous execution. The /goal command requires this: the completion check runs after every turn on a fast, independent model.

Set a budget before starting. The --max-turns flag caps iterations. Without a cap, a loop running on a vague goal will continue until it hits API hard limits or the monthly credit pool runs out. The Claude Code documentation recommends setting a budget as a production default for any open-ended task.

Separate the maker from the checker. Assign one sub-agent to generate the code and a separate sub-agent to evaluate it against tests, lint output, and specifications. A model that wrote the code and is also asked whether the code is correct consistently over-reports success. The evaluator model in /goal does this automatically; in custom workflows, it requires explicit design.

Use worktrees for parallel work. When multiple sub-agents touch the same repository, file collisions are inevitable without isolation. Claude Code’s isolation: worktree setting spawns a fresh git checkout for each sub-agent that cleans itself up after the run. Omitting this in parallel workflows does not just cause merge conflicts — it introduces unpredictable state where agents overwrite each other’s changes mid-execution.
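The first two decisions, a mechanically checkable goal and a hard budget, can be sketched together. Everything here is a hypothetical illustration: `check_goal`, `LoopBudget`, and `run_with_budget` are not Claude Code APIs, the paths are assumed to be normalized and repository-relative, and the $5 input price is the figure quoted earlier.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Callable, Iterable, List, Tuple


def check_goal(
    test_exit_code: int, new_files: Iterable[str], allowed_root: str = "src"
) -> Tuple[bool, List[str]]:
    """Pass only if tests exited 0 and every new file sits under allowed_root."""
    problems: List[str] = []
    if test_exit_code != 0:
        problems.append(f"tests exited with code {test_exit_code}")
    root = PurePosixPath(allowed_root)
    for name in new_files:
        if root not in PurePosixPath(name).parents:
            problems.append(f"new file outside {allowed_root}/: {name}")
    return (not problems, problems)


@dataclass
class LoopBudget:
    max_turns: int
    max_dollars: float
    price_per_million: float = 5.0
    turns: int = 0
    dollars: float = 0.0

    def exhausted(self) -> bool:
        return self.turns >= self.max_turns or self.dollars >= self.max_dollars

    def record(self, input_tokens: int) -> None:
        self.turns += 1
        self.dollars += input_tokens * self.price_per_million / 1_000_000


# One loop turn reports: (test exit code, files it created, input tokens used).
StepResult = Tuple[int, List[str], int]


def run_with_budget(step: Callable[[int], StepResult], budget: LoopBudget) -> str:
    """Run turns until the goal check passes or either budget limit is reached."""
    while not budget.exhausted():  # checked before each turn, so the last turn may overshoot
        exit_code, new_files, input_tokens = step(budget.turns + 1)
        budget.record(input_tokens)
        met, _problems = check_goal(exit_code, new_files)
        if met:
            return "goal met"
    return "stopped: budget exhausted"
```

A step that never passes its tests and sends 50,000 tokens per turn is stopped after four turns by a $1.00 cap, well before a ten-turn limit would trigger. The design point is that both stopping rules are plain comparisons that no model output can argue with.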

Where Loop Engineering Stands in June 2026

The June 7, 2026 discussion that crystallized the term (Peter Steinberger’s viral post, Boris Cherny’s widely circulated quote, Addy Osmani’s structuring essay) produced a vocabulary and a set of primitives that now exist as shipping features in both Claude Code and OpenAI Codex. Claude Code ships first-party /loop, /goal, /schedule, and /workflows commands, all covered in Anthropic’s documentation. The pieces that a year ago required writing and maintaining bash scripts indefinitely now ship inside the products.

What has not changed is that loop engineering is a systems engineering discipline, not an AI skill. The model’s statelessness does not go away with better prompting. Persistent state must live outside the model — in files, in git, in the CLAUDE.md that teaches today’s loop what yesterday’s loop learned. The engineer who moves fastest with loops is not the one who writes the fewest prompts. It is the one who designs the most precise verifier and sets the most specific completion conditions. The leverage moved. The craft did not get easier.

Frequently Asked Questions

What is the difference between /loop and /goal in Claude Code?

/loop runs a prompt on a recurring schedule — every five minutes, every hour, or self-paced based on output — and is useful for recurring maintenance tasks like pull request triage or continuous integration monitoring. /goal is different: it keeps Claude working across turns until a specific completion condition you write becomes true. After each turn, a separate fast evaluator model checks whether the condition holds — the agent that wrote the code is not the one deciding whether it is done. For tasks where you need autonomous execution toward a specific outcome rather than recurring scheduled work, /goal is the correct primitive.

Why do agent loops cost so much more than single prompts?

Every tool call in an agentic loop adds context that gets re-sent to the model on every subsequent call. By the 20th iteration of a loop reading files, the cumulative input per call can exceed 50,000 tokens. At Opus 4.8’s $5 per million input tokens, a single late-loop step costs roughly $0.25 — and a 200-iteration autonomous session on an open-ended task can cost $80 or more. The June 15, 2026 Agent SDK billing change separated automated workloads onto their own credit pool specifically because flat-rate subscriptions were never designed to sustain agent-level compute consumption.

What is the verifier model in /goal, and why does it matter?

When a developer sets a goal in Claude Code, the system uses a separate, faster model to check whether the completion condition is satisfied after each turn. The model that generated the code and the model evaluating whether the work is done are different instances. This separation is architecturally important because a model evaluating its own output consistently over-reports success. The verifier encodes the developer’s standard for “done” — something the code generator cannot reliably supply for itself. Without an independent verifier, a loop can run for many turns producing work that satisfies none of the actual requirements.

Does loop engineering work with tools other than Claude Code?

Yes. The pattern maps almost identically onto OpenAI Codex’s Automations, /goal command (added in Codex command-line interface version 0.128.0), worktrees, and skills features. Addy Osmani noted that once you recognize the shape is identical across tools, you stop debating which agent to use and start designing a loop that works regardless of which one you happen to be running. The primitive names differ by tool; the architectural requirements do not.
