Context engineering visualization – VectorMine // Shutterstock
Why context engineering matters more than prompt engineering
Your AI agent knows your product inside-out. You’ve written detailed prompts, uploaded your docs, and tested it dozens of times. Then a customer asks about pricing, and the agent quotes last quarter’s rates. Or it recommends a demo to someone who’s been paying you for three years. Or it cheerfully offers a discount code that expired two weeks ago.
The frustrating part: The correct information was right there in your prompt. Line 347, clearly stated. The AI just ignored it.
Stanford researchers have a name for this: “lost in the middle.” LLMs exhibit a U-shaped attention curve, processing information at the beginning and end of inputs reliably, while performance drops by more than 30% for anything buried in the middle. That carefully crafted rule you added after your last customer complaint? Good chance the model never sees it.
Wharton’s Generative AI Labs found something similar when they tested prompts the way scientists test hypotheses, running each question 100 times instead of once. At strict accuracy levels, most conditions “barely outperform random guessing.” The outputs looked good individually. They just weren’t reliable.
Your prompt is probably fine. But prompt engineering has become one layer in a larger stack, and teams building AI that actually works in production now spend equal or more time on what surrounds the prompt: the context. Here, Zapier explains why context engineering matters even more.
What is context engineering?
The LangChain team paraphrases Andrej Karpathy, who helped build Tesla’s AI and co-founded OpenAI: “LLMs are like a new kind of operating system. The LLM is the CPU, the context window is the RAM.”
I’ve found a simpler way to explain this to my marketing friends who glaze over at “tokens” and “inference.”
Your AI is an employee. The context window is their desk. Whatever’s on the desk right now—the customer file on the screen, the campaign brief they printed out, the brand guide shoved in a drawer—that’s what they can work with. Desks fill up. So does an AI’s memory.
Your prompt is the sticky note you handed them this morning. Important, sure. But the stack of folders on their desk? The CRM they have open? That’s what determines whether they actually help the customer or fumble around asking questions the customer already answered.
Anthropic wrote up their approach to context engineering in 2025. Their definition: “the set of strategies for curating and maintaining the optimal set of tokens during LLM inference.” Translation: Give your AI the right information, in the right format, at the right time. Not more. Just what it needs.
The evolution: From chatbots to context-aware agents
Before diving into the how, let’s understand where we are in the AI adoption curve.
- Phase 1 was copy-paste ChatGPT. Marketers discovered they could paste customer emails into a chat window and get draft replies. Exciting, but every session started from zero.
- Phase 2 was custom GPTs and assistants. You could pre-load instructions and documents. Better. But the context was frozen. No live connection to what was happening in your business.
- Phase 3 is agentic AI. Agents that take actions, not just generate text. They update your CRM, create tickets, send emails, and make decisions. This power requires a new discipline: You can’t give an agent instructions and hope. You have to architect its knowledge.
Most people are stuck in Phase 1 or 2. The ones pulling ahead are building for Phase 3.
The context gap: Why your prompts aren’t working
When an AI misbehaves, the instinct is to add more rules. The prompt grows. 200 lines. 400 lines. 500+. More instructions should mean better behavior.
It doesn’t work that way.
The middle gets ignored
Chroma’s 2025 “Context Rot” study goes even further than the Stanford research. They tested a wide range of models and found that “models do not use their context uniformly; instead, their performance grows increasingly unreliable as input length grows.” The effect compounds in multi-turn agent conversations: when the whole window gets passed in on every turn, the token count explodes, and instructions clearly present in context get ignored anyway.
Static vs. live information
A mega-prompt is frozen. You wrote it last month. Since then, pricing changed, a customer opened a support ticket, Marketing launched a new campaign this morning. The prompt doesn’t know any of that. It can only contain what existed when you wrote it.
Context window limits
Every model has a context window. Some now stretch into the millions of tokens. But research shows most models start getting unreliable well before they hit those limits.
In practice, your mega-prompt takes up space, conversation history piles on, and documents get loaded. It adds up fast. Hit the limit and older information gets pushed out. The AI forgets things it knew ten minutes ago.
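That overflow can be sketched in a few lines. This is a minimal illustration, not a real tokenizer: it assumes roughly four characters per token and drops the oldest turns first.

```python
# Sketch: keep a conversation under a token budget by dropping the
# oldest turns first -- a toy model of how context windows overflow.
# The 4-characters-per-token ratio is a rough assumption.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # crude heuristic, not a real tokenizer

def trim_to_budget(messages: list[str], budget: int) -> list[str]:
    """Drop oldest messages until the estimated total fits the budget."""
    kept = list(messages)
    while kept and sum(estimate_tokens(m) for m in kept) > budget:
        kept.pop(0)  # the oldest message falls out of the window
    return kept

history = ["turn %d: %s" % (i, "x" * 400) for i in range(10)]
print(len(trim_to_budget(history, budget=500)))  # only recent turns survive
```

The point of the sketch: nothing warns the model that the early turns are gone. Whatever the AI "knew" in those messages simply stops existing.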
Manus, an AI agent company, found that their agents consume about 100 input tokens for every 1 output token. On a complex task with ~50 tool calls, that’s roughly 50,000 tokens of context being processed just to generate 500 tokens of response. Most of that context is tool outputs, conversation history, and retrieved documents piling up.
How context fails
Context doesn’t just run out. It fails in specific, predictable ways. Business application strategist Drew Breunig identified four failure modes worth knowing:
- Context poisoning: An early error or hallucination gets into context and compounds. The AI references incorrect information repeatedly because it’s “in the record.” Once context is poisoned, each subsequent decision builds on the mistake.
- Context distraction: Irrelevant information drowns out relevant information. You loaded ten documents, but only one matters for this question. The AI attends to everything, relevant or not.
- Context confusion: The model can’t figure out which pieces of context apply to the current situation. You have pricing rules for enterprise and SMB customers in the same context. The AI mixes them up.
- Context clash: Contradictory information exists in context. Last month’s pricing and this month’s pricing are both there. Old campaign rules and new ones. The AI has to pick, and it might pick wrong.
When your agent misbehaves, these categories help diagnose the problem. Is it poisoning (bad data got in early)? Distraction (too much irrelevant stuff)? Confusion (can’t tell what applies)? Or clash (contradictory info)?
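One of these modes, context clash, lends itself to a simple guard: when two versions of the same fact sit in context, keep only the most recent. A minimal sketch, with illustrative field names:

```python
# Sketch: a minimal guard against "context clash" -- when old and new
# versions of the same fact (e.g. last month's and this month's pricing)
# are both in context, keep only the newest. Field names are invented.

def resolve_clashes(entries: list[dict]) -> list[dict]:
    """Keep the newest entry per topic; older duplicates cause clashes."""
    latest: dict[str, dict] = {}
    for entry in sorted(entries, key=lambda e: e["updated"]):
        latest[entry["topic"]] = entry  # later timestamps overwrite earlier
    return list(latest.values())

context = [
    {"topic": "pricing", "updated": "2025-03-01", "text": "$49/mo"},
    {"topic": "pricing", "updated": "2025-04-01", "text": "$59/mo"},
    {"topic": "returns", "updated": "2025-01-15", "text": "30-day window"},
]
print(resolve_clashes(context))  # only the $59/mo pricing entry remains
```

Poisoning and distraction need different remedies (validation at ingest, tighter selection), but clash is often this mechanical: pick a source of truth and deduplicate by recency before anything reaches the model.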
What AI actually needs: Context engineering vs. prompt engineering
Most people treat AI like a very literal employee and try to fix problems with more instructions. More rules, more examples, more edge cases. But the issue runs deeper: AI needs better information architecture, not just better wording.
Prompt engineering asks “what should I say?” Context engineering asks “what should I know?” Prompts are static, written once, frozen. Context is live, pulled in real-time based on who’s asking and what they need. A prompt is text. Context is a data system.
Think about onboarding a new hire. You don’t hand them a 50-page handbook and say “memorize this before every call.” You give them CRM access, point them to the knowledge base, and share the style guide with real examples. You make sure they know what campaign is running this week and which situations to escalate.
Context engineering does the same thing for AI.
An infographic listing the 4 strategies for context engineering. – Zapier
The 4 strategies
LangChain’s framework breaks context engineering into four strategies. This is useful for thinking about what your system actually needs:
- Write: Give the AI a place to save information outside its main memory. Scratchpads, notes, files. This way, it doesn’t have to keep everything in its head.
- Select: Pull in only what’s relevant. Not all your docs, just the ones that matter for this question. Not every customer field, just the ones that help right now.
- Compress: Summarize when context gets long. A conversation that’s been going for 20 turns doesn’t need all 20 turns in full. Keep the key points, trim the rest.
- Isolate: Split complex tasks across multiple agents with separate contexts. One agent researches, another writes, and a third reviews. Each has a clean, focused context instead of one agent drowning in everything.
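Of the four, Select is the easiest to sketch. A production system would score snippets with embeddings; plain word overlap keeps the idea visible. All data here is made up:

```python
# Sketch of the "Select" strategy: score stored snippets against the
# user's question and pass only the top matches to the model. Real
# systems use embedding similarity; word overlap illustrates the idea.

def select_context(question: str, snippets: list[str], k: int = 2) -> list[str]:
    """Return the k snippets sharing the most words with the question."""
    q_words = set(question.lower().split())

    def overlap(snippet: str) -> int:
        return len(q_words & set(snippet.lower().split()))

    return sorted(snippets, key=overlap, reverse=True)[:k]

docs = [
    "Enterprise pricing starts at $500 per month.",
    "Our API supports webhooks and OAuth.",
    "The office dog is named Biscuit.",
]
print(select_context("what does enterprise pricing cost", docs, k=1))
```

Whatever the scoring method, the shape is the same: the store holds everything, the model sees only the top-k slice for this question.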
Context is finite. It has diminishing returns. Anthropic’s engineering team puts it well: “good context engineering means finding the smallest possible set of high-signal tokens that maximize the likelihood of some desired outcome.”
Recent academic research confirms this: Strategic selection of relevant information consistently outperforms dumping in everything you have. More context isn’t always better context.
There’s a difference between what you store and what the model sees. Your database can hold terabytes. But at any given moment, the AI should only see what matters for this conversation.
Don’t dump information upfront. Let the AI reach for it. Customer details when a customer asks. Product specs when features come up. Not everything, all the time.
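“Reaching for it” can be as simple as a registry of fetch functions the agent calls when a topic comes up, instead of preloading every record. A hypothetical sketch; the fetcher names and return values are invented:

```python
# Sketch: just-in-time context. Register fetchers the agent can call
# when a topic arises, instead of loading everything upfront. All
# names and return values here are illustrative.

FETCHERS = {
    "pricing": lambda: "Pro plan: $59/mo (updated today)",
    "customer": lambda: "Acme Corp, enterprise tier, 2 open tickets",
}

def fetch_on_demand(topic: str) -> str:
    """Pull one piece of context only when the conversation needs it."""
    fetcher = FETCHERS.get(topic)
    return fetcher() if fetcher else "(no context registered for %r)" % topic

print(fetch_on_demand("pricing"))  # fetched now, so it reflects today's data
```

Because each fetcher runs at question time, the answer reflects the live system rather than whatever was true when a mega-prompt was written.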
And the one that took me longest to learn: 500 tokens of the right stuff beats 50,000 tokens of everything you have.
3 types of context that matter
To build agents that actually work, you need three types of context. Think of them as databases your AI always has access to.
An infographic listing the types of context for AI agents. – Zapier
Brand context: who you are
This is your AI’s personality. The rules, voice, and boundaries that make responses sound like you instead of generic ChatGPT.
Most marketers miss something here: You cannot invent a brand persona by writing creative prompts. Research shows LLM-generated personas contain systematic biases: positivity bias, idealized profiles, and skewed viewpoints.
So extract, don’t invent. Take your best-performing emails, your highest-rated support responses, your most-shared social posts. Feed those to the AI as examples. Let brand context come from what you’ve already done well, not what you imagine your brand should sound like.
Brand context includes voice guidelines (“direct and confident, not salesy”), anti-patterns (“never say synergy, ever”), approved terminology (especially product names people get wrong), and topics that are off-limits like competitor names or unannounced features. Also, add ten to twenty real responses that nailed the tone, plus escalation rules: when to hand off to a human, what promises the AI should never make.
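“Extract, don’t invent” can start as a one-liner over your own data: rank real past responses by whatever quality signal you already collect and use the best as few-shot voice examples. The rating field here is an assumption about your data:

```python
# Sketch of "extract, don't invent": pick the highest-rated real replies
# as few-shot voice examples instead of writing a persona from scratch.
# The "rating" field is an assumed quality signal (CSAT, review score).

def pick_voice_examples(replies: list[dict], n: int = 3) -> list[str]:
    """Return the texts of the n best-rated past responses."""
    best = sorted(replies, key=lambda r: r["rating"], reverse=True)[:n]
    return [r["text"] for r in best]

past_replies = [
    {"text": "Happy to help -- here's the fix.", "rating": 4.9},
    {"text": "Per our synergy framework...", "rating": 2.1},
    {"text": "Good catch. Patched in v2.3.", "rating": 4.7},
]
print(pick_voice_examples(past_replies, n=2))
```

The low-rated “synergy” reply never makes it into context, which is the point: the brand voice is defined by what already worked, not by adjectives in a persona prompt.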
Customer context: who they are
This one changes with every conversation.
Without customer context, every interaction starts from zero. The AI asks “What industry are you in?” when the customer already told you twice. With customer context, your AI can say: “Last time we spoke, you were evaluating our API integration. Did you get a chance to review the documentation I sent?”
That sentence requires memory. Memory is what separates an assistant from a chatbot that makes customers repeat themselves. What goes in here?
- Company info and account tier (so it knows whether to pitch enterprise features)
- Industry (so it uses relevant examples)
- Whether they have open tickets (so it doesn’t cheerfully ask “how can I help?” when they’re mid-crisis)
- Purchase history and past conversations (so they never have to repeat themselves)
- Where they are in the funnel (so the AI adjusts how deep to go on product details)
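In practice, this means rendering only the fields that matter for the current question, not the whole customer record. A sketch with invented CRM fields:

```python
# Sketch: render customer context selectively -- only the fields
# relevant to the current question reach the model's window.
# The customer record and topic mapping are invented for illustration.

CUSTOMER = {
    "company": "Acme Corp",
    "tier": "enterprise",
    "industry": "fintech",
    "open_tickets": 2,
    "funnel_stage": "trial",
}

RELEVANT_FIELDS = {
    "pricing": ["company", "tier", "funnel_stage"],
    "support": ["company", "open_tickets"],
}

def customer_context(topic: str) -> str:
    """Format just the fields this topic needs, defaulting to company only."""
    fields = RELEVANT_FIELDS.get(topic, ["company"])
    return "; ".join(f"{k}: {CUSTOMER[k]}" for k in fields)

print(customer_context("support"))
```

A support question pulls in the open tickets; a pricing question pulls in tier and funnel stage. The full record stays in the database either way.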
Strategic context: what you’re trying to achieve
Your AI doesn’t know it’s Q1. It doesn’t know you’re pushing annual plans or that your goal this quarter is demos, not free trials. Unless you tell it.
This layer holds:
- Current campaigns (so the agent knows what pricing benefits to mention)
- Active offers (so it knows which discounts are real)
- Rules for different funnel stages and your conversion goals
The kind of stuff that changes quarter to quarter and shapes what the AI should actually be pushing.
How they work together
A customer asks: “What makes you different from [Competitor]?”
Brand context says never name competitors directly. Customer context shows they’re an enterprise trial user in fintech. Strategic context indicates the current push is compliance features.
Result: a response highlighting compliance capabilities (relevant to fintech), mentioning enterprise-grade security (relevant to their tier), positioning against competitors without naming them. All in your brand voice.
No prompt engineering trick achieves this. It requires architecture.
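As a sketch, the assembly for that competitor question might look like this, with every value invented for illustration:

```python
# Sketch: the three context layers assembled into one prompt for the
# competitor question above. Every value here is illustrative -- in a
# real system each layer would be fetched live, not hardcoded.

def build_prompt(question: str, brand: dict, customer: dict, strategy: dict) -> str:
    """Combine brand rules, customer profile, and current strategy."""
    parts = [
        "RULES: " + "; ".join(brand["rules"]),
        f"CUSTOMER: {customer['tier']} {customer['industry']} account",
        "CURRENT PUSH: " + strategy["focus"],
        "QUESTION: " + question,
    ]
    return "\n".join(parts)

prompt = build_prompt(
    "What makes you different from CompetitorX?",
    brand={"rules": ["never name competitors", "direct, not salesy"]},
    customer={"tier": "enterprise", "industry": "fintech"},
    strategy={"focus": "compliance features"},
)
print(prompt)
```

Each layer is a separate lookup with its own update cadence: brand rules change rarely, strategy quarterly, customer data on every conversation. The prompt itself is just the final rendering step.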
This story was produced by Zapier and reviewed and distributed by Stacker.
