
Stu Sjouwerman is Founder and Executive Chairman of KnowBe4 Inc., a security awareness training and simulated phishing platform.
Large Language Models (LLMs) seem to be everywhere now. Chatbots, coding assistants and research tools all promise transformative efficiency. Yet too many businesses discover an inconvenient truth: asking an LLM a simple question often yields unreliable results. That is because interacting with these powerful tools is fundamentally engineering, not casual conversation. Misunderstand this, and you risk AI hallucinations, biases or, worse, leaked sensitive data.
Prompt engineering is a critical layer of human risk management in AI systems. By crafting clear, context-rich instructions and implementing security-focused protocols (like input sanitization and strict output formatting), humans can directly mitigate key operational threats.
The Token Prediction Engine
Think of an LLM not as an oracle, but as a sophisticated token-prediction engine. Tokens are the fundamental units of text an LLM processes: the model devours text, breaks it into these chunks and calculates the most probable next token based on its vast training data. For example, the word “unhappily” might split into three tokens: “un,” “happi” and “ly.” This matters because LLMs see text as sequences of tokens, not whole words.
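To see this concretely, here is a minimal sketch using the open-source tiktoken tokenizer. The exact token boundaries are an assumption; they vary by model and tokenizer.

```python
# A minimal tokenization sketch, assuming the open-source tiktoken library
# is installed (pip install tiktoken). Exact token splits vary by tokenizer.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")  # tokenizer used by several recent models

text = "unhappily"
token_ids = encoding.encode(text)                     # integer IDs the model consumes
tokens = [encoding.decode([tid]) for tid in token_ids]  # the text chunks behind those IDs

print(token_ids)  # a short list of integers
print(tokens)     # the pieces the model actually "sees", not whole words
```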
A prompt is the actual input steering that probability calculation. Garbage in, garbage out—or, in the enterprise context, insecure prompts in, disastrous outputs out.
The Four Pillars Of Effective Prompts
Drawing from best practices like Google’s Gemini framework, we see four core pillars emerge:
1. Define the role.
Who is the LLM supposed to be? Forget vague instructions. Explicitly state: “You are an experienced security analyst reviewing incident reports.” This primes the model’s knowledge base and tone, reducing generic responses. A persona isn’t a suggestion; it’s an instruction.
2. Clarify the task.
What exactly does it need to do? Ambiguity is the enemy. Compare “Summarize this” to “Summarize the key security risks identified in the attached incident report, focusing on potential root causes and recommended immediate actions, in under 150 words.” Specificity slashes hallucination risk. Where possible, include quantifiable constraints such as units, ranges and objectives.
3. Provide context.
Be sure to ground the model in reality. Feed it the necessary background: relevant data snippets, key acronyms your company uses and critical constraints (“Do not access external websites”). Without context, the model may end up doing too much guesswork. Retrieval-augmented generation (RAG), which appends relevant external data to the prompt, is ideal since it anchors outputs in facts.
4. Dictate the format.
How should the answer look? Need a bullet list for an executive summary? How many subheads? A table comparing options? What kind of table? Help the model help you. Provide an example output structure to reduce parsing errors downstream and prevent messy, unstructured replies. (A short sketch combining all four pillars follows this list.)
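Here is a minimal sketch pulling the four pillars together into a single chat prompt. The incident-report text is a placeholder, and the system/user message structure reflects the common chat format most LLM APIs accept; adapt it to whichever client you use.

```python
# A minimal sketch assembling the four pillars (role, task, context, format)
# into one prompt. The report excerpt is a placeholder; the messages list
# follows the common system/user chat format accepted by most LLM APIs.
incident_report = "...(relevant excerpt of the incident report)..."

system_msg = (
    "You are an experienced security analyst reviewing incident reports. "  # 1. Role
    "Do not access external websites or invent details not in the report."  # 3. Constraints
)

user_msg = (
    "Summarize the key security risks in the incident report below, "        # 2. Task
    "focusing on potential root causes and recommended immediate actions, "
    "in under 150 words.\n\n"
    "Format the answer as a bulleted list with two sections: "               # 4. Format
    "'Root Causes' and 'Immediate Actions'.\n\n"
    f"Incident report:\n{incident_report}"                                   # 3. Context
)

messages = [
    {"role": "system", "content": system_msg},
    {"role": "user", "content": user_msg},
]
# Pass `messages` to whichever chat-completion client your stack uses.
```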
Security: Built Through Design
Careful prompt design is important for avoiding potential security risks. If your prompt is too long and exceeds the LLM’s token limit, the model may silently cut off the end, potentially dropping the security rules you included.
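One defensive habit is to check a prompt against a token budget before sending it. The sketch below assumes the tiktoken tokenizer and an illustrative 8,000-token limit; consult your model’s documentation for the real figure.

```python
# A minimal pre-send length check, assuming the tiktoken tokenizer and an
# illustrative 8,000-token context window (check your model's actual limit).
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")
TOKEN_BUDGET = 8000  # illustrative assumption, not a universal limit

def within_budget(prompt: str, budget: int = TOKEN_BUDGET) -> bool:
    """Return True if the prompt fits the budget. If it does not, trim the
    background context rather than the security instructions."""
    return len(encoding.encode(prompt)) <= budget
```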
A more serious threat is prompt injection, where attackers craft malicious inputs designed to trick the LLM into bypassing its core safety protocols, manipulating its responses or leaking sensitive data. To prevent these attacks, validate all inputs provided to the model and avoid granting the LLM overly broad permissions within the prompt itself.
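As an illustration of input validation (not a complete defense), the sketch below screens untrusted text for obvious override phrases and fences it in delimiters so the model is more likely to treat it as data rather than instructions. The blocklist and delimiter names are illustrative assumptions.

```python
# A minimal sketch of input handling before untrusted text reaches the model.
# This is an illustration, not a complete defense against prompt injection;
# the blocklist and the delimiter tags are illustrative assumptions.
SUSPICIOUS_PHRASES = [
    "ignore previous instructions",
    "disregard the system prompt",
    "reveal your instructions",
]

def sanitize_user_input(text: str) -> str:
    lowered = text.lower()
    for phrase in SUSPICIOUS_PHRASES:
        if phrase in lowered:
            raise ValueError("Potential prompt-injection attempt detected")
    # Fence the untrusted text so the model treats it as data, not instructions.
    return f"<untrusted_input>\n{text}\n</untrusted_input>"
```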
Crafting Instructions: The Art Of Clarity
Different prompting techniques serve distinct purposes. Straightforward queries—simply asking direct questions without examples—often succeed. More complex reasoning tasks call for more guidance. For intricate problems involving logical deduction or multi-step analysis, chain-of-thought (CoT) techniques can improve outputs. Explicitly instructing the model to “think step by step before providing your final answer” not only boosts accuracy but also exposes flawed reasoning paths. To further stabilize outputs, running multiple CoT reasoning chains and selecting the most frequent answer can reduce erratic responses.
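The self-consistency idea is straightforward to sketch: sample several chain-of-thought completions and keep the most frequent final answer. In the example below, ask_llm is a hypothetical placeholder for whatever completion call your stack already provides.

```python
# A minimal sketch of self-consistency: run several chain-of-thought
# completions and majority-vote on the final answer. `ask_llm` is a
# hypothetical placeholder for your own LLM client call.
from collections import Counter

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("Replace with your LLM client call")

def self_consistent_answer(question: str, samples: int = 5) -> str:
    cot_prompt = (
        f"{question}\n\n"
        "Think step by step before providing your final answer. "
        "End with a line of the form: FINAL ANSWER: <answer>"
    )
    finals = []
    for _ in range(samples):
        reply = ask_llm(cot_prompt)
        # Keep only the text after the FINAL ANSWER marker, if present.
        if "FINAL ANSWER:" in reply:
            finals.append(reply.rsplit("FINAL ANSWER:", 1)[1].strip())
    return Counter(finals).most_common(1)[0][0] if finals else ""
```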
Avoiding Predictable Pitfalls
The pitfalls in prompt engineering can carry serious consequences if ignored. Vague instructions breed inconsistent interpretations, while insufficient contextual grounding invites the model to fabricate plausible but false details. One critical error lies in anthropomorphizing the model, assuming it intuitively “understands” context or intent like a human collaborator. This necessitates extensive testing: iteratively refining phrasings, rotating examples to avoid bias drift and continuously monitoring outputs for subtle inconsistencies or security lapses.
Effective prompt engineering is never truly finished; it is a continuous practice of testing, study and refinement.
Unleashing true LLM value while managing risk requires moving beyond simple queries. It demands deliberate prompt engineering. Proficiency in defining roles, specifying tasks, providing context, dictating formats and placing human oversight at the core of risk mitigation can transform LLMs from unpredictable novelties into reliable, secure engines of productivity.