Prompt engineering is key to creating effective LLM-based applications, and it does not require a PhD in machine learning or generative AI, say GitHub engineers Albert Ziegler and John Berryman, who also shared the lessons they learned while developing GitHub Copilot.
The rise of LLMs has created a completely new field for practitioners interested in leveraging generative AI in their applications. Known as prompt engineering, it focuses on how to instruct an LLM to produce output that is not part of its pre-training. To this end, prompt engineering defines ways to craft prompts that include enough context for the LLM to produce the best possible output.
Relevant context information is found in the user domain and should be included in the prompt alongside the bare task specification, which lives in the rather unspecific document domain where the LLM acts as a next-token predictor. Without a proper mapping between the two domains, e.g., by specifying in the prompt that the response should be generated as "a helpful IT expert", the response could turn out to be overly generic.
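As an illustration (not taken from the original article), the sketch below contrasts a bare task specification with a prompt that maps the user's situation into the document domain through a role instruction; the question and role text are hypothetical.

```python
# Hypothetical illustration: the same user request, with and without
# a mapping from the user domain into the document domain.

user_question = "Why does my laptop fan spin up when the lid is closed?"

# Bare task specification: the model only sees the question itself.
bare_prompt = user_question

# Mapped prompt: a role instruction anchors the response in a familiar
# kind of document (an IT expert answering a support request).
mapped_prompt = (
    "You are a helpful IT expert answering a user's support question.\n"
    f"Question: {user_question}\n"
    "Answer:"
)

print(bare_prompt)
print(mapped_prompt)
```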
In Copilot’s case, useful context information may include the language, the file path, text above the cursor, text below the cursor, text in other files, and more, say Ziegler and Berryman.
Converting between the user domain and document domain is the realm of prompt engineering—and since we’ve been working on GitHub Copilot for over two years, we’ve started to identify some patterns in the process.
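As a rough sketch of what gathering that context might look like, the hypothetical structure below collects the elements listed above; the field names are illustrative and do not reflect Copilot's actual internals.

```python
from dataclasses import dataclass, field

@dataclass
class CompletionContext:
    """Hypothetical container for the context a Copilot-style tool gathers."""
    language: str                     # e.g. "python"
    file_path: str                    # path of the file being edited
    prefix: str                       # text above the cursor
    suffix: str                       # text below the cursor
    neighbor_snippets: list[str] = field(default_factory=list)  # text from other files

context = CompletionContext(
    language="python",
    file_path="src/geometry/circle.py",
    prefix="import math\n\ndef circle_area(radius):\n",
    suffix="",
    neighbor_snippets=["def square_area(side):\n    return side * side\n"],
)
```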
In summary, the approach they suggest is based on a sequence of steps. First, you gather all relevant context (context gathering), which may include whole source files. In many cases this context will outgrow the available LLM context window, so you need to trim it down by breaking it into smaller, non-overlapping snippets. The next two phases of the pipeline are finding a natural way to inject the context into the LLM document, e.g., using code comments in Copilot's case, and prioritizing which snippets to include based on their relevance. An additional phase, when you can choose among multiple LLM models, is deciding which model to use for inference. A final step is defining stop criteria so the LLM knows when it is done, e.g., when it outputs a line break.
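The sketch below puts these steps together under simplified assumptions: snippets are scored with a toy relevance heuristic, injected as code comments, and truncated to a character budget (a real system would count tokens); all function names are illustrative rather than GitHub's actual implementation.

```python
def make_snippets(text: str, max_lines: int = 10) -> list[str]:
    """Split a source file into non-overlapping chunks of a few lines each."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]

def score_snippet(snippet: str, prefix: str) -> int:
    """Toy relevance heuristic: count tokens shared with the code above the cursor."""
    return len(set(snippet.split()) & set(prefix.split()))

def build_prompt(prefix: str, neighbor_files: dict[str, str], budget_chars: int = 2000) -> str:
    """Assemble a prompt: prioritized snippets injected as comments, then the prefix."""
    snippets = [(path, chunk) for path, text in neighbor_files.items()
                for chunk in make_snippets(text)]
    # Prioritize the most relevant snippets first.
    snippets.sort(key=lambda s: score_snippet(s[1], prefix), reverse=True)

    header_parts = []
    used = 0
    for path, chunk in snippets:
        commented = f"# From {path}:\n" + "\n".join(f"# {line}" for line in chunk.splitlines())
        if used + len(commented) > budget_chars - len(prefix):
            break  # stay within the context window budget
        header_parts.append(commented)
        used += len(commented)

    return "\n".join(header_parts + [prefix])

# The stop criterion is handed to the inference call, e.g. stop=["\n"]
# when a single-line completion is expected.
```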
There are multiple approaches to prompt engineering. Recently, Microsoft open-sourced its LMOps toolkit, which includes Promptist, a tool to optimize a user's text input for text-to-image generation, and structured prompting, a technique for including more examples in a few-shot learning prompt for text generation.
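For reference, ordinary few-shot prompting simply inlines a handful of labeled examples before the new input, as in the hypothetical sketch below; structured prompting aims to include many more examples than such a plain prompt can comfortably hold.

```python
# Hypothetical few-shot prompt: a couple of labeled examples followed by the new input.
examples = [
    ("The movie was a delight from start to finish.", "positive"),
    ("I want my two hours back.", "negative"),
]

query = "The plot dragged, but the acting saved it."

prompt = "\n".join(f"Review: {text}\nSentiment: {label}" for text, label in examples)
prompt += f"\nReview: {query}\nSentiment:"

print(prompt)
```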
While it can be speculated that LLMs will eventually evolve to the point where prompt engineering is no longer required, OpenAI technical staff member Sherwin Wu noted at the recent QCon New York "LLM in Production" panel that it will likely still be needed for at least five years.
If you are interested in GitHub's approach to prompt engineering, do not miss the full article, which goes into much more detail than can be summarized here.