
Time to get real about RAG and enterprise-grade AI

PR pros don’t get enough credit. They fire off enthusiastic interview offers – only to be rewarded with a grouchy rebuttal if the pitch pushes my buttons. 

The most reliable way to push those buttons? Send me something hyperbolic about AI – or trigger me with a tech assertion that doesn’t hold up to my research. 

The truly skilled PR pros take these responses in stride, and “circle back” with clarifications from subject matter experts. 

In this edition, we’ll look at the most successful RAG-related pitches I’ve received – and what they reveal. 

Of course, in the so-called “agentic AI” era, RAG is hardly the only tool (or data source) an agent might call. But Retrieval Augmented Generation is still an important component of more accurate enterprise AI. RAG brings context engineering into focus – and context engineering, if you can handle another fun AI buzzword, is a big part of how LLMs are given the information they need for quality responses – and proper actions. On to the pitches… 

Pitch #1 – exploring the limits of RAG with Camshaft AI

This pitch was on: “the revolution of RAG systems and how they’re being tailored to real-world business use cases”

Hi Jon – I’m reaching out to offer a contributed article or interview opportunity on how businesses are moving beyond the AI hype to implement solutions that deliver real, measurable value—centered on a powerful example: Camshaft AI, a proprietary Retrieval-Augmented Generation (RAG) system developed by technology consultancy Classy Llama.

I must have been in a particularly cranky mood that day – maybe the hotel coffeemaker was broken – because that pitch wasn’t too over-the-top. However, it led to my terse response: “RAG is pretty useful but unfortunately, at times LLMs will still choose to disregard RAG-provided context, so I’m reluctant to overhype RAG too much.” The spunky PR team got me a quick response from Greg Tull at Classy Llama: 

You’re absolutely right—RAG isn’t magic, and it’s important not to overpromise. While of course RAG significantly improves the relevance of AI outputs by grounding them in your actual data, it’s still built on top of an LLM, which has its own quirks. That’s why we don’t just drop in the tech and walk away.

A fair statement. Tull continued: 

We customize the retrieval layer, tune the prompts, and test rigorously to make sure the LLM is actually using the context it’s given. And when it doesn’t—we catch it. Our job isn’t just to implement RAG; it’s to engineer reliable, task-specific results.

Thumbs up for auditable AI systems, and real-time monitoring. I sent the PR team a link to my RAG and agentic metrics article, and asked how that stacks up. Tull responded: 

We agree with the central premise of the diginomica article: trust in AI systems cannot be assumed — it must be earned through design, transparency, and consistent results. At Classy Llama our work starts from the same position of respect for user skepticism and a high bar for output fidelity. We also emphasize that data management is fundamental to RAG accuracy; really AI accuracy as a whole. In publicly trained models like ChatGPT, you can’t ensure data accuracy in the same way.

The amount of source data for inputs is too massive. In the far more constrained context of RAG, especially private corpus, something much closer to proper guardrails can be established. A core component of proper guardrails IS your source data – RAG can only answer correctly if the source material is correct. We know this feels like an “of course” observation, but we don’t feel like it gets enough focus in many users’ approach.

We’re far from done, but the following is how we’ve approached solving it so far. Also note the critical human component required throughout.

I don’t often hear from vendors that they “respect user skepticism.” I like that as a starting point, versus force-feeding users imperfect (if potent) AI systems. These systems can improve – sometimes a great deal – via customer-specific data and iteration. But it sure helps when your users are treated as part of the forward plan, and not as an inconvenient change obstacle expected to immediately trust AI output of varying quality.

Components of a reliable RAG architecture – a summary view

Tull provided specifics on how Classy Llama’s RAG architecture works. As I see it, these are important aspects of any RAG architecture (I’ve included a rough code sketch after the list): 

1. Get “chunking” right – Good RAG output depends on savvy “chunking,” i.e. providing the LLM with the most relevant context. Tull called this “Chunk Level Precision.”

We label certain spaces with tags, and group datasets by relevance/topic/access. This doesn’t ensure a “home run”, but it does make sure we get in the right ballpark. Labelled chunks serve as a measurement for accuracy – is this source material appropriate to the context of the user’s query?

2. User “memory” provides relevant context – Classy Llama calls this feature “Recall.”

Camshaft takes into context every component of the dialogue thus far before it answers any further prompt. I.e., what have we said, what responses has it pulled, what is the context of the current conversation; etc. This is a strong guiding hand toward context accuracy and response precision, and of course we consider it table stakes.

3. Audits with human review – and source links for explainability – “Task-based outcome audits with human review” – yes, that’s definitely necessary. One of the benefits of a well-designed RAG setup? Better linking to source documents. Here’s how Camshaft AI explains this:

User training and citation/source provision are key to enable this. We don’t want Camshaft to just answer a question. We want the user to be able to double check, to keep it accountable. If the user asks “X” in a query, Camshaft comes back with sources, and the user then checks those sources to see “Yes, that is the correct documentation, and that is the correct answer based on what’s recorded in sourced documentation.”

4. LLMs shouldn’t ignore the prompt/RAG context – nor should they hallucinate. There are very few enterprise use cases where creative “hallucination” is a desired attribute. How you architect against that matters. Tull says that “RAG-only mode” helps to deal with this: 

RAG-Only Mode: strictly enforced constraints. 1. The model may not “hallucinate” or draw from its own pre-training. 2. All information used in the generation must come from retrieved context. If the answer is not present in the retrieved text, the model must either say “I don’t know,” or avoid answering the question altogether.
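Pulling those four components together, here is the rough code sketch I promised above – a minimal Python illustration of the pattern Tull describes, not Classy Llama’s actual Camshaft code. Every function and field name is hypothetical, and the keyword-overlap “retrieval” is just a stand-in for a real embedding or vector search.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    source: str                              # originating document, kept for citations
    tags: set = field(default_factory=set)   # topic / access labels ("Chunk Level Precision")

def retrieve(chunks, query, allowed_tags, k=3):
    # 1. Tag filter first: gets us "in the right ballpark" before ranking.
    candidates = [c for c in chunks if c.tags & allowed_tags]
    # Naive keyword overlap stands in for a real embedding search.
    terms = set(query.lower().split())
    ranked = sorted(candidates,
                    key=lambda c: len(terms & set(c.text.lower().split())),
                    reverse=True)
    return ranked[:k]

# 4. RAG-only mode: the system rules forbid answers outside the retrieved context.
RAG_ONLY_RULES = ('Answer ONLY from the context provided. If the answer is not '
                  'in the context, reply exactly: "I don\'t know." Cite a source '
                  'for every claim.')

def answer(llm, chunks, history, query, allowed_tags):
    retrieved = retrieve(chunks, query, allowed_tags)
    context = "\n\n".join(f"[{c.source}] {c.text}" for c in retrieved)
    dialogue = "\n".join(history)            # 2. Recall: fold in the conversation so far
    prompt = (f"{RAG_ONLY_RULES}\n\nConversation so far:\n{dialogue}\n\n"
              f"Context:\n{context}\n\nQuestion: {query}")
    # 3. Return sources alongside the answer so a human reviewer can audit the output.
    return {"answer": llm(prompt), "sources": [c.source for c in retrieved]}
```

The skeleton is the easy part; as Tull says, the real work is in the tuning, testing and human review wrapped around it.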

Tull also shared how his team uses RAG to build user trust. In my view, RAG can indeed earn greater user trust in AI systems by providing (and citing) more relevant data sources. But that doesn’t mean you can drop this tool in a user’s lap. Tull explains: 

User onboarding is so critical. RAG is not something you can turn an untrained user loose on and expect them to handle it well. They need to be familiar with AI generally, LLMs notably, and RAG specifically. You must train them how to ask questions/how prompts must be structured, that answers may be wrong, and that human intervention/oversight is still necessary to ensure accuracy. RAG doesn’t magically answer all your questions – it helps you get to an answer much faster, while still being prone to making mistakes much like a human might.

Pitch #2 – Are RAG-based AI solutions a security problem? 

As per synthetic data management vendor K2view: 

Enterprises are questioning whether RAG-based AI systems secure data or turn it into a security problem. Bloomberg’s latest research reveals RAG-based AI models bypassing built-in guardrails, answering malicious queries even when the retrieved documents themselves are safe. For enterprises looking to reap the benefits of RAG-based AI systems, they have to first secure data at the source.

A good problem statement. So what’s the solution? 

According to Iris Zarecki at K2view, who assists major clients like Verizon in leveraging their enterprise data to train LLMs (think: customer statements and billing contracts), this means enterprises need to retrieve high-quality, fresh data to share with the LLM in split seconds – no easy feat – otherwise they risk sensitive data leaking to the wrong model or, even worse, to an unauthorized user. 
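For illustration, the kind of gate Zarecki describes might look something like the sketch below, where entitlements are checked before anything reaches the LLM’s context window. To be clear, this is my own hypothetical sketch of the idea, not K2view’s product, and all the names in it are made up.

```python
from dataclasses import dataclass

@dataclass
class Record:
    text: str
    source: str
    required_role: str   # hypothetical entitlement label, e.g. "billing_admin"

def retrieve_for_user(records, query, user_roles, k=3):
    # Enforce access at the source: drop anything the requesting user is not
    # entitled to see BEFORE it can land in the LLM's context window.
    permitted = [r for r in records if r.required_role in user_roles]
    terms = set(query.lower().split())
    ranked = sorted(permitted,
                    key=lambda r: len(terms & set(r.text.lower().split())),
                    reverse=True)
    return ranked[:k]
```

Freshness is the other half of the argument – the data behind those records has to be current, which is where the “split seconds” challenge comes in.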

A sensible pitch, but I pushed back on the “full potential of RAG-based systems” assertions. I replied: “An important issue: is your subject also aware that LLMs choose to disregard the RAG-offered context on a pretty frequent basis, even with prompt engineering?” I heard back quickly: 

Yes, K2view is aware of this issue and is using techniques like chain-of-thought prompting and reasoning to minimize the impact.
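For readers wondering what that looks like in practice, a chain-of-thought-style grounding prompt might be structured roughly like this – my own hypothetical sketch, not K2view’s implementation:

```python
def build_grounded_prompt(context, question):
    # Ask the model to restate what the retrieved passages actually say before
    # it answers, which makes it harder to quietly ignore the RAG context.
    return ("Use ONLY the context below.\n\n"
            f"Context:\n{context}\n\n"
            f"Question: {question}\n\n"
            "Before answering:\n"
            "1. Quote the passages from the context that are relevant.\n"
            "2. Reason step by step from those passages only.\n"
            "3. If they do not contain the answer, say \"I don't know.\"\n"
            "Then give your final answer, citing the quoted passages.")
```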

Chain-of-thought is an interesting technique, but I don’t consider it a cure-all for these issues. I pushed back some more, and got this: 

Appreciate this feedback, I shared it with K2view as well.

Would you be open to a discussion with K2view if they were able to offer a more holistic perspective on RAG? Like the benefits/challenges and how it compares to fine-tuning or prompt engineering? Let me know, if you’re interested in a tangential topic, they’re also keen on speaking about MCP, and how it compares to the other open components – A2A, Kafka and Flink. LOKA is also making headlines on joining these ranks.

That, folks, is skillful PR – take my nitpicks in stride, and tempt me with your expertise in emerging agentic protocols. That interview will happen; I’ll report back. I was also intrigued by this note from K2view: “Iris believes a new approach to data security is needed, with a semantic data layer optimized for GenAI and dynamic agents to serve data to GenAI apps.” This fits into one of my main areas of research: what is the most effective “data layer,” if you will, to support AI applications? Cloud? Edge? Real-time? Lakehouse? Zero copy? And how do you get there? 

In part two of my pitch review, I’ll share a couple of agentic AI pitches, and a CX study that interested me way more than I thought it would. I will also provide a bit more info on the “context engineering” buzzword festival, and why it raises important points on the limitations of retrieval and prompt engineering. Until then… 
