To say enterprise vendors are a wee bit excited about AI agents would be a massive understatement. Well, I’m here to pour some cold water on this bombast.
- Why do vendors act like AI agents are new, when many have been utilized for years?
- Why are we idealizing the automation of generative AI processes that have not yet been adopted, or proven to work with enough reliability to remove human supervision? Hello?
- Why are we promoting a mechanistic, super-automated version of the enterprise, where making work better for humans (and the customers they serve) seems to be an afterthought?
Agreed: AI agents are very useful. I also agree that we need to shift from AI bot interactions to AI actions – and AI agents have the potential to do that. So what’s the ginormous chip on my shoulder?
Demystifying AI agents – where do we begin?
I believe AI agents are being overhyped this fall because generative AI go-lives are not at the point to demonstrate exciting results.
This is a kick-the-can-down-the-road tactic, dressed up with a shiny new object. If you had nothing else to talk about, this would make sense. But vendors do have good things to talk about!
Vendors have terrific customer stories to share right now – stories that involve “classic” deep learning AI, already operating at scale (think predictive maintenance, shop floor robotics, fraud detection, or product recommendation engines). I just featured a story that involved faster recovery from a cybersecurity incident for cloud-based systems, and $30 million+ projected savings via improved process configuration. That seems good enough for a keynote segment to me!
- Why are we talking about AI agents as if we haven’t been successfully automating tasks for years?
- If enterprises haven’t already pushed ahead with full-on process automation, what the heck have they been waiting for? Process automation vendors have been kicking butt for some years now.
What exactly is new and different about this round of AI agents? Why won’t vendors get into the nitty gritty and explain why suddenly AI agents are different, when in fact you’ve been experiencing AI agency for years, every time you summon an Uber, or type in a search query? Until we understand what’s different about today’s AI agents, how can customers possibly evaluate the use cases?
AI agents and LLMs – what’s new and different?
One thing that’s different, of course, is the emergence of LLMs. Obviously, you can set up a chat or voice interface to interact with an LLM. In turn, this impacts AI agency:
- A business user could ask for their own task (or process) to be completed, e.g. “file this expense report,” or “provision this new employee.” But the AI agent doesn’t magically automate that process. The AI agent invokes an automation you should have had in place a long time ago. This scenario is a change in UI, not automation. And: if you didn’t have that automation in place, perhaps that’s because the process is too complex for today’s supposedly “reasoning” AI?
- The LLM-driven agent could ask you for some specifics about the employee you want to provision, in a much more user-friendly way, and then go do it. But you’d probably want to quickly review and approve before the provisioning formally happened (a minimal sketch of that approval gate follows this list). That’s useful, but that’s not the autonomous super-agent vibe I am getting from vendors.
- A business user could even request a new automation via the LLM interaction, though in theory setting up new automations would need to be approved by some supervisory human, somewhere.
- You could also automate processes that an LLM has been executing via manual prompts. However, these are the same LLMs that have persistent accuracy problems – problems that can be reduced but not entirely eliminated by current techniques, e.g. RAG.
- One area where generative AI agents can obviously excel: language processing (though we’ve had pretty good NLP for a while). So, for example, more nuanced ability to properly scan invoices, or process variations in service order language. That doesn’t magically remove the need for human supervision though. If anything, generative AI increases that need – an agentic irony for sure.
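To make that concrete – and to show why this is a change in UI plus an approval step, rather than new automation – here is a minimal sketch of what such an approval gate might look like. Everything here is hypothetical: the automation registry, the parameters, and the stubbed-out “LLM call” illustrate the pattern, not any vendor’s implementation.

```python
# Minimal sketch (no vendor's actual API): an LLM-driven agent that maps a
# natural-language request onto an automation the enterprise already has,
# with a human approval gate before anything is executed.
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class PendingAction:
    name: str
    params: dict

# Hypothetical registry of automations that already exist.
AUTOMATIONS: Dict[str, Callable[[dict], str]] = {
    "provision_employee": lambda p: f"Provisioned {p['name']} in {p['department']}",
    "file_expense_report": lambda p: f"Filed expense report for {p['amount']}",
}

def plan_action(user_request: str) -> PendingAction:
    """Stand-in for the LLM call that extracts intent and parameters.
    This is where the probabilistic (and fallible) part lives."""
    if "provision" in user_request.lower():
        return PendingAction("provision_employee", {"name": "J. Doe", "department": "Finance"})
    return PendingAction("file_expense_report", {"amount": "$412.18"})

def run_with_approval(action: PendingAction) -> str:
    """The agent only invokes an existing automation, and only after a human approves."""
    print(f"Agent proposes: {action.name} with {action.params}")
    if input("Approve? [y/n] ").strip().lower() != "y":
        return "Rejected by human reviewer - nothing executed."
    return AUTOMATIONS[action.name](action.params)

if __name__ == "__main__":
    print(run_with_approval(plan_action("Please provision this new employee")))
```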
Next-gen tech – or “I could have just done this myself”?
If AI agents are so seamlessly orchestrating our lives, why are the Reddit threads I see on this topic full of developers navigating serious challenges and calling BS on hype?
This feels like trying over and over to explain the requirements to a junior… Everytime I tried these things in the past I only lost time and eventually had to do it myself. Had I just did it myself from the get go I would have finished it faster. The only thing I can ever get these things to write with success are unit tests for functions that are stateless. And I still need to fix and polish a lot around it.
or:
When you have a couple decades of programming under your belt and can touch type, co-copilot et al are absolutely worthless. Coding is the quick and east part of my job. Dealing with management and SMEs is like pulling teeth.
Two hands-on AI experts, both more optimistic about generative AI than I am, recently wrote in a white paper on agents:
The North Star of this field is to build assistants like Siri or Alexa and get them to actually work — handle complex tasks, accurately interpret users’ requests, and perform reliably. But this is far from a reality, and even the research direction is fairly new.
Does that sound like the tone we’ve heard from enterprise vendors this fall? I didn’t see this post during a keynote:
I ordered DoorDash on the Rabbit R1. It didn’t let me choose what I wanted on the dish I ordered and had it incorrectly delivered to my house instead of work without notifying me or without any UI to change that.
A rousing success.
— Quinn Nelson (@SnazzyLabs) April 24, 2024
Does that look like an enterprise-ready AI agent workflow to you? Oh, and thanks to agents, we now have the most brutal tech buzzword since DevSecOps: “agentic”.
How do we reconcile ‘agentic AI’ with elevating human impact?
Let’s say AI agents are new, as vendors imply. That means these new agents are tied to generative AI, which is known for its output errors and dependency on accurate, industry-specific data sets to be enterprise-grade. Should we be heavily automating imperfect processes? When it comes to gen AI, shouldn’t we be keeping human supervision in most of these workflows, especially the regulated ones?
Most vendors that promoted the heck out of autonomous AI agents this fall also emphasize their responsible AI design principles, with human approvals and review steps where needed. How are those two views of AI reconciled? Did you see a vendor that successfully reconciled them?
I found one, but not at an event, via a post by Eilon Reshef, CPO of Gong: AI Agents Aren’t Yet Enterprise-Ready—They Need More Human Guidance.
Currently, the risks presented by fully autonomous AI agents and the lack of visibility and predictability make it unappealing for businesses to implement with no oversight.
Gong knows a thing or two about automation; you can be sure if agents could handle all of Gong’s workloads tomorrow, they’d do it. Yet Reshef’s case for agents sounds more grounded.
Oh, and I’ve left off another twist: there is a huge variety of different “agents,” all lumped under the so-called AI agent umbrella. Each comes with its own considerations before implementing. Should we discuss that perhaps? The agent tutorial on AWS defines these AI agents (a brief sketch of the first two follows the list):
- Simple reflex agents
- Model-based reflex agents
- Goal-based/rule-based agents
- Utility-based agents
- Learning agents
- Hierarchical agents
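Those labels come from classic AI textbooks, not from generative AI. A minimal, hypothetical sketch of the first two – a simple reflex agent that reacts only to the current reading, versus a model-based reflex agent that keeps internal state – shows how different the implementation considerations already are before an LLM enters the picture. Thresholds and readings here are made up for illustration.

```python
# Minimal, hypothetical sketch of two classic agent types - no LLM involved.
from typing import Optional

def simple_reflex_agent(pressure: float) -> str:
    """Reacts only to the current reading; no memory, no model of the world."""
    return "open_valve" if pressure > 80.0 else "do_nothing"

class ModelBasedReflexAgent:
    """Keeps internal state (a crude world model), so it can act on trends,
    not just the latest reading."""
    def __init__(self) -> None:
        self.last_pressure: Optional[float] = None

    def act(self, pressure: float) -> str:
        rising = self.last_pressure is not None and pressure > self.last_pressure
        self.last_pressure = pressure
        if pressure > 80.0 or (pressure > 60.0 and rising):
            return "open_valve"
        return "do_nothing"

if __name__ == "__main__":
    agent = ModelBasedReflexAgent()
    for reading in [55.0, 65.0, 72.0, 85.0]:
        print(reading, simple_reflex_agent(reading), agent.act(reading))
```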
Velaro defines seven different types of AI agents, completely differently from AWS. The real-world examples of AI agents from Botpress are the most industrially useful I’ve seen, but that writeup also clearly shows that AI agents are not dependent on generative AI, and existed long before gen AI became popularized, e.g. Netflix’s personalized content recommendations.
That’s not even counting the confusion between digital assistants and AI agents in customer support, which further blurs the “AI agent” terminology we are attempting to decode. Boomi, for example, offers six different AI agents.
Are customers up to speed on these AI agents I’ve just listed? How they differ, what they are good for, and what they are not good for? Have you heard a single vendor say what AI agents are not good for, besides walking the dog? Let’s find out, shall we?
Boomi CEO Steve Lucas: today’s AI agents are different than RPA
That’s a full plate of questions, and it will take me a while to unravel them. I’m still researching the heck out of this, but since I have yet to validate AI agent success with a customer, there is a long road ahead. Why not come up for air, and get a gut check from a couple of willing (and very patient) vendors who don’t mind the hot seat?
I’ve known Boomi CEO Steve Lucas for a longish time; he’s been in my hot seat before. Recently, Mark Samuels issued an overview of Boomi’s agent approach on diginomica, so I’ll skip to the burning questions.
What’s new and different about AI agents, Steve? Lucas argues that traditional RPA, while useful for some things, is too “static” for many of today’s automation needs. LLM-driven agents can help, because of the business context they provide:
The reality is RPA static processes, they do not have context for your business. But now I can take a Large Language Model, and I can give it context for my business. Now that could be in the form of a prompt, or it could be in the form of long-term memory. So if I give it context on the left hand and on the right hand, it’s good at more than just rules; it’s good at abstract. Why wouldn’t software evolve in that direction?
Fair points. But LLMs, even grounded with RAG and vector databases (or knowledge graphs), are not perfect in their output. Unlike RPA, LLMs are probabilistic technologies. So why would an enterprise risk automating imperfect processes, even with that rich business context? Lucas responded:
It’s a really good problem… In any scenario, you called it out already. RAG for me, is a must, because hallucination is a fact. It will exist for a long, long time. But that being said, as models get better, the potential for hallucination, that 5% hallucination all the way down to one. It’s never going to get to zero, right? It’s just going to infinitely approach zero, but it will be remarkably closer years from now.
I’m with Lucas on why RAG is a must for more accurate gen AI output. I don’t agree we will get anywhere near zero – not if we follow the current trajectory of a heavy-duty, albeit effective, band-aid on LLMs via RAG. I think we’ll need a breakthrough that actually helps these machines understand not just context, but meaning. (There is a caveat with smaller models I’ll get to shortly.) At any rate, that debate can wait. Meanwhile, Lucas checks another box I was looking for: he cites use cases where agents don’t work well, at least out of the box.
Saying ‘Hey, I’m going to use this to do Expense Report approval with no Retrieval Augmented Generation,’ that’s a terrible idea. So that’s the second piece, which you’ve already called out… What’s missing is this applicability index, by industry and by line of business – where does the [AI agent] heatmap go?
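Since Lucas leans so hard on RAG, it’s worth spelling out what “ground the model before it acts” means at its simplest. Below is a deliberately bare-bones sketch with a toy keyword retriever and a stand-in for the model call; real pipelines add embeddings, a vector store, re-ranking and guardrails, and none of the names here belong to Boomi’s actual product.

```python
# Bare-bones RAG sketch: retrieve relevant policy text, then ground the prompt
# in it before the model answers. The retriever and llm() are hypothetical stand-ins.

POLICY_DOCS = {
    "expenses": "Expense reports over $500 require manager approval and itemized receipts.",
    "travel": "International travel must be booked through the corporate portal.",
}

def retrieve(query: str) -> list:
    """Toy keyword retriever; production systems use embeddings and a vector store."""
    return [text for key, text in POLICY_DOCS.items() if key in query.lower()]

def llm(prompt: str) -> str:
    """Stand-in for the actual model call (hosted or local)."""
    return f"[model answer grounded in a prompt of {len(prompt)} characters]"

def answer_with_rag(question: str) -> str:
    context = "\n".join(retrieve(question)) or "No relevant policy found."
    prompt = (
        "Answer using ONLY the context below. If the context is insufficient, say so.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm(prompt)

if __name__ == "__main__":
    print(answer_with_rag("Can I file this $900 expenses claim without receipts?"))
```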
I heard about an automatic dispute resolution AI agent the other day. Routine invoice discrepancies? Perhaps. But actual payment disputes with customers? How many ways could that go wrong? Moving along, Lucas makes the case for support agents:
If I can reduce my cost by 90%, but the trade off is, it’s only 90% accurate – I’ll take that all day.
I’d argue that depends on what the 10% inaccuracies look like. If those inaccuracies are just misunderstanding the question, then fine – humans do that too. But if the bot advises, say, a dangerous battery removal procedure, then that’s a costly, liability-inducing inaccuracy. But yes, there is a margin of error for first tier, automated customer support. But even in that scenario, for most industries, I’m much more sold on providing digital assistants to make support better, rather than completely eliminating human support. Lucas contrasts that with medicine:
If I’ve got a dude with a knife and I’m about to go into surgery, am I going to accept 90% accurate? No.
Lucas on documentation agents: if we can get to 95% accuracy, why would we ever use a human?
Lucas says a big part of Boomi’s current AI agent projects involve what you might call less risky areas, when it comes to accuracy tolerance:
Probably two thirds of all use cases we get pulled into right now are support, customer service, customer success, what one could characterize as chatbot.
But Lucas says it won’t end there:
We’re getting pulled into things right now, like, ‘Hey, could I get a corporate buyer agent for me, this grocery store, retail chain, that outperforms my human buyers, because human buyers tend to buy things that they loved as a child versus what the data tells them.’ That’s highly applicable, right? So I think we’re going to see things like buyer agents and procurement agents, things like that.
Yes, but: if your buyer agent makes nostalgic childhood purchases and ignores your data, isn’t that a human competence or training problem? Are we going to let AI agents take over such crucial roles? I suspect we’ll use the AI agent, in that case, to contrast with the human agent’s plan, and perhaps make adjustments based on the contrast. That’s useful and could even lead to ROI, but to me, that’s far more interesting (and likely) than “we fired all our purchasing agents.” On the other hand, if the so-called purchasing agent is just ordering the cheapest bulk toilet paper, then yes, here come the AI agents.
I’ve taken the semi-unpopular position that AI can help mitigate human bias, so I’m with Lucas on that. However, AI systems have biases too (often tracing back to human-related biases in the training data, but still…), which is why trading out a human worker for an AI could just amount to swapping biases. Whereas the two working together has the potential to mitigate each other’s shortcomings, while bringing different strengths. This is especially useful in junior level roles, where human mentoring may be scarce.
Lucas says the most popular agent right now is their documentation agent, Boomi Scribe. I can see why. I’d still personally want a human domain expert reviewing that documentation, but it’s a heck of a chore to create it from scratch – and generative AI’s writing capabilities are more than sufficient for this. Lucas:
I’ll tell you how far the concept goes… Today, I think it is unrealistic for a CIO to say to the average worker, ‘Please document this process. Make sure it’s compliant with GDPR, with the California Consumer Privacy Act, and, oh, by the way, make sure, if it ever leaves the US, that it doesn’t break any law.’ That’s crazy talk, but that’s the process design and documentation that is required, especially if you’re a multinational corporation, right? So the layers we’re going through is: right now we’re just writing documentation, but we will drive industry subject matter expertise, legal compliance expertise into that documentation agent.
Lucas says the small/acceptable margin for error makes this type of AI agent a no-brainer:
What is the margin of error? Let’s say it’s 5% in terms of documentation. If our documentation agent has a less than 5% margin of error, then, even though your risk call out is accurate, why would you ever use a human? I don’t want to be the AI-crazy guy that’s like, ‘Never use humans.’ But if the margin of error for documentation is less than a human’s why would you use a human?
Score this one for Mr. Lucas: AI-generated documentation looks like a solid use case, with an autonomous agent to keep that documentation up to date. And, I would hope, alerting the human domain expert when a significant update has been made – especially when regulatory updates are in play.
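What might that “alert the human” step look like in practice? A minimal sketch, assuming a hypothetical drafting function and a crude keyword check for regulatory language – the point is the routing to a human reviewer, not the generation itself.

```python
# Minimal sketch: an agent keeps documentation up to date, but holds any draft that
# touches regulatory language for human review instead of publishing it directly.

REGULATORY_TERMS = ("gdpr", "ccpa", "hipaa", "cross-border")

def draft_update(process_name: str, change_notes: str) -> str:
    """Stand-in for the generative step that rewrites the documentation."""
    return f"Updated docs for {process_name}: {change_notes}"

def route_draft(process_name: str, change_notes: str) -> str:
    """Publish routine edits; flag regulatory-sensitive ones for a human domain expert."""
    draft = draft_update(process_name, change_notes)
    needs_review = any(term in change_notes.lower() for term in REGULATORY_TERMS)
    if needs_review:
        return f"HELD FOR HUMAN REVIEW (regulatory impact): {draft}"
    return f"PUBLISHED: {draft}"

if __name__ == "__main__":
    print(route_draft("employee onboarding", "added GDPR data-retention clause"))
    print(route_draft("expense filing", "clarified receipt upload steps"))
```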
It’s true that RAG and other “grounding” techniques reduce output error, but that brings its own set of dangers. Let’s say you get your AI output accuracy rate on legal documentation (or police reports, yikes) to 95% or even a bit higher. In turn, don’t we run a risk that humans will start to rubber stamp these approvals, since many of them are accurate? What if we miss one that has real human fallout? Shouldn’t we bear down on the consequences, given that some of these (unregulated) scenarios are live in production now?
Don’t we also run a risk that management “rightsizes” the humans prematurely, based on all the AI agent happy talk they are hearing? Or, that they turn AI agents into a brute force headcount reduction tool, regardless of how effective they are for the scenario envisioned? Thanks to Lucas for this debate, and I look forward to documenting Boomi’s customer results.
Zapata AI – agentic AI results, without LLMs?
Smaller models can miss out on some of the characteristics of LLMs, but they can also be better grounded for a specific task, and thus deliver even better accuracy. That’s a core principle of Zapata AI’s “Industrial AI” approach.
Steve Lucas juxtaposed rigid RPAs with LLMs that provide context, but there are other ways to provide context to agents besides LLMs. Some of those ways are more accurate than LLMs, and more bulletproof. That’s exactly what Zapata AI CEO Christopher Savoie contends – to the point where Zapata AI’s solutions don’t even use LLMs for their generative models (though you can query those results with LLMs). Yep – generative AI without LLMs. And they already have big go-lives, such as a fascinating BMW plant scheduling scenario, among others. When I talked AI agents with Savoie, he referred to an Andretti Global race strategy example – and it was a very different view of the agent/human mix. Savoie says that in the case of Andretti racing, there are five companion agents in play:
We have five agents, if you will, five models working on the prediction of a yellow flag, which is the caution and accident flag. We have one agent that’s trained to look at things one lap ahead. We have a second agent that’s trained to look at things two laps ahead, three laps ahead, and five up to five. Each have different parameters that draw things out. When you’re looking at something five laps ahead, you’re looking at different signals in the data than you are at one lap from now… Each of the agents is giving different information, but it’s using all of those different outputs from those agents to come up with an amalgamated picture of: how is risk forming on the track?
But this doesn’t replace the human race strategist:
Even with the information we’re doing with the Andretti example, we’re not trying to get rid of the race strategist. We’re trying to assist the engineer and give them more possible design possibilities. We’re trying to enhance their ability to be creative. That’s really what they generate.
Now that’s the kind of AI agent I’d want to have in my workplace.
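To make Savoie’s description a bit more concrete, here is a minimal sketch of several horizon-specific models feeding one amalgamated risk picture that a human strategist reads. The “models” are trivial stand-ins invented for illustration, not Zapata AI’s actual method.

```python
# Minimal sketch of a multi-horizon setup: several small models, each tuned to a
# different look-ahead window, combined into one caution-risk picture that a human
# strategist reads. The "models" are trivial stand-ins invented for illustration.

def make_horizon_model(laps_ahead: int):
    """Stand-in for a separately trained model per look-ahead window."""
    def predict(lap_times: list) -> float:
        recent = lap_times[-laps_ahead:]
        volatility = max(recent) - min(recent)
        return min(1.0, volatility / (10.0 * laps_ahead))  # crude risk score in [0, 1]
    return predict

HORIZONS = [1, 2, 3, 5]
MODELS = {h: make_horizon_model(h) for h in HORIZONS}

def caution_risk_picture(lap_times: list) -> dict:
    """Amalgamated view across horizons; the strategist decides, the agents only inform."""
    return {h: round(model(lap_times), 2) for h, model in MODELS.items()}

if __name__ == "__main__":
    lap_times = [92.1, 92.4, 91.8, 95.6, 92.0, 97.3]
    print(caution_risk_picture(lap_times))
```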
My take
I’m not against automation; I’ve already taken companies to task for not stepping into process automation sooner. I am, however, opposed to automation that doesn’t have a companion narrative around how human workers can create more value – and maybe even, gasp, experience some personal fulfillment on the job, and put some real ingenuity into it. Oh, I forgot – we’re automating that also.
I do find the productivity-based conversation limited – and increasing brute-force productivity with AI hasn’t proven as easy as promised.
I recently saw a few posts elevating Amazon as a process automation role model, with large portions of logistics and customer service already automated. Much of this is some type of algorithmic/agentic AI, with generative AI only a limited aspect of what Amazon does (it’s not even available on Alexa devices yet, for example). But if Amazon is the role model for humans alongside automation, you can count me out. Their shop floor KPI dystopia (“I need to load 1,800 packages per hour”) is rivaled only by their last-mile gig economy parcel delivery – ‘as long as I can mark it delivered to make my quota’ – and by legal briefs over workers (allegedly) peeing in bottles to keep up with their KPIs.
Amazon would like nothing more than to automate the rest of all that and move those humans out, but that’s not happening anytime soon. Is that the corporate automation role model to aspire to? I believe the bar is higher. Successfully competing against Amazon will require both savvy humans, and effective AI agents.
Vendors are over-selling AI to customers who are ready to seriously evaluate – so we can chill on the sales pitch hyperbole. But to get there, customers also need a more intense AI readiness conversation, including data quality (see my mid-year gen AI project assessment). I would even apply this to Boomi. Though Boomi’s agentic AI framework is interesting, I found Lucas’ keynote messaging around the problem companies face with digital fragmentation more compelling – and, as Lucas himself points out, you’re not getting to good enterprise AI if you don’t solve that data fragmentation problem.
Perhaps I’m wrong, and we’ll all have autonomous agents doing the heavy lifting in our white collar lives in just a few months’ time, operating in perfect, error-free synchronization. But if I’m wrong, that also means a good percentage of us won’t have jobs anymore. So be it – I’m calling this hand. The dealer isn’t bluffing, but I don’t think all the cards have been reckoned with yet.
Lucas assured me he’s hearing all the time from customers who want to build agents. But I think customers are also interested in much more candid conversations than we’ve seen on the keynote stage this fall.
The best view of an AI agent, to me, is the ability to autonomously orchestrate processes that have already been successfully and reliably automated, and to involve business users in designing and building agents of their own. Then, let’s see how humans emerge into more impactful roles. This is the strategic-versus-admin fight we’ve all been fighting for years. The role of AI agents could be pretty cool and useful, but aside from that last user engagement part, most of it is not new.
Having said that, as Zapata AI’s Savoie has pointed out, smaller AI models are pretty darn good at handling one task or process step, handing off to the next model, and so on. How much of this is new and different, because of generative AI? The search for clarity continues. Vendors could help a lot here, by proactively anticipating these questions. If they don’t, don’t worry faithful reader – we’ll keep the seat as warm as we can.
End note: thanks to Boomi CTO Matt McLarty, who provided useful context to this article.