
Arize AI wants to improve enterprise LLMs with ‘Prompt Playground,’ new data analysis tools

Head over to our on-demand library to view sessions from VB Transform 2023. Register Here

We all know enterprises are racing at varying speeds to analyze and reap the benefits of generative AI — ideally in a smart, secure, and cost-effective way. Survey after survey over the last year has shown this.

But once an organization identifies a large language model (LLM), or several, it wishes to use, the hard work is far from over. Deploying an LLM in a way that actually benefits the organization requires understanding which prompts employees or customers should use to generate helpful results, and which organizational or user data to include in those prompts. Without that, the deployment is of little value.

“You can’t just take a Twitter demo [of an LLM] and put it into the real world,” said Aparna Dhinakaran, co-founder and chief product officer of Arize AI, in an exclusive video interview with VentureBeat. “It’s actually going to fail. And so how do you know where it fails? And how do you know what to improve? That’s what we focus on.”

Three-year-old business-to-business (B2B) machine learning software provider Arize AI would know, as it has since day one been focused on making AI more observable — less technical and more understandable — to organizations.


Today, at Google’s Cloud Next 23 conference, the VB Transform award-winning company announced industry-first capabilities for optimizing the performance of LLMs deployed by enterprises, including a new “Prompt Playground” for selecting among and iterating on stored prompt templates, and a new retrieval augmented generation (RAG) workflow to help organizations understand which of their data would be most helpful to include in an LLM’s responses.

Almost a year ago, Arize debuted its initial platform in the Google Cloud Marketplace, and now it is augmenting its presence there with these powerful new features for its enterprise customers.

Prompt Playground and new workflows

Arize’s new prompt engineering workflows, including the Prompt Playground, enable teams to uncover poor-performing prompt templates, iterate on them in real time, and verify improved LLM outputs before deployment.

Screenshot of Arize AI’s Prompt Playground tool. Credit: Arize AI

Prompt analysis is an important but often overlooked part of troubleshooting an LLM’s performance, which can often be improved simply by testing different prompt templates, or iterating on one, to get better responses.

With these new workflows, teams can easily:

  • Uncover responses with poor user feedback or evaluation scores
  • Identify the underlying prompt template associated with poor responses
  • Iterate on the existing prompt template to improve coverage of edge cases
  • Compare responses across prompt templates in the Prompt Playground prior to implementation
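The comparison step in that workflow can be sketched in a few lines. This is a hypothetical illustration, not Arize’s API: the templates, the `mock_llm` call, and the `mock_eval` scoring function are invented stand-ins for a real model endpoint and a real feedback or evaluation metric.

```python
# Hypothetical sketch of A/B-comparing prompt templates before deployment.
# In practice, mock_llm would be a model API call and mock_eval a
# user-feedback or LLM-based evaluation score.

def render(template: str, question: str) -> str:
    """Fill a prompt template with a user question."""
    return template.format(question=question)

def mock_llm(prompt: str) -> str:
    # Stand-in for a real LLM call.
    return f"Answer to: {prompt}"

def mock_eval(response: str) -> float:
    # Toy metric: does the response mention the refund policy?
    return 1.0 if "refund" in response.lower() else 0.0

templates = {
    "v1": "Answer the customer: {question}",
    "v2": "Answer the customer, citing our refund policy where relevant: {question}",
}

questions = ["Can I get a refund?", "How do I reset my password?"]

# Average the evaluation score of each template across the same questions.
scores = {}
for name, tmpl in templates.items():
    results = [mock_eval(mock_llm(render(tmpl, q))) for q in questions]
    scores[name] = sum(results) / len(results)

best = max(scores, key=scores.get)
print(best, scores)
```

Here the "v2" template wins because it bakes the refund-policy instruction into every prompt, covering the edge case the "v1" template misses; the same side-by-side structure applies with real evaluation scores.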

As Dhinakaran explained, prompt engineering is absolutely key to staying competitive with LLMs in the market today. The company’s new prompt analysis and iteration workflows help teams ensure their prompts cover necessary use cases and potential edge scenarios that may come up with real users.

“You’ve got to make sure that the prompt you’re putting into your model is pretty damn good to stay competitive,” Dhinakaran said. “What we launched helps teams engineer better prompts for better performance. That’s as simple as it is: we help you focus on making sure that the prompt is performant, that the prompt covers all of the cases that you need to handle.”

For example, prompts for an education LLM chatbot need to guard against inappropriate responses, while customer service prompts should cover potential edge cases and nuances around which services are and are not offered.

Arize is also providing the industry’s first insights into the private or contextual data that influences LLM outputs – what Dhinakaran called the “secret sauce” companies provide. The company uniquely analyzes embeddings to evaluate the relevance of private data fused into prompts.

“What we rolled out is a way for AI teams to now monitor, look at their prompts, make them better, and then also specifically understand the private data that’s now being put into those prompts,” Dhinakaran said.

Dhinakaran told VentureBeat that enterprises can deploy Arize’s solutions on-premises for security reasons, and that the company is SOC 2 compliant.

The importance of private organizational data

This embedding analysis enables teams to examine whether the right context is present in prompts to handle real user queries. Teams can identify common questions that lack coverage in the current knowledge base and add content accordingly.

“No one else out there is really focusing on troubleshooting this private data, which is really like the secret sauce that companies have to influence the prompt,” she noted.

Arize also launched complementary workflows leveraging search and retrieval to help teams troubleshoot issues stemming from the retrieval component of retrieval augmented generation (RAG) models.

These will empower teams to pinpoint where they may need to add additional context into their knowledge base, identify cases where retrieval failed to surface the most relevant information, and ultimately understand why their LLM may have hallucinated or generated sub-optimal responses.

Understanding context and relevance — and where it is lacking

Dhinakaran gave an example of how Arize looks at query and knowledge base embeddings to uncover irrelevant documents retrieved that may have led to a faulty response.

Screenshot of Arize AI’s embeddings analysis tool. Credit: Arize AI

“You can click on, let’s say, a user question in our product and it’ll show you all of the relevant documents that it could have pulled, and which one it finally pulled to actually use in the response,” she described.

“You can see where the model may have hallucinated or provided suboptimal responses based on deficiencies in the knowledge base,” Dhinakaran said.
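The underlying idea — ranking candidate documents by the similarity of their embeddings to the query embedding, and flagging queries whose best match is weak — can be illustrated with a toy sketch. This is not Arize’s implementation; the embeddings, document names, and threshold below are invented, and a real system would use a learned embedding model.

```python
# Hedged illustration of embedding-based retrieval relevance: rank
# knowledge-base documents by cosine similarity to a query embedding,
# and flag queries whose best match falls below a threshold -- a common
# signal that the model may hallucinate for lack of good context.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-d embeddings; real ones come from an embedding model.
query_emb = [1.0, 0.0, 0.0]
docs = {
    "refund-policy.md": [0.9, 0.1, 0.0],
    "holiday-hours.md": [0.0, 1.0, 0.0],
}

# Rank documents by similarity to the query.
ranked = sorted(docs.items(), key=lambda kv: cosine(query_emb, kv[1]), reverse=True)
best_doc, best_emb = ranked[0]
best_score = cosine(query_emb, best_emb)

THRESHOLD = 0.5  # arbitrary cutoff for "relevant enough"
if best_score < THRESHOLD:
    print("Possible knowledge-base gap: no sufficiently relevant document")
else:
    print(f"Top document: {best_doc} (similarity {best_score:.2f})")
```

A query whose top-ranked document still scores below the threshold points to exactly the kind of knowledge-base deficiency Dhinakaran describes.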

This end-to-end observability and troubleshooting of prompts, private data, and retrieval is designed to help teams optimize LLMs responsibly after initial deployment, when models invariably struggle to handle real-world variability.

Dhinakaran summarized Arize’s focus: “We’re not just a day one solution; we help you actually get it to work on an ongoing basis.”

The company aims to provide the missing monitoring and debugging capabilities for organizations to continuously improve their LLMs post-deployment. This allows them to move past theoretical value to real-world impact across industries.

