
Bringing Bender to Life in the Real World with ChatGPT

Futurama is chock-full of memorable characters, but none (with the possible exception of Fry) are as well-known and beloved as Bender Bending Rodriguez. Even people who have never seen the show can recognize Bender’s distinct voice, as performed by John DiMaggio. And while he is certainly crude and sarcastic, Bender is the type of pal that many people would like to have around. Thanks to a lot of fantastic work by Manuel Ahumada, that doesn’t have to be a fantasy anymore.

Ahumada built a real-life robotic chum with the head, face, eyes, and even voice of Futurama’s Bender. It uses one AI model to interpret what a user says (or, via a camera, what it sees), another to craft a response, and a third to turn that response into actual spoken words. That voice isn’t a perfect match for Bender, but it is pretty darn close. That’s probably a good thing for DiMaggio’s sake, as he reportedly had to negotiate hard with Disney for the eighth season of Futurama. Not only does the voice sound right, but the responses feel right, like things Bender would really say.

This requires a chain of processing that starts with speech-to-text transcription. That provides a prompt, which goes to OpenAI’s ChatGPT. It tries to understand the context and generates a text response with the kind of language that Bender would use — expect it to be vaguely insulting (or blatantly insulting). Finally, that text response goes to Eleven Labs, which performs very sophisticated text-to-speech generation. This is key, because Eleven Labs allows for voice models trained on specific voices, such as John DiMaggio doing Bender. It works quite well, which is pretty terrifying.
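The three-stage chain can be sketched as a small pipeline. This is a hedged illustration rather than Ahumada's actual code: the stage functions stand in for speech-to-text transcription, a ChatGPT call, and an Eleven Labs request, and are passed in as parameters so real API clients can be swapped in (or faked for testing). The persona prompt text is an assumption for illustration.

```python
# Sketch of the voice pipeline: speech-to-text -> ChatGPT -> text-to-speech.
# Stage functions are injected; in the real build they would wrap Whisper-style
# transcription, an OpenAI chat completion, and an Eleven Labs voice request.

BENDER_SYSTEM_PROMPT = (
    "You are Bender from Futurama: sarcastic, boastful, and vaguely "
    "(or blatantly) insulting. Keep replies short and in character."
)

def run_pipeline(audio, stt, llm, tts):
    """Run one interaction: raw audio in, synthesized speech bytes out."""
    user_text = stt(audio)  # transcribe the user's speech to a text prompt
    messages = [
        {"role": "system", "content": BENDER_SYSTEM_PROMPT},
        {"role": "user", "content": user_text},
    ]
    reply = llm(messages)   # generate an in-character text response
    return tts(reply)       # render the response in a Bender-like voice
```

Because each stage is just a callable, the same orchestration works whether the stages hit cloud APIs or local models.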

A Raspberry Pi 5 single-board computer sits inside Bender’s 3D-printed head and performs the local processing. It also sends prompts to ChatGPT and Eleven Labs, and then receives the results. To expand Bender’s capabilities, Ahumada stuck a camera on his forehead. That allows for facial recognition and for other interesting functions. For example, the user can ask Bender to describe what he’s looking at. That follows a similar routine to spoken interaction, just with a photo instead of a transcription.
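For the photo-based route, the camera frame has to be packaged into the prompt instead of a transcript. A common way to do this with OpenAI's vision-capable chat models is to inline the JPEG as a base64 data URL inside the message content; the helper below is a hedged sketch of that packaging step, not Ahumada's code, and the default question string is made up for illustration.

```python
import base64

def build_vision_message(jpeg_bytes, question="Describe what you see."):
    """Wrap a camera frame and a question in the multimodal chat-message
    format accepted by OpenAI's vision-capable models (image as data URL)."""
    b64 = base64.b64encode(jpeg_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }
```

The resulting message drops into the same chat call as a spoken prompt, so the rest of the pipeline (response generation and text-to-speech) is unchanged.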

The facial recognition enables Bender’s animatronic features. It has 3D-printed eyes (based on a design by Will Cogley), can rotate its entire head, and has an animated mouth made from an LED matrix panel. If a person is in view, Bender can turn to look at them.
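Turning to face a person boils down to mapping where the detected face sits in the camera frame to a head-servo angle. The function below is a minimal sketch of that mapping, assuming a camera with a roughly 60-degree horizontal field of view; the actual servo control and field-of-view value in Ahumada's build may differ.

```python
def head_angle(face_x, frame_width, fov_degrees=60.0):
    """Map the horizontal pixel position of a detected face to a head-servo
    angle in degrees, where 0 means straight ahead, negative is left."""
    # Offset of the face from the frame centre, as a fraction of half-width
    # (-1.0 at the left edge, +1.0 at the right edge).
    offset = (face_x - frame_width / 2) / (frame_width / 2)
    # A face at the frame edge sits half the field of view off-centre.
    return offset * (fov_degrees / 2)
```

In practice this would run each frame: detect a face (e.g. with OpenCV), compute the angle, and nudge the head servo toward it.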

As amazing as this is, it isn’t perfect. The big problem is speed, as it can take Bender a pretty long time to respond. That is mostly due to the time it takes Eleven Labs to generate an audio clip, so Ahumada used a clever tactic to hide the wait. Soon after the user speaks, Bender will play a pre-generated message indicating that he’s thinking. That fills some of the wait time and lets the user know that everything is working.
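The filler trick amounts to kicking off the slow audio generation while a cached clip plays in the background. Here is a hedged sketch of that overlap using a thread; the clip filenames and function names are hypothetical stand-ins, not the actual implementation.

```python
import random
import threading

# Pre-generated "I'm thinking" clips, rendered once in the Bender voice.
FILLER_CLIPS = ["thinking_1.mp3", "thinking_2.mp3"]

def respond_with_filler(generate_audio, play_clip):
    """Hide text-to-speech latency: play a random pre-generated filler clip
    on a background thread while the real response audio is generated."""
    filler = threading.Thread(
        target=play_clip, args=(random.choice(FILLER_CLIPS),)
    )
    filler.start()
    audio = generate_audio()  # slow Eleven Labs call runs during the filler
    filler.join()             # let the filler finish before the real answer
    return audio
```

The user hears Bender "thinking" almost immediately, which both fills the wait and confirms the request was heard.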

Even with the delay, this is very impressive and proves how far AI has come.

