
Bringing Bender to Life in the Real World with ChatGPT

Futurama is chock-full of memorable characters, but none (with the possible exception of Fry) are as well-known and beloved as Bender Bending Rodriguez. Even people who have never seen the show can recognize Bender’s distinct voice, as performed by John DiMaggio. And while he is certainly crude and sarcastic, Bender is the type of pal that many people would like to have around. Thanks to a lot of fantastic work by Manuel Ahumada, that doesn’t have to be a fantasy anymore.

Ahumada built a real-life robotic chum with the head, face, eyes, and even voice of Futurama’s Bender. It uses one AI model to interpret what a user says (or, via a camera, what it sees), another to craft a response, and a third to turn that response into actual spoken words. That voice isn’t a perfect match for Bender, but it is pretty darn close. That’s probably a good thing for DiMaggio’s sake, as he reportedly had to negotiate hard with Disney for the eighth season of Futurama. Not only does the voice sound right, but the responses feel right, like things Bender would really say.

This requires a chain of processing that starts with speech-to-text transcription. That provides a prompt, which goes to OpenAI’s ChatGPT. It tries to understand the context and generates a text response with the kind of language that Bender would use — expect it to be vaguely insulting (or blatantly insulting). Finally, that text response goes to Eleven Labs, which performs very sophisticated text-to-speech generation. This is key, because Eleven Labs allows for voice models trained on specific voices, such as John DiMaggio doing Bender. It works quite well, which is pretty terrifying.
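The three-stage chain can be sketched as a small pipeline. This is a hedged illustration rather than Ahumada's actual code: the stage functions stand in for speech-to-text transcription, a ChatGPT call, and an Eleven Labs request, and are passed in as parameters so real API clients can be swapped in (or faked for testing). The persona prompt text is an assumption for illustration.

```python
# Sketch of the voice pipeline: speech-to-text -> ChatGPT -> text-to-speech.
# Stage functions are injected; in the real build they would wrap Whisper-style
# transcription, an OpenAI chat completion, and an Eleven Labs voice request.

BENDER_SYSTEM_PROMPT = (
    "You are Bender from Futurama: sarcastic, boastful, and vaguely "
    "(or blatantly) insulting. Keep replies short and in character."
)

def run_pipeline(audio, stt, llm, tts):
    """Run one interaction: raw audio in, synthesized speech bytes out."""
    user_text = stt(audio)  # transcribe the user's speech to a text prompt
    messages = [
        {"role": "system", "content": BENDER_SYSTEM_PROMPT},
        {"role": "user", "content": user_text},
    ]
    reply = llm(messages)   # generate an in-character text response
    return tts(reply)       # render the response in a Bender-like voice
```

Because each stage is just a callable, the same orchestration works whether the stages hit cloud APIs or local models.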

A Raspberry Pi 5 single-board computer sits inside Bender’s 3D-printed head and performs the local processing. It also sends prompts to ChatGPT and Eleven Labs, and then receives the results. To expand Bender’s capabilities, Ahumada stuck a camera on his forehead. That allows for facial recognition and for other interesting functions. For example, the user can ask Bender to describe what he’s looking at. That follows a similar routine to spoken interaction, just with a photo instead of a transcription.
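For the photo-based route, the camera frame has to be packaged into the prompt instead of a transcript. A common way to do this with OpenAI's vision-capable chat models is to inline the JPEG as a base64 data URL inside the message content; the helper below is a hedged sketch of that packaging step, not Ahumada's code, and the default question string is made up for illustration.

```python
import base64

def build_vision_message(jpeg_bytes, question="Describe what you see."):
    """Wrap a camera frame and a question in the multimodal chat-message
    format accepted by OpenAI's vision-capable models (image as data URL)."""
    b64 = base64.b64encode(jpeg_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }
```

The resulting message drops into the same chat call as a spoken prompt, so the rest of the pipeline (response generation and text-to-speech) is unchanged.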

The facial recognition enables Bender’s animatronic features. It has 3D-printed eyes (based on a design by Will Cogley), can rotate its entire head, and has an animated mouth made from an LED matrix panel. If a person is in view, Bender can turn to look at them.
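Turning to face a person boils down to mapping where the detected face sits in the camera frame to a head-servo angle. The function below is a minimal sketch of that mapping, assuming a camera with a roughly 60-degree horizontal field of view; the actual servo control and field-of-view value in Ahumada's build may differ.

```python
def head_angle(face_x, frame_width, fov_degrees=60.0):
    """Map the horizontal pixel position of a detected face to a head-servo
    angle in degrees, where 0 means straight ahead, negative is left."""
    # Offset of the face from the frame centre, as a fraction of half-width
    # (-1.0 at the left edge, +1.0 at the right edge).
    offset = (face_x - frame_width / 2) / (frame_width / 2)
    # A face at the frame edge sits half the field of view off-centre.
    return offset * (fov_degrees / 2)
```

In practice this would run each frame: detect a face (e.g. with OpenCV), compute the angle, and nudge the head servo toward it.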

As amazing as this is, it isn’t perfect. The big problem is speed, as it can take Bender a pretty long time to respond. That is mostly due to the time it takes Eleven Labs to generate an audio clip, so Ahumada used a clever tactic to hide the wait. Soon after the user speaks, Bender will play a pre-generated message indicating that he’s thinking. That fills some of the wait time and lets the user know that everything is working.
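The filler trick amounts to kicking off the slow audio generation while a cached clip plays in the background. Here is a hedged sketch of that overlap using a thread; the clip filenames and function names are hypothetical stand-ins, not the actual implementation.

```python
import random
import threading

# Pre-generated "I'm thinking" clips, rendered once in the Bender voice.
FILLER_CLIPS = ["thinking_1.mp3", "thinking_2.mp3"]

def respond_with_filler(generate_audio, play_clip):
    """Hide text-to-speech latency: play a random pre-generated filler clip
    on a background thread while the real response audio is generated."""
    filler = threading.Thread(
        target=play_clip, args=(random.choice(FILLER_CLIPS),)
    )
    filler.start()
    audio = generate_audio()  # slow Eleven Labs call runs during the filler
    filler.join()             # let the filler finish before the real answer
    return audio
```

The user hears Bender "thinking" almost immediately, which both fills the wait and confirms the request was heard.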

Even with the delay, this is very impressive and proves how far AI has come.

