OpenAI unveiled its GPT-4o model during its Spring Update event earlier this month and with the addition of the live voice functionality it garnered a lot of hype — including from me. I’ve finally seen a live, in-person demo, and if anything I think it was underhyped.
An hour before I was due to go on stage to moderate a panel on AI co-workers at VivaTech, a European technology conference in Paris, OpenAI’s head of developer experience Romain Huet demonstrated all the new functionality.
During the demo, Huet used the ChatGPT Desktop app to have the AI address the 400+ capacity audience. He even had it do so more enthusiastically and in French. The accent was like an American speaking French but he said “We’re working on making it more French.”
It seems we’ll have to wait a few months before we all get access to these new capabilities, as OpenAI is putting them through further security testing. When they arrive, though, this is going to change the way we interact with technology forever, especially as it will also be in Windows Copilot.
ChatGPT Voice can also watch you
One of the most impressive moments came when Huet opened the camera module in the (coming in the next few months) ChatGPT Voice section of the desktop app.
He gave it a sketch he’d drawn showing the Eiffel Tower and the Arc de Triomphe, just a rough outline on a piece of paper. ChatGPT identified both from the sketch.
Huet then showed ChatGPT a map and asked how to get to the places in his sketch from our location at Porte de Versailles. It was able to give a detailed train route with stops and changes.
He had planned to show the features on an iPhone using the ChatGPT app but had to show it on the laptop due to technical difficulties at the venue. This meant though that he could do an ad-hoc demo of coding using ChatGPT — he is the developer experience guy after all.
Sharing his screen with the AI, he was able to have ChatGPT view the code he was writing, identify its function and suggest improvements. He could then show it the output and ask it for ways to change the code to make it look or work differently — all in real-time.
A demonstration of Sora and Voice Engine
OpenAI seems to be entering “product mode” at the moment. While it still describes itself as a research lab with a focus on building artificial general intelligence, it is also stepping up its product game. The ChatGPT Desktop app is close to becoming a vital productivity tool.
During the demo in Paris, Huet also showed a new Sora video, made for the OpenAI developer event in Paris the previous day, that gave a multishot tour of the city. As a Sora video takes about 15 minutes to generate, this was the only pre-made part of the whole demo.
I was only able to watch this from backstage on a small screen, so I didn’t get video, but all eyes in the green room turned to that screen as the demonstration happened.
He gave the Sora video to ChatGPT and had it summarize the contents and write a voice-over script for the video. This is where we got to see another hinted-at OpenAI product in action — Voice Engine. This has been kept for internal use only due to safety concerns.
Huet was able to record (in real-time) a 20-second sample of his voice, have Voice Engine clone it and create a perfect copy. This was then applied to the Sora video to create a promo video. It went further though, as he was able to quickly change the language from English to French to Japanese at the click of a button.
Sora and Voice Engine aren’t publicly available yet, as OpenAI says it is “working on ways to release it safely”.
The potential for the creation of deep fakes and misleading content using these tools is very real so I understand the reticence, but similar tech already exists so hopefully it will be released soon.