OpenAI Debuts ChatGPT’s Realistic Voice for Paying Users

On Tuesday, OpenAI began rolling out ChatGPT’s Advanced Voice Mode, giving a select group of ChatGPT Plus users access to GPT-4o’s hyper-realistic audio responses. A full rollout to all Plus users is expected by fall 2024.

In May, OpenAI introduced GPT-4o’s voice to the public, and the feature astonished audiences with its rapid responses and its striking resemblance to a real human voice.

One voice in particular, Sky, resembled that of Scarlett Johansson, the actress who voiced the artificial assistant in the film “Her.” Shortly after OpenAI’s demonstration, Johansson said she had declined multiple requests from CEO Sam Altman to use her voice, and that after seeing the GPT-4o demo she had retained legal counsel to defend her likeness.

OpenAI denied using Johansson’s voice, but it subsequently removed the voice shown in its demo. In June, OpenAI announced that it would postpone the release of Advanced Voice Mode to improve its safety measures.

One month later, the wait is over, at least in part. OpenAI says the video and screen-sharing capabilities demonstrated during its Spring Update will not be included in this alpha release and will launch at a “later date.”

For now, the GPT-4o demo that captivated audiences is still just a demo. However, some premium users can now access the ChatGPT voice feature it showed.

ChatGPT can now talk and listen

You may have already tried the Voice Mode currently available in ChatGPT, but OpenAI says Advanced Voice Mode is different.

ChatGPT’s previous audio solution chained together three separate models: one to convert your voice to text, GPT-4 to process your prompt, and a third to convert ChatGPT’s text into voice.

GPT-4o, by contrast, is multimodal and can handle all of these tasks itself, without auxiliary models, which results in conversations with significantly lower latency. OpenAI also claims that GPT-4o can detect emotional intonations in your voice, including sadness, excitement, or singing.
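
The difference between the two designs can be sketched in a few lines of Python. This is a toy illustration only, not OpenAI’s implementation: every name here (`Audio`, `speech_to_text`, `text_model`, `text_to_speech`, and both `*_voice_mode` functions) is hypothetical, and the sketch assumes a transcript carries words but not tone of voice.

```python
from dataclasses import dataclass


@dataclass
class Audio:
    """Hypothetical stand-in for a spoken utterance."""
    words: str
    tone: str  # e.g. "excited" or "sad"


def speech_to_text(audio: Audio) -> str:
    # Stage 1 (assumed): transcription keeps the words and drops the tone.
    return audio.words


def text_model(prompt: str) -> str:
    # Stage 2 (assumed): a text-only model sees nothing but the transcript.
    return f"reply to: {prompt}"


def text_to_speech(text: str) -> Audio:
    # Stage 3 (assumed): synthesis never learns how the user sounded.
    return Audio(words=text, tone="neutral")


def cascaded_voice_mode(audio: Audio) -> Audio:
    """Three separate models chained together, one call after another."""
    return text_to_speech(text_model(speech_to_text(audio)))


def single_model_voice_mode(audio: Audio) -> Audio:
    """One audio-in, audio-out model; here it simply mirrors the tone."""
    return Audio(words=f"reply to: {audio.words}", tone=audio.tone)


question = Audio(words="how are you", tone="excited")
print(cascaded_voice_mode(question))
print(single_model_voice_mode(question))
```

In the chained version each stage has to finish before the next one starts, so the delays add up, and anything the first stage discards is unavailable to the later ones. The single-model version has only one step and sees the original audio directly.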

ChatGPT Plus users can observe the hyper-realistic nature of OpenAI’s Advanced Voice Mode firsthand during this pilot. TechCrunch was unable to evaluate the feature before the publication of this article; however, we will assess it upon obtaining access.

OpenAI says it is rolling out ChatGPT’s new voice gradually so that it can monitor usage closely. People in the alpha group will receive an alert in the ChatGPT app, followed by an email with instructions on how to use the feature.

In the months since OpenAI’s demonstration, the company has tested GPT-4o’s voice capabilities with more than 100 external red teamers who collectively speak 45 different languages. According to OpenAI, a report on these safety efforts will be released in early August.

The company has announced that Advanced Voice Mode will be restricted to the four preset voices of ChatGPT – Juniper, Breeze, Cove, and Ember – which were developed in collaboration with professional voice actors.

The Sky voice featured in OpenAI’s May demo is no longer accessible in ChatGPT. According to Lindsay McCallum, spokesperson for OpenAI, “ChatGPT is unable to imitate the voices of other individuals and public figures, and it will suppress outputs that diverge from one of these preset voices.”

OpenAI is trying to avoid deepfake controversies. In January, voice cloning technology from AI startup ElevenLabs was used to impersonate President Biden and deceive primary voters in New Hampshire.

Additionally, OpenAI claims that it has implemented new filters to prevent specific requests for the generation of music or other copyrighted audio.

AI companies have faced legal action over copyright infringement in the past year, and audio models such as GPT-4o open up a whole new category of companies that can file a complaint.

In particular, record labels have a history of litigation and have already sued AI song generators Suno and Udio.

Hillary Ondulohi

Hillary is a media creator with a background in mechanical engineering. He leverages his technical expertise to craft informative pieces on protechbro.com, making complex concepts accessible to a wider audience.