OpenAI faced a turbulent week of executive exits and major fundraising, but at its 2024 DevDay the company is focused on rallying developers to build tools with its AI models.
The company also announced a public beta of its "Realtime API" on Tuesday. The API lets developers build applications with low-latency, AI-generated voice responses. It isn't quite ChatGPT's Advanced Voice Mode, but it's close.
Kevin Weil, the chief product officer of OpenAI, stated in a press briefing before the event that the recent departures of chief technology officer Mira Murati and chief research officer Bob McGrew would not impede the company’s advancement.
"I'll start by saying that Bob and Mira have been exceptional leaders," Weil said. "I've learned a great deal from them, and they have played a major role in getting us to where we are today. And we're not going to slow down."
OpenAI's C-suite reshuffle is a reminder of the turmoil that followed last year's DevDay, and the company is trying to persuade developers that it remains the best platform for building AI applications.
Even though OpenAI is operating in an increasingly competitive space, its leaders claim the startup has over 3 million developers building with its AI models.
In the past two years, OpenAI has cut the cost of accessing its API for developers by 99%. That reduction was likely forced by competitors such as Meta and Google, which have consistently undercut it on price.
The Realtime API, one of OpenAI's newest features, lets developers build near-real-time speech-to-speech experiences into their applications, choosing from six voices provided by OpenAI.
To avoid copyright issues, developers can't bring in third-party voices, and these six are distinct from the voices available for ChatGPT. (The eerily Scarlett Johansson-like voice is nowhere to be found.)
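To give a sense of how a session with such an API might be wired up, here is a minimal sketch that builds the kind of JSON events a client would send over a WebSocket. The endpoint URL, model name, event type, and session fields are assumptions modeled on OpenAI's beta documentation and may change.

```python
import json

# Assumed beta endpoint; the model name is a placeholder drawn from the announcement.
REALTIME_URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"

def session_update(voice: str = "alloy") -> str:
    """Build a session.update event selecting one of OpenAI's provided voices.

    The event shape here is an assumption based on the beta docs, not a
    guaranteed schema.
    """
    return json.dumps({
        "type": "session.update",
        "session": {
            "voice": voice,
            "modalities": ["audio", "text"],
        },
    })

# A client would open a WebSocket to REALTIME_URL (with an API key header)
# and send this event before streaming audio.
event = json.loads(session_update("alloy"))
```

In a real application, this event would be followed by streamed audio input and `response.create`-style requests, but those details depend on the evolving beta spec.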
During the briefing, OpenAI's head of developer experience, Romain Huet, demoed a trip-planning app built with the Realtime API. Users could talk with an AI assistant about an upcoming trip to London and get answers with minimal latency.
Because the Realtime API has access to various tools, the app could annotate a map with restaurant locations as it answered. At another point, Huet showed how the Realtime API could talk with a human on the phone to ask about ordering food for an event.
Unlike Google's notorious Duplex, OpenAI's API cannot call restaurants or stores directly. However, it can integrate with calling APIs such as Twilio to do so.
Notably, OpenAI isn't adding disclosures so that its AI models automatically identify themselves on calls like these, even though the AI-generated voices sound quite realistic. For now, developers are responsible for adding that disclosure, something a recent California law may soon require.
As part of its DevDay rollout, OpenAI also introduced vision fine-tuning in its API.
The feature lets developers fine-tune GPT-4o with images as well as text. In principle, this should help developers improve GPT-4o's performance on tasks that require visual understanding.
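As a rough illustration, a vision fine-tuning example would pair an image with text in the chat-message format the API already uses. The exact JSONL fields below are an assumption modeled on the Chat Completions image format; the URL and strings are placeholders.

```python
import json

def make_example(image_url: str, question: str, answer: str) -> str:
    """Build one hypothetical vision fine-tuning record as a JSONL line.

    Mirrors the Chat Completions multimodal message shape; the accepted
    fields for fine-tuning are an assumption, not confirmed schema.
    """
    return json.dumps({
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            },
            {"role": "assistant", "content": answer},
        ]
    })

# One line of a training file; many such lines would form the JSONL upload.
record = json.loads(make_example("https://example.com/cat.png",
                                 "What animal is this?", "A cat."))
```

A training file built this way would then be uploaded and referenced when creating a fine-tuning job, per the standard fine-tuning flow.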
Olivier Godement, OpenAI's head of product for the API, tells TechCrunch that developers won't be able to upload copyrighted imagery (such as a picture of Donald Duck), images depicting violence, or other imagery that violates OpenAI's safety policies.
OpenAI is also racing to match features its competitors already offer in the AI model licensing space.
Its new prompt caching feature is similar to one Anthropic launched several months ago: it lets developers cache frequently used context between API calls, reducing costs and improving latency.
OpenAI says developers can save 50% with this feature, whereas Anthropic promises up to a 90% discount.
Finally, OpenAI is offering a model distillation feature that lets developers use larger AI models, such as o1-preview and GPT-4o, to fine-tune smaller models, such as GPT-4o mini.
This should help developers improve the performance of those small models while keeping the cost savings of running them instead of larger ones.
As part of model distillation, OpenAI is introducing a beta evaluation tool that lets developers measure the performance of their fine-tunes within OpenAI's API.
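At its core, a distillation workflow turns captured (prompt, larger-model answer) pairs into fine-tuning records for the smaller model. A minimal sketch, assuming the standard chat fine-tuning JSONL format; the helper name and data are hypothetical:

```python
import json

def distill_record(prompt: str, teacher_answer: str) -> str:
    """Turn one (prompt, large-model answer) pair into a fine-tuning record.

    Uses the standard chat fine-tuning JSONL shape; in a real pipeline the
    teacher answers would come from a larger model like GPT-4o.
    """
    return json.dumps({
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": teacher_answer},
        ]
    })

# Collected pairs become lines of a JSONL training file for the small model.
record = json.loads(distill_record("What is 2 + 2?", "4"))
```

The resulting JSONL file would then be uploaded and used to start a fine-tuning job targeting a smaller model such as GPT-4o mini, with the new beta evaluation tool used to check whether the distilled model holds up.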
Perhaps more interesting is what's absent from this year's DevDay: any news about the GPT Store, announced at last year's event. OpenAI has been piloting a revenue-sharing program with some of the most popular GPT creators, but the company hasn't shared much since then.
OpenAI has also said it won't release any new AI models during this year's DevDay. Developers waiting for OpenAI o1 (not the preview or mini version) or the startup's video generation model, Sora, will have to wait a little longer.