Google launched Gemini Live at its Made by Google event on Tuesday.
The feature lets users hold a semi-natural spoken conversation, rather than a typed one, with an AI chatbot powered by Google’s latest large language model. TechCrunch was on hand to try it out firsthand.
Gemini Live is Google’s answer to OpenAI’s Advanced Voice Mode, ChatGPT’s nearly identical feature, which is currently in a limited alpha test. Though OpenAI demoed the feature before Google did, Google was the first to roll out a finalized version.
I’ve found that these low-latency voice features feel significantly more natural than texting with ChatGPT or talking to Siri or Alexa. Gemini Live answered my questions in less than two seconds and pivoted quickly when interrupted. Gemini Live isn’t perfect, but it’s the best way to use a phone hands-free that I’ve tried so far.
How Gemini Live works
Before you start talking with Gemini Live, the feature lets you choose from 10 voices, compared with OpenAI’s three. Google worked with voice actors to create each one. I found them all quite humanlike and appreciated the variety.
In one demo, a Google product manager asked Gemini Live aloud to find family-friendly wineries near Mountain View with outdoor areas and playgrounds nearby, so that kids could potentially come along. Gemini recommended a spot that met the criteria: Cooper-Garrod Vineyards in Saratoga. That’s a considerably more complicated request than I would have trusted Siri or Google Search to handle.
That said, Gemini Live is not without its shortcomings. It appeared to hallucinate a playground called Henry Elementary School Playground, which it said was “10 minutes away” from that vineyard. Saratoga does have other playgrounds, but the nearest Henry Elementary School is more than a two-hour drive away. There is a Henry Ford Elementary School in Redwood City, but it’s 30 minutes away.
Google was eager to demonstrate how users can interrupt Gemini Live mid-sentence, and the AI will quickly pivot. The company says this lets users control the conversation. In practice, the feature didn’t work flawlessly. At times, Google’s product managers and Gemini Live talked over one another, and the AI didn’t seem to pick up on what was said.
According to product manager Leland Rechis, Google is not letting Gemini Live sing or mimic any voices beyond the 10 it offers, likely to avoid run-ins with copyright law. Rechis also said Google isn’t currently focused on getting Gemini Live to understand the emotional intonation in a user’s voice, something OpenAI emphasized during its demo.
The feature seems like a great way to explore a topic in more depth than a simple Google Search allows. Google acknowledges that Gemini Live is a step toward Project Astra, the fully multimodal AI model it debuted at Google I/O. For now, Gemini Live is voice-only, but Google plans to add real-time video understanding down the line.