What’s the purpose of interacting with a human-like bot if it’s unreliable and lacks personality?
This question has been on my mind since I started testing Gemini Live, Google’s latest attempt to create a more engaging chatbot experience. With realistic voices and the option to interrupt the bot at any point, Gemini Live aims to make conversation feel more natural.
According to Sissie Hsiao, GM for Gemini experiences at Google, Gemini Live is tuned to have an intuitive, back-and-forth conversation. “[It] can provide information more succinctly and answer more conversationally than, for example, if you’re interacting in just text. We think that an AI assistant should be able to solve complex problems … and also feel very natural and fluid when you engage with it,” Hsiao explained back in May.
After spending considerable time using Gemini Live, I can confirm that it indeed feels more natural than previous AI-powered voice interactions, such as Google Assistant. However, it still struggles with the fundamental issues of AI technology, like inconsistencies and hallucinations, while also introducing some new challenges.
The Struggle with Naturalness
Gemini Live is essentially a sophisticated text-to-speech engine built on top of Google’s latest generative AI models, Gemini 1.5 Pro and 1.5 Flash. These models generate text, which is then spoken aloud by the engine. Users can access a transcript of conversations within the Gemini app.
For my testing on a Pixel 8a, I selected the voice “Ursa,” described by Google as “mid-range” and “engaged.” While the voice was more expressive than older synthetic voices from Google, it still maintained a neutral tone, avoiding the uncanny valley but lacking the warmth or variation that would make it truly engaging. Unlike other advanced voice modes, Gemini Live does not let users adjust the pitch, tone, or pace of the voice, which limits personalization.
Gemini Live also lacks some of the more human-like traits found in other voice technologies, such as laughing, breathing, or natural speech hesitations. This makes the interaction feel polite but detached, as if the bot is preoccupied with multiple conversations and can’t fully engage in yours.
Testing Gemini Live
When Google introduced Gemini Live, it suggested that the feature could help with job interview preparation. Curious, I tested this by simulating a tech journalism interview. The bot asked generic questions mixed with more personalized ones, and its feedback was overwhelmingly positive. However, when I falsely implied that I had given only one-word answers, the bot revised its feedback to match, showing a troubling tendency to agree with incorrect information.
This pattern of inconsistency was common in my interactions with Gemini Live. The bot often generated confident yet inaccurate responses, making it difficult to trust.
Performance Issues and Limitations
Gemini Live remembers details from earlier in the same session but often struggles with factual accuracy, even on topics you’d expect it to handle well. For instance, when I asked for budget-friendly activities in New York City, some of its recommendations were out of date or pointed to places that had closed, revealing a tendency to rely on stale information.
Moreover, Gemini Live’s performance was hampered by technical issues. Its voice would sometimes cut out mid-response, and it often failed to recognize my input on the first try, requiring multiple attempts to get a complete answer.
While Google plans to add more capabilities to Gemini Live in future updates, including the ability to interpret images and real-time video, the current version feels incomplete. Without the expressiveness and reliability of other advanced voice modes, there’s little reason to choose Gemini Live over Google’s text-based AI experiences. At this stage, it feels like a prototype that needs further refinement before it can truly shine.