Summary
- Google Gemini has enhanced capabilities in the pipeline.
- ‘Gemini Live’ may soon enable voice-to-AI interaction with uploaded files and YouTube videos.
- Interaction with user-uploaded files could make summarization and interpretation of the file contents simpler.
Gemini has become much more than a Google Assistant replacement on Android, even though its functionality isn't on par yet. Meanwhile, Google has built out new capabilities for Gemini, including some that make it suitable for Android XR. One of the newer additions, released in August last year, is Gemini Live, which is meant to mimic a natural, spoken conversation with the AI. Google could soon turn this experience up to eleven with document upload support.
For context, document upload is already supported for Gemini Advanced subscribers, but it still requires typed-out queries and written responses that you read. Once the files are analyzed, you can query the AI about key data points, obtain a quick summary, or draw inferences from the information in them. Gemini Live transforms the experience with voice queries and spoken responses, but it still can't interact with user-uploaded files.
That key element could change soon, though. Popular Google app researcher @AssembleDebug on X told Android Authority that beta version 16.1.38 of the Google app contains a whole UI dedicated to document handling. The researcher managed to activate this hidden interface, revealing support for uploading files and receiving the same kind of contextual analysis and responses in a conversational format.
Interacting with YouTube will never be the same
Summaries, now read aloud
The interaction starts in Gemini Advanced, where users can upload the files. Once that's done, users will see a toast message prompting them to switch and "Talk Live about this." In Live, the AI retains access to the documents and their contents. It should also work with YouTube videos: you share the link like you would with a friend, and the AI digests the content to spit out an analysis, draw a conclusion, or engage in a full-blown conversation about it.
As always, you can also save a transcript of your conversation with the AI for later reference, so you don't need to repeat the entire conversation. While this might not seem like a big improvement, it is as close as AI has brought us to literally talking to a digital document. Even when used just for fun, the casual tone of Gemini Live's responses might make information easier to process or remember.
That said, document and YouTube video analysis aren't available in Gemini Live yet, and we may have to wait for an official announcement or a server-side update to unlock this capability for everyone.