Summary

  • Gemini 1.5 has a million-token context window, 30x more than the free Gemini model, showing significant progress in AI technology.
  • Long-context understanding is the highlight of Gemini 1.5, with capabilities to reason across various file types within its context window.
  • Google’s integration of Gemini into developer applications signifies the rapid progression of AI technology, focusing on the professional market.



When the public gained access to Google’s Gemini 1.0 Pro language model in February, the company was already at work getting Gemini 1.5 ready for release. The new model is currently being rolled out to developers and Google partners, who can request access by joining a screened waitlist. Fortunately, Google has provided quite a bit of public information about what to expect from 1.5.


Bottom line: Gemini 1.5 doubles down on the professional market with the introduction of some impressive capabilities that highlight just how quickly AI is progressing. A public release date has not yet been announced.

When we reached out to Google for a statement, a representative pointed us to Google’s blog post where Alphabet CEO Sundar Pichai notes that Gemini 1.5 “shows dramatic improvements across a number of dimensions and … achieves comparable quality to 1.0 Ultra, while using less compute.” That sounds pretty good, but 1.5’s long-context understanding is really the headliner.

Google's Gemini 1.5 Pro waitlist page
Source: Google




Long-context understanding will increase processing continuity

It’s all about context

The most noteworthy advancement in Gemini 1.5 Pro is a big leap in context understanding. The amount of information a large language model like Gemini can work with in a single interaction is measured in tokens. While the Gemini 1.0 Pro model, which powers the current free version of Gemini, has a limit of about 32,000 input tokens per interaction, Gemini 1.5 Pro can manage about a million.

With a million-token context window, Gemini 1.5 dramatically surpasses other consumer AI models: five times the window of the current leader, Claude, and around 30 times that of the current free Gemini model. As I touched on above, Google says that Gemini 1.5 Pro will perform similarly to the Gemini 1.0 Ultra model that powers Google’s premium Gemini Advanced, while running more efficiently.

A graph comparing context windows in AI models
Source: Google



For clarity, context understanding refers to the amount of information a language model can process with continuity. That capacity is measured by the context window, whose size is counted in tokens — chunks of words, images, video, audio, or code. Gemini 1.5 can reason across different file types within its context window, allowing users to upload videos, text, and even code repositories for analysis.

Gemini 1.5 can process over 700,000 words of text or one full hour of video, for example. Although that still might fall short for the heaviest video applications, you can see how fast things are going to progress with a 10 million-token limit already on the horizon.
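The comparisons above are simple ratios, and they are easy to verify with the figures cited in this article. A quick back-of-the-envelope calculation (all values approximate; the words-per-token ratio is inferred from Google's "700,000 words per million tokens" figure, not an official constant):

```python
# Context-window sizes cited in this article, in tokens (approximate).
GEMINI_1_0_PRO = 32_000      # free Gemini tier
CLAUDE = 200_000             # cited here as the previous leader
GEMINI_1_5_PRO = 1_000_000   # Gemini 1.5 Pro

# Rough conversion: ~700,000 words per 1,000,000 tokens,
# i.e. about 0.7 words per token (an inferred figure, not official).
WORDS_PER_TOKEN = 700_000 / 1_000_000

print(f"vs. Claude:      {GEMINI_1_5_PRO / CLAUDE:.0f}x")          # 5x
print(f"vs. free Gemini: {GEMINI_1_5_PRO / GEMINI_1_0_PRO:.0f}x")  # ~31x
print(f"word capacity:   {GEMINI_1_5_PRO * WORDS_PER_TOKEN:,.0f} words")
```

The ~31x figure is where the article's "around 30x" claim comes from, and the word capacity lines up with the 700,000-word figure Google quotes.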




Specialized neural networks have a big role to play

Google has been working on Mixture-of-Experts (MoE) architecture for several years, but 1.5 is the first Gemini model to use the technology. With MoE, Gemini routes each request to smaller, specialized neural networks, improving both speed and response quality. The timing is no coincidence: MoE is especially useful for processing long context windows efficiently.
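The routing idea behind MoE can be illustrated with a toy sketch. In a real model the "gate" is a learned neural network and the experts are sub-networks; here both are stand-ins (trivial keyword scoring and string-returning functions), since Google has not published Gemini 1.5's internals. The point is the shape of the technique: score all experts, run only the top-scoring few, and leave the rest idle for that request.

```python
# Toy mixture-of-experts router. The expert names, gating rules, and
# functions below are illustrative inventions, not Gemini internals.
EXPERTS = {
    "code": lambda text: f"[code expert] handled {len(text)} chars",
    "video": lambda text: f"[video expert] handled {len(text)} chars",
    "prose": lambda text: f"[prose expert] handled {len(text)} chars",
}

def gate_scores(text):
    """Score each expert for this input. A real gate is a learned
    network; trivial keyword matching stands in for it here."""
    return {
        "code": text.count("def ") + text.count("{"),
        "video": text.lower().count("video"),
        "prose": 1,  # weak default so some expert always runs
    }

def route(text, top_k=1):
    """Run only the top_k highest-scoring experts, so most of the
    'model' stays idle for any single request."""
    scores = gate_scores(text)
    chosen = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return {name: EXPERTS[name](text) for name in chosen}

print(route("def parse(x): { return x }"))  # routed to the code expert
print(route("summarize this video clip"))   # routed to the video expert
```

Because each request activates only a fraction of the experts, compute per request stays low even as total model capacity grows — which is why MoE pairs well with very long context windows.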


The stunning velocity of AI

Tech’s moving very fast here

It’s a safe bet that Google is going to focus on driving the integration of Gemini into third-party developer applications. This segment is clearly being courted with the pre-release of Gemini 1.5 within the company’s new Google AI Studio, which includes a suite of AI developer tools.

Gemini 1.5 is a marker of how quickly AI technology is progressing. Its focus on long-context understanding and cross-modality reasoning is a strong play for the pro market. As Google further integrates Gemini into developer ecosystems, it seems we can expect to see a new generation of information-driven applications start to crop up soon.

Related

What do you actually get with Gemini Advanced?

Ultra, Pro, Advanced, 1.0, 1.5 — what does it all mean?