On Wednesday, Google introduced PaLM 2, a family of foundation language models comparable to OpenAI’s GPT-4. At its Google I/O event in Mountain View, California, Google revealed that it already uses PaLM 2 to power 25 products, including its Bard conversational AI assistant.
Features of PaLM 2
According to Google, PaLM 2 supports over 100 languages and can perform “reasoning,” code generation, and multilingual translation. During his 2023 Google I/O keynote, Google CEO Sundar Pichai said it comes in four sizes: Gecko, Otter, Bison, and Unicorn. Gecko is the smallest and can reportedly run on a mobile device. Aside from Bard, PaLM 2 also powers AI features in Docs, Sheets, and Slides.
PaLM 2 vs. GPT-4
All that is fine, but how does PaLM 2 stack up against GPT-4? According to the PaLM 2 Technical Report, it appears to beat GPT-4 on some mathematical, translation, and reasoning tasks. But real-world performance might not match Google’s benchmarks. In a cursory evaluation of the PaLM 2 version of Bard, Ethan Mollick found that its performance appears worse than that of GPT-4 and Bing on various informal language tests.
PaLM 2 Parameters
The first PaLM was notable for its massive size: 540 billion parameters. Parameters are numerical variables that serve as the model’s learned “knowledge,” enabling it to make predictions and generate text based on the input it receives. More parameters roughly mean more complexity, but there is no guarantee that they are used efficiently. By comparison, OpenAI’s GPT-3 (from 2020) has 175 billion parameters. OpenAI has never disclosed the number of parameters in GPT-4.
Lack of Transparency
That leads to the big question: just how “large” is PaLM 2 in terms of parameter count? Google doesn’t say. This has frustrated some industry experts who often push for transparency about what makes AI models tick. Parameter count isn’t the only aspect of PaLM 2 that Google has kept quiet about. The company says PaLM 2 was trained on “a diverse set of sources: web documents, books, code, mathematics, and conversational data,” but it does not detail exactly what that data is.
Concerns About Training Data
The dataset likely includes a wide variety of copyrighted material used without permission and potentially harmful material scraped from the Internet.
Future Developments
As far as LLMs go, PaLM 2 is far from the end of the story. In the I/O keynote, Pichai mentioned that a newer multimodal AI model called “Gemini” is currently in training.
Our Say
In conclusion, while PaLM 2 may fall short in some areas, it represents an important milestone in the development of natural language processing technology. As we move closer to the next generation of language models, it will be fascinating to see how PaLM 2 evolves and matures, and whether it can take on OpenAI’s GPT-4.