Gemini AI Models
Gemini 1.5 Pro and 1.5 Flash, two of Google’s most prominent generative AI models, have been widely praised for their ability to process and analyze massive amounts of data, such as summarizing long documents and searching for scenes in film footage. New research, however, suggests that the models do not perform these tasks as well as promised.
Gemini AI Models Struggling with Long Contexts
Two studies examined how well Google’s Gemini models handle large datasets, and both found that 1.5 Pro and 1.5 Flash frequently struggle to answer questions about lengthy texts accurately. In the document-based tests, the models answered correctly only 40% to 50% of the time. “While models like Gemini 1.5 Pro can technically process long contexts, we have seen many cases indicating that the models don’t actually ‘understand’ the content,” said Marzena Karpinska, a postdoc at UMass Amherst.
Context Window Limitations
A model’s context window is the amount of input data it can consider before generating an output. According to Google, the latest Gemini versions can take in up to 2 million tokens, which is roughly 1.4 million words, two hours of video, or 22 hours of audio. Practical tests, however, show that the models’ ability to reason over and answer questions about contexts this large is limited.
Research Findings
In one study, researchers tested the models with true/false statements about recently published fiction books. Gemini 1.5 Pro answered correctly only 46.7% of the time, and Flash only 20%. Because random guessing on a true/false question succeeds about 50% of the time, both models scored below chance at understanding long texts.
A second study focused on the models’ ability to reason over videos. The researchers built a dataset of images paired with questions and found that Flash struggled significantly. In one test, Flash correctly transcribed only about 50% of the handwritten digits shown across a series of images, and its accuracy fell to about 30% when the sequence contained more digits.
Overpromising Capabilities
Both studies suggest that Google may have overpromised the capabilities of its Gemini models. Although other models, such as OpenAI’s GPT-4 and Anthropic’s Claude 3.5, also performed poorly, Google has heavily marketed the large context window as a key feature of Gemini.
Industry Implications
Generative AI is under increasing scrutiny as businesses and investors grow frustrated with the technology’s limitations. Boston Consulting Group surveys found that half of the C-suite executives polled do not expect substantial productivity gains from generative AI, and that many worry about potential mistakes and data compromises.
Calls for Better Benchmarks
Researchers such as Karpinska and Michael Saxon of UC Santa Barbara argue that better benchmarks and more third-party critique are needed to validate claims about generative AI capabilities. Current benchmarks such as “needle in the haystack,” which Google often cites, measure only a model’s ability to retrieve a specific piece of information, not its ability to answer complex questions about the content.
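The basic needle-in-a-haystack setup can be sketched in a few lines. This is a minimal illustration, not Google’s or any published benchmark’s actual harness: the function names, the filler text, and the substring-match scoring rule are all hypothetical choices made for the example. It plants one fact at a chosen relative depth in long filler text and counts an answer as correct if it contains that fact.

```python
def build_haystack_prompt(needle: str, filler_sentences: list[str],
                          depth: float, question: str) -> str:
    """Hypothetical helper: insert `needle` at a relative depth
    (0.0 = start of the context, 1.0 = end) inside the filler text."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    position = round(depth * len(filler_sentences))
    sentences = filler_sentences[:position] + [needle] + filler_sentences[position:]
    context = " ".join(sentences)
    return f"{context}\n\nQuestion: {question}"


def score_retrieval(model_answer: str, expected: str) -> bool:
    """Hypothetical scoring rule: pass if the planted fact appears
    anywhere in the model's answer (case-insensitive)."""
    return expected.lower() in model_answer.lower()


if __name__ == "__main__":
    # Assumed filler: 1,000 unrelated sentences standing in for a long document.
    filler = [f"Filler sentence number {i}." for i in range(1000)]
    prompt = build_haystack_prompt(
        "The secret code is 7481.", filler, 0.5, "What is the secret code?"
    )
    # The prompt would be sent to a model; here we just score a sample answer.
    print(score_retrieval("The secret code is 7481.", "7481"))
```

Because the scorer only checks whether a single planted string comes back, a model can pass at every depth without reasoning about the rest of the context. The true/false questions about entire novels described earlier instead require combining information spread across the whole text.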