This year, Google I/O 2025 had one focus: artificial intelligence.
We've already covered all the biggest news to come out of the annual developers conference: a new AI video generation tool called Flow. A $250 AI Ultra subscription plan. Tons of new changes to Gemini. A virtual shopping try-on feature. And critically, the launch of the search tool AI Mode to all users in the US.
But over nearly two hours of Google leaders talking about AI, one word we didn't hear was "hallucination."
Hallucinations remain one of the most stubborn and concerning problems with AI models. The term refers to the invented facts and inaccuracies that large language models "hallucinate" in their replies. And according to the big AI brands' own metrics, hallucinations are getting worse, with some models hallucinating more than 40 percent of the time.

But if you were watching Google I/O 2025, you wouldn't know this problem existed. You'd think models like Gemini never hallucinate; you would certainly be surprised to see the warning appended to every Google AI Overview. ("AI responses may include mistakes.")
The closest Google came to acknowledging the hallucination problem was during a segment of the presentation on AI Mode and Gemini's Deep Search capabilities. The model would check its own work before delivering an answer, we were told, but without more detail on this process, it sounds more like the blind leading the blind than genuine fact-checking.
For AI skeptics, the degree of confidence Silicon Valley has in these tools seems divorced from actual results. Real users notice when AI tools fail at simple tasks like counting, spell-checking, or answering questions like "Will water freeze at 27 degrees Fahrenheit?"
Google was eager to remind viewers that its newest AI model, Gemini 2.5 Pro, sits atop many AI leaderboards. But when it comes to truthfulness and the ability to answer simple questions, AI chatbots are graded on a curve.
Gemini 2.5 Pro is Google's most intelligent AI model (according to Google), yet it scores just 52.9 percent on the SimpleQA benchmarking test. According to an OpenAI research paper, SimpleQA is "a benchmark that evaluates the ability of language models to answer short, fact-seeking questions." (Emphasis ours.)
A Google representative declined to discuss the SimpleQA benchmark, or hallucinations generally, but did point us to Google's official explainer on AI Mode and AI Overviews. Here's what it has to say:
[AI Mode] uses a large language model to help answer queries, and it is possible that, in rare cases, it may sometimes confidently present information that is inaccurate, which is commonly known as "hallucination." As with AI Overviews, in some cases this experiment may misinterpret web content or miss context, as can happen with any automated system in Search…

We're also using novel approaches with the model's reasoning capabilities to improve factuality. For example, in collaboration with Google DeepMind research teams, we use agentic reinforcement learning (RL) in our custom training to reward the model to generate statements it knows are more likely to be accurate (not hallucinated) and also backed up by inputs.
Is Google wrong to be optimistic? Hallucinations may yet prove to be a solvable problem, after all. But it seems increasingly clear from the research that hallucinations from LLMs are not a solvable problem right now.

That hasn't stopped companies like Google and OpenAI from sprinting ahead into the era of AI Search, and that's likely to be an error-filled era, unless we are the ones hallucinating.