
Google rolls out Deep Think for Gemini app
01 Aug 2025, 10:18 PM
Team Head&Tale
Google is rolling out Deep Think in its Gemini app for Google AI Ultra subscribers. The advanced feature is designed to help users tackle complex problems.
Deep Think pushes the frontier of Gemini's reasoning capabilities through parallel thinking techniques. This approach lets the model generate many ideas at once and consider them simultaneously, revising or combining different ideas over time before arriving at the best answer.
"Moreover, by extending the inference time or "thinking time," we give Gemini more time to explore different hypotheses, and arrive at creative solutions to complex problems," Google said in its blogpost.
The tech giant said it also developed "novel" reinforcement learning techniques that encourage the model to make use of these extended reasoning paths, thus enabling Deep Think to "become a better, more intuitive problem-solver over time."
While the original model took hours to reason about complex math problems, the new version is "faster and more usable day-to-day." In Google's internal tests, it reaches Bronze-level performance on the 2025 International Mathematical Olympiad (IMO) benchmark.
"Deep Think can help people tackle problems that require creativity, strategic planning and making improvements step-by-step," it added. Deep Think’s performance is also reflected in challenging benchmarks that measure coding, science, knowledge and reasoning capabilities.
Gemini 2.5 Deep Think, according to the company, achieves state-of-the-art performance on Humanity’s Last Exam (HLE), a challenging benchmark that measures expertise in different domains, including science and math. Google claims its model scored 34.8% on HLE (without tools), compared to xAI’s Grok 4, which scored 25.4%, and OpenAI’s o3, which scored 20.3%.
Google says Gemini 2.5 Deep Think also outperforms AI models from OpenAI, xAI, and Anthropic on LiveCodeBench, a challenging test of competitive coding tasks. Google’s model scored 87.6%, whereas Grok 4 scored 79% and OpenAI’s o3 scored 72%.