After weeks of teasing its new frontier model, Google finally launched Gemini 3 on Tuesday, claiming it is the new state of the art in the AI world. Google’s Gemini 2.5 Pro had long been widely regarded as the top AI model for most workflows, though Elon Musk’s Grok AI briefly overtook it on some benchmarks. With Gemini 3, Google appears to be back at the top of the AI food chain.
How does Gemini 3 compare against other top models?
According to benchmarks shared by Google, the new Gemini 3 Pro model not only overtakes Gemini 2.5 Pro but also widens the gap with rivals such as OpenAI’s GPT models and Anthropic’s Claude.
On the popular LMArena leaderboard, Gemini 3 Pro is the new top model with a score of 1501 for text-related tasks, surpassing the Grok 4.1 Thinking and Grok 4.1 models. Gemini 3 Pro also dethroned GPT-5 on the WebDev leaderboard. LMArena says Gemini 3 Pro now ranks number 1 in coding, math, creative writing, and long queries, topping nearly all of its leaderboards.
On Humanity’s Last Exam, a benchmark specifically designed to test academic reasoning, Gemini 3 Pro achieved a score of 37.5 percent, placing it well ahead of GPT-5.1, which sat in the number 2 spot with 26.5 percent, and Claude Sonnet 4.5, which trailed at 13.7 percent.
Gemini 3 Pro also showed remarkable performance on MathArena Apex, a benchmark consisting of challenging math contest problems. While Gemini 2.5 Pro, Claude Sonnet 4.5 and GPT-5.1 all scored in the low single digits between 0.5 percent and 1.6 percent, Gemini 3 Pro secured the number 1 spot with a score of 23.4 percent.
The new model also showed improvements in screen understanding and agentic workflows. On ScreenSpot Pro, a benchmark designed to evaluate a model’s ability to understand computer screens, Gemini 3 Pro achieved a score of 72.7 percent, far ahead of Claude Sonnet 4.5 and GPT-5.1, which scored 36.2 percent and 3.5 percent respectively.
Gemini 3 Pro still failed to take the lead on some coding benchmarks, however. On SWE-Bench Verified, for instance, Claude Sonnet 4.5 held the number 1 spot with 77.2 percent, GPT-5.1 took second with 76.3 percent, and Gemini 3 Pro came in third with 76.2 percent.
With AI companies releasing new models at ever shorter intervals, it is unlikely that Gemini 3 Pro will remain the category leader for long, but for now the new model does lead most benchmarks. Do note, however, that benchmarks may not always reveal the full picture, since companies can optimize their models specifically to climb these rankings; the real test of any model remains actual user experience.