Google's Gemini 2.5 Tops Leaderboards
Good morning. It’s Wednesday, March 26th.
On this day in tech history: In 1976, the first 8-inch floppy disk drive, the Shugart SA-801, was introduced by Shugart Associates.
In today’s email:
Google’s Gemini 2.5 Tops Leaderboards
OpenAI’s New Image Generator Solves Text In Images
4 New AI Tools
Latest AI Research Papers
You read. We listen. Let us know what you think by replying to this email.

Today’s trending AI news stories
Google’s Gemini 2.5 Tops Leaderboards, Supports 1M Token Inputs
Google has introduced Gemini 2.5, which it calls its most advanced reasoning model, aimed at harder problems in math, coding, and multimodal analysis. The first release in this new model family, Gemini 2.5 Pro Experimental, is now available in Google AI Studio and to Gemini Advanced subscribers. With a 1 million-token context window (soon expanding to 2 million), the model can take in large datasets, technical documents, and full code repositories in a single prompt. It leads several key benchmarks, scoring 68.6% on Aider Polyglot for code editing and 18.8% on Humanity’s Last Exam, though it trails Claude 3.7 Sonnet on SWE-bench Verified for AI-assisted software development.
🚨 Gemini 2.5 Pro Exp dropped and it's now #1 across SEAL leaderboards:
🥇 Humanity’s Last Exam
🥇 VISTA (multimodal)
🥇 (tie) Tool Use
🥇 (tie) MultiChallenge (multi-turn)
🥉 (tie) Enigma (puzzles)
Congrats to @demishassabis, @sundarpichai & team!
🔗 scale.com/leaderboard
— Alexandr Wang (@alexandr_wang)
5:45 PM • Mar 25, 2025
At the same time, Google is rolling out real-time vision to Gemini Live, letting users point their phone cameras at objects or share on-screen content for instant AI analysis. Read more.
OpenAI’s New Image Generator Solves The ‘Text Problem’
OpenAI has integrated GPT-4o’s native image generation into ChatGPT, making it the default across free and paid tiers. Unlike previous DALL-E implementations, GPT-4o processes text and images together, improving spatial accuracy and object consistency. The model can handle up to 20 objects at once while maintaining spatial relationships, making it more precise in rendering text and complex scenes.
4o image generation has arrived.
It's beginning to roll out today in ChatGPT and Sora to all Plus, Pro, Team, and Free users.
— OpenAI (@OpenAI)
6:34 PM • Mar 25, 2025
Users can refine images through conversation, leveraging in-context learning to iteratively improve results. While the system offers greater creative flexibility than DALL-E 3, OpenAI still enforces restrictions on explicit content, deepfakes, and unauthorized likenesses. All generated images include C2PA metadata for transparency.
OpenAI has also refined ChatGPT’s Advanced Voice Mode, making conversations smoother by reducing interruptions. Free users now experience more natural dialogue, while paying subscribers gain enhanced voice interactions.
On the leadership front, OpenAI is undergoing restructuring, with CEO Sam Altman stepping back from daily operations to focus on research and product strategy. COO Brad Lightcap will now oversee operations, partnerships, and international growth.
However, OpenAI’s models are struggling with a new test of artificial general intelligence. The Arc Prize Foundation’s ARC-AGI-2 benchmark, designed to assess adaptive reasoning, has exposed significant gaps. OpenAI’s o1-pro and DeepSeek’s R1 barely scored above 1%, while human test groups averaged 60%. Even o3 (low), which previously dominated ARC-AGI-1, now manages just 4% accuracy—despite a staggering $200 per task compute cost.
The Arc Prize Foundation has launched a contest challenging developers to hit 85% accuracy on ARC-AGI-2 for just $0.42 per task, marking a new frontier in AI’s pursuit of true reasoning capabilities. Read more.

New AI model from Reve challenges Midjourney and Google’s Imagen, sets new benchmark
Alibaba's Qwen2.5-VL-32B matches larger models with just 32B parameters
ByteDance's InfiniteYou lets users generate unlimited variations of portrait photos
DeepSeek-V3-0324 hits 20 tokens per second on Mac Studio, and that’s a nightmare for OpenAI
AI-generated writing just leveled up as Midjourney reprograms originality
Apple's missteps highlight risks of AI producing automated headlines, researcher says
AI chip startup FuriosaAI reportedly turns down $800M acquisition offer from Meta
Boston Dynamics plans to use NVIDIA's Isaac GR00T to build AI capabilities for Atlas
MIT's artificial muscles for soft robots flex like a human iris
Microsoft announces security AI agents to help overwhelmed humans
Engineers develop hybrid robot that balances strength and flexibility—and can screw in a lightbulb
Agentic AI is changing online meeting platforms: Moving from silent observer to active participant
Character.ai can now tell parents which bots their kid is talking to
Open Source devs say AI crawlers dominate traffic, forcing blocks on entire countries
China firm plans 5,000-strong humanoid robot army to rival Elon Musk’s Optimus
Quora’s Poe launches its most affordable subscription plan for $5/month

4 new AI-powered tools from around the web

arXiv is a free online library where researchers share pre-publication papers.


Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.