Google's Gemini 2.5 Tops Leaderboards

Good morning. It’s Wednesday, March 26th.

On this day in tech history: In 1976, Shugart Associates introduced the SA-801, an 8-inch floppy disk drive.

In today’s email:

  • Google’s Gemini 2.5 Tops Leaderboards

  • OpenAI’s New Image Generator Solves Text In Images

  • 4 New AI Tools

  • Latest AI Research Papers

You read. We listen. Let us know what you think by replying to this email.

Today’s trending AI news stories

Google’s Gemini 2.5 Tops Leaderboards, Supports 1M Token Inputs

Google has introduced Gemini 2.5, its most advanced reasoning model, pushing the boundaries of AI-driven problem-solving in math, coding, and multimodal analysis. The first in this new model family, Gemini 2.5 Pro Experimental, is now available in Google AI Studio and the Gemini Advanced subscription. Equipped with a 1 million-token context window—soon expanding to 2 million—the model handles vast datasets, technical documents, and full code repositories with improved reasoning capabilities. It leads in key benchmarks, scoring 68.6% on Aider Polyglot for code editing and 18.8% on Humanity’s Last Exam, though it lags behind Claude 3.7 Sonnet on SWE-bench Verified for AI-assisted software development.

At the same time, Google is rolling out real-time vision to Gemini Live, letting users point their phone cameras at objects or share on-screen content for instant AI analysis. Read more.
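
If you want to kick the tires yourself, the model can be called from Google AI Studio with a few lines of Python. The sketch below uses Google's Gen AI SDK (google-genai); the experimental model identifier is our assumption based on the naming at launch, so check AI Studio for the current ID.

```python
# Minimal sketch: prompting Gemini 2.5 Pro Experimental via the
# Google Gen AI Python SDK (pip install google-genai).
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # key from Google AI Studio

response = client.models.generate_content(
    model="gemini-2.5-pro-exp-03-25",  # assumed experimental model ID
    contents="Outline an approach to reviewing a large code repository "
             "that exceeds a typical context window.",
)
print(response.text)
```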

OpenAI’s New Image Generator Solves The ‘Text Problem’

OpenAI has integrated GPT-4o’s native image generation into ChatGPT, making it the default across free and paid tiers. Unlike previous DALL-E implementations, GPT-4o processes text and images together, improving spatial accuracy and object consistency. The model can handle up to 20 objects at once while maintaining spatial relationships, making it more precise in rendering text and complex scenes.

Users can refine images through conversation, leveraging in-context learning to iteratively improve results. While the system offers greater creative flexibility than DALL-E 3, OpenAI still enforces restrictions on explicit content, deepfakes, and unauthorized likenesses. All generated images include C2PA metadata for transparency.
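
The new generator lives inside ChatGPT for now, with no dedicated API endpoint announced at press time. For readers who want a programmatic starting point, OpenAI's existing Images API follows the pattern below; treat it as a sketch using the currently available dall-e-3 model, not the GPT-4o-native generator.

```python
# Sketch: generating an image with OpenAI's existing Images API.
# The GPT-4o-native generator described above is ChatGPT-only at the
# time of writing, so this example uses the dall-e-3 model instead.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

result = client.images.generate(
    model="dall-e-3",
    prompt="A hand-lettered cafe menu board listing three drinks with prices",
    size="1024x1024",
    n=1,
)
print(result.data[0].url)  # temporary URL of the generated image
```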

OpenAI has also refined ChatGPT’s Advanced Voice Mode, making conversations smoother by reducing interruptions. Free users now experience more natural dialogue, while paying subscribers gain enhanced voice interactions.

On the leadership front, OpenAI is undergoing restructuring, with CEO Sam Altman stepping back from daily operations to focus on research and product strategy. COO Brad Lightcap will now oversee operations, partnerships, and international growth.

However, OpenAI’s models are struggling with a new test of artificial general intelligence. The Arc Prize Foundation’s ARC-AGI-2 benchmark, designed to assess adaptive reasoning, has exposed significant gaps. OpenAI’s o1-pro and DeepSeek’s R1 barely scored above 1%, while human test groups averaged 60%. Even o3 (low), which previously dominated ARC-AGI-1, now manages just 4% accuracy—despite a staggering $200 per task compute cost.

The Arc Prize Foundation has launched a contest challenging developers to hit 85% accuracy on ARC-AGI-2 while keeping compute costs to roughly $0.42 per task, marking a new frontier in AI’s pursuit of true reasoning capabilities. Read more.
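
To put that challenge in perspective, here is a quick back-of-the-envelope comparison of the reported per-task costs (figures taken straight from the reporting above):

```python
# Reported per-task compute costs for ARC-AGI-2
o3_low_cost_per_task = 200.00   # USD, reported for o3 (low)
contest_cost_ceiling = 0.42     # USD per task, Arc Prize contest target

gap = o3_low_cost_per_task / contest_cost_ceiling
print(f"Contest entrants must be roughly {gap:.0f}x more cost-efficient")
# -> Contest entrants must be roughly 476x more cost-efficient
```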

4 new AI-powered tools from around the web

Latest AI Research Papers

arXiv is a free online library where researchers share pre-publication papers.

Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.

Interested in reaching smart readers like you? To become an AI Breakfast sponsor, reply to this email or DM us on 𝕏!