OpenAI's o3 Worse Than o3-preview?

Good morning. It’s Monday, April 28th.

On this day in tech history: 2003: Apple launched the iTunes Music Store, revolutionizing digital music distribution. It sold 1 million songs in its first week and reached 10 billion downloads by 2010.

In today’s email:

  • How did OpenAI’s o3-preview beat o3?

  • Google’s glimpse into the AI-driven future

  • RUKA robotic hand

  • New AI Tools

  • Latest AI Research Papers

You read. We listen. Let us know what you think by replying to this email.

In partnership with TAVUS

Introducing Hummingbird-0 — the lipsync that just works

Upload any MP3 + MP4, get up to 5 minutes of photorealistic, zero-shot lipsync.
No cloning. No training. Beats every model we’ve tested and costs less than SyncLabs.

What you get:

  • Hollywood-grade realism & accuracy

  • Perfect for use with Veo/Sora/Kling + ElevenLabs

  • Already powering creator & enterprise pipelines

Today’s trending AI news stories

Was OpenAI’s o3-preview more powerful than the full o3 model?

OpenAI’s o3 model outperforms the o1 model from fall 2024 by 20% on the ARC-AGI-1 benchmark, though it still lags behind the o3-preview results from December 2024. The chart illustrates the price-to-performance ratio. | Image: ARC Prize Foundation

OpenAI’s latest o3 model has fallen short of expectations in recent evaluations, particularly on reasoning tests. The ARC Prize Foundation found that o3 scored 41% at low compute and 53% at medium compute on ARC-AGI-1, well behind the 76% and 88% posted by the December 2024 preview version. The drop is linked to changes to the model itself: the released o3 is a smaller, multimodal version optimized for chat and product use rather than advanced reasoning.

Despite outperforming earlier models like o1, o3 still struggles on harder benchmarks, scoring under 3% on ARC-AGI-2. The o4-mini model, in contrast, delivers solid performance at a fraction of the price. The results underscore a persistent gap between AI and human reasoning, and show that more computational effort does not always translate into better results.

Meanwhile, CEO Sam Altman addressed growing criticism of GPT-4o's overly agreeable responses, calling the model "sycophant-y and annoying." Altman acknowledged the feedback and confirmed that OpenAI is working on immediate updates, with further adjustments expected over the coming week. The changes are also expected to give users more control over the model's conversational style. Read more.

Google Reveals 601 Real-World Generative AI Use Cases

Image: Google

Google has expanded its generative AI offerings, with 601 real-world use cases now showcased on Google Cloud, up from 101 last year. Major companies like Uber, Citi, and Mercedes-Benz are using Vertex AI and Gemini models for tasks ranging from customer service to healthcare diagnostics.

In addition, Google Photos has introduced a shortcut to bypass the slow “Ask Photos” feature. Users can now double-tap the search icon on Android devices to quickly switch to the classic search mode. This improvement addresses user concerns over the speed of “Ask Photos,” which uses Gemini AI for deeper natural language queries.

Alphabet’s Q1 2025 results exceeded expectations, with $90.23 billion in revenue. CEO Sundar Pichai highlighted AI-driven features like “AI Overviews,” used by 1.5 billion people monthly. Google is also experimenting with multimodal search and enhancing visual tools. Its focus on cost efficiency, anchored by custom Tensor Processing Units (TPUs), gives it a more affordable compute stack than Nvidia GPU-based rivals, and the company continues to prioritize flexibility and interoperability over the tighter integration of OpenAI’s partnership with Microsoft. Read more.

RUKA robotic hand offers 15 degrees of freedom and open-source design

NYU researchers have introduced RUKA, a tendon-driven, 3D-printable robotic hand that is open-source and costs under $1,300. It offers 15 degrees of freedom and can perform 31 of the 33 grasps in a standard grasp taxonomy. First-time builders can assemble the hand in under 7 hours, and it achieves precise control through a data-driven approach, using a MANUS glove to learn the mapping from fingertip positions to motor commands.

RUKA runs its controllers at 40 Hz and supports teleoperation at 25 Hz from devices like motion-capture gloves or VR headsets. A calibration script keeps behavior consistent across builds, and the project ships detailed assembly instructions to help the community replicate it. Read more.
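For a sense of how such a pipeline might fit together, here is a minimal Python sketch of a fixed-rate hand-control loop in the spirit of RUKA’s setup: glove readings arrive at 25 Hz, a learned fingertip-to-motor mapping runs at 40 Hz, and commands go out to the tendon motors. The function names, motor count, and the linear stand-in for the learned model are all hypothetical illustrations, not the project’s actual code.

```python
import time
import numpy as np

CONTROL_HZ = 40    # reported control rate
TELEOP_HZ = 25     # reported teleoperation rate
N_FINGERTIPS = 5   # one (x, y, z) position per fingertip
N_MOTORS = 11      # hypothetical motor count; the hand exposes 15 DoF

def read_glove_fingertips() -> np.ndarray:
    """Stand-in for a MANUS glove read: 5 fingertip (x, y, z) positions."""
    return np.zeros((N_FINGERTIPS, 3))

def send_motor_commands(commands: np.ndarray) -> None:
    """Stand-in for the write to the hand's tendon motors."""
    pass

def fingertips_to_motors(tips: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Hypothetical learned mapping (a single linear layer here) from
    fingertip positions to motor position commands."""
    return tips.reshape(-1) @ W + b

def run(duration_s: float = 2.0) -> None:
    rng = np.random.default_rng(0)
    W = 0.01 * rng.normal(size=(N_FINGERTIPS * 3, N_MOTORS))  # placeholder weights
    b = np.zeros(N_MOTORS)

    tips = read_glove_fingertips()
    now = time.monotonic()
    next_ctrl, next_teleop, end = now, now, now + duration_s

    while time.monotonic() < end:
        now = time.monotonic()
        if now >= next_teleop:                 # refresh the glove target at 25 Hz
            tips = read_glove_fingertips()
            next_teleop += 1.0 / TELEOP_HZ
        if now >= next_ctrl:                   # command the motors at 40 Hz
            send_motor_commands(fingertips_to_motors(tips, W, b))
            next_ctrl += 1.0 / CONTROL_HZ
        time.sleep(0.001)                      # coarse yield between deadline checks

if __name__ == "__main__":
    run()
```

The two timers decouple the slower teleoperation stream from the faster control loop, so the hand keeps tracking the most recent glove reading between updates.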

New AI-powered tools from around the web

Latest AI Research Papers

arXiv is a free online library where researchers share pre-publication papers.

Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.

Interested in reaching smart readers like you? To become an AI Breakfast sponsor, reply to this email or DM us on 𝕏!