Inside OpenAI's 'Deep Research' Model
Good morning. It’s Monday, February 3rd.
Did you know: On this day in 1986, the term “vaporware” was first used by Philip Elmer-DeWitt in a TIME magazine article? The term is now commonly used to describe software that is announced long in advance but never actually released.
In today’s email:
OpenAI’s Deep Research
Reasoning Models Suffer From “Underthinking”
Stories You May Have Missed
3 New AI Tools
Latest AI Research Papers
You read. We listen. Let us know what you think by replying to this email.
In partnership with NEX
Transform your marketing effortlessly with NEX's Marko.
Create on-point campaigns in minutes using AI-powered tools for content, strategy, and design—all in one platform.
Loved by 20k+ pros, it’s your shortcut to brand consistency and faster results.
Today’s trending AI news stories
OpenAI launches new ChatGPT agent for 'deep research', targeting professional analysts
OpenAI has introduced "deep research," a new ChatGPT agent designed to tackle complex research tasks across fields like finance, science, policy, and engineering, as well as for consumers making high-stakes decisions. Unlike basic AI queries, this agent pulls from multiple sources, synthesising information to deliver more detailed and reliable insights.
Today we are launching our next agent capable of doing work for you independently—deep research.
Give it a prompt and ChatGPT will find, analyze & synthesize hundreds of online sources to create a comprehensive report in tens of minutes vs what would take a human many hours.
— OpenAI (@OpenAI)
1:04 AM • Feb 3, 2025
Deep research is available now to ChatGPT Pro users, capped at 100 queries per month, and will soon extend to Plus and Team users. Output is currently text-only; future updates will add embedded images, data visualisations, and deeper analytic features. The tool is powered by a version of OpenAI’s o3 "reasoning" model optimised for web browsing and data analysis. While impressive, it isn’t flawless: errors and misinterpretations still occur, so every output ships with full citations to make its claims easier to verify. Read more.
Reasoning models like DeepSeek-R1 and OpenAI o1 suffer from 'underthinking', study finds
The number of tokens generated and the number of "thoughts" (solution approaches) for different models. On average, o1-like LLMs use 225 percent more tokens for incorrect answers than for correct ones, which is due to 418 percent more frequent thought changes. | Image: Wang et al.
A recent study by Tencent AI Lab, Soochow University, and Shanghai Jiao Tong University finds that reasoning models such as DeepSeek-R1 and OpenAI’s o1 fall victim to "underthinking": they prematurely discard viable solution paths, wasting compute and hurting accuracy. The models frequently switch problem-solving strategies, especially on harder tasks, generating 225% more tokens and making 418% more strategy shifts on incorrect answers than on correct ones.
Astonishingly, 70% of these errors involved at least one promising line of reasoning that was abandoned before being fully explored. To mitigate this, the researchers introduced a decoding-time "thought switching penalty" (TIP) that discourages premature shifts, improving accuracy and consistency on math and science benchmarks without retraining or otherwise modifying the models. Read more.
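For readers curious what a "thought switching penalty" might look like in practice, here is a minimal sketch of the general idea using Hugging Face's LogitsProcessor interface. It is not the paper's implementation: the transition tokens, penalty strength, and penalty window below are illustrative placeholders, and the small Qwen model is used only so the snippet runs end to end.

```python
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          LogitsProcessor, LogitsProcessorList)

class ThoughtSwitchingPenalty(LogitsProcessor):
    """Subtract a fixed penalty from the logits of 'thought switching' tokens
    (e.g. " Alternatively") during the early part of generation, nudging the
    model to finish its current line of reasoning before changing course."""

    def __init__(self, switch_token_ids, penalty=3.0, duration=600, prompt_len=0):
        self.switch_token_ids = list(switch_token_ids)  # tokens that open a new thought
        self.penalty = penalty        # strength of the discouragement (illustrative value)
        self.duration = duration      # number of early generated tokens the penalty covers
        self.prompt_len = prompt_len  # prompt length, used to count generated tokens

    def __call__(self, input_ids, scores):
        generated = input_ids.shape[1] - self.prompt_len
        if generated < self.duration:
            scores[:, self.switch_token_ids] -= self.penalty
        return scores

# Illustrative wiring with a small open model; in practice you would use an
# o1-style reasoning model and the transition phrases identified in the paper.
model_name = "Qwen/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Solve step by step: what is 17 * 24?", return_tensors="pt")
switch_ids = [tokenizer.encode(w, add_special_tokens=False)[0]
              for w in [" Alternatively", " Wait"]]  # placeholder transition markers

tip = ThoughtSwitchingPenalty(switch_ids, prompt_len=inputs["input_ids"].shape[1])
out = model.generate(**inputs,
                     logits_processor=LogitsProcessorList([tip]),
                     max_new_tokens=512)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Because the penalty only biases decoding, it can be bolted onto an existing model without any retraining, which is what makes the approach attractive for already-deployed reasoning models.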
SoftBank and OpenAI Launch Cristal Intelligence for Enterprise
Figure Secures Major Customer, Plans to Ship 100,000 Robots by 2029
Neuralink brain implant user controls robotic arm, writes 'Convoy' in new video
AI researcher discovers two instances of DeepSeek R1 speaking to each other in a language of symbols
Even the most intricate prompts won't earn you copyright on AI content, US officials warn
Nvidia CEO Jensen Huang says everyone should get an AI tutor right away
EU begins enforcing landmark AI law as first batch of restrictions enters into force
Amazon is investing billions more into its Mississippi data centers
3 new AI-powered tools from around the web
Latest AI Research Papers
arXiv is a free online library where researchers share pre-publication papers.
Thank you for reading today’s edition.
Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.
Interested in reaching smart readers like you? To become an AI Breakfast sponsor, reply to this email or DM us on 𝕏!