Inside OpenAI's 'Deep Research' Model

Good morning. It’s Monday, February 3rd.

Did you know: On this day in 1986, the term “vaporware” was first used by Philip Elmer-DeWitt in a TIME magazine article? The term is now commonly used to describe software that has been announced but never actually released.

In today’s email:

  • OpenAI’s Deep Research

  • Reasoning Models Suffer From “Underthinking”

  • Stories You May Have Missed

  • 3 New AI Tools

  • Latest AI Research Papers

You read. We listen. Let us know what you think by replying to this email.

In partnership with NEX

Transform your marketing effortlessly with NEX's Marko.

Create on-point campaigns in minutes using AI-powered tools for content, strategy, and design—all in one platform.

Loved by 20k+ pros, it’s your shortcut to brand consistency and faster results.

Today’s trending AI news stories

OpenAI launches new ChatGPT agent for 'deep research', targeting professional analysts

OpenAI has introduced "deep research," a new ChatGPT agent designed to tackle complex research tasks across fields like finance, science, policy, and engineering, as well as to help consumers making high-stakes decisions. Unlike a basic one-shot query, the agent pulls from multiple sources and synthesises the information to deliver more detailed and reliable insights.

Available now to ChatGPT Pro users with a limit of 100 queries per month, deep research will soon extend to Plus and Team users. The tool is currently text-only; future updates will bring images, visualisations, and deeper analytic features. It is powered by OpenAI’s o3 "reasoning" model, optimised for web browsing and data analysis. While impressive, the model isn’t flawless; errors and misinterpretations still occur. To mitigate misinformation, all outputs come with full citations. Read more.
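For a sense of what separates an agent like this from a one-shot prompt, here is a minimal, hypothetical sketch of an iterative search-read-synthesise loop. OpenAI has not published deep research's implementation; `search`, `read_page`, and `ask_llm` below are placeholder stubs, and the loop structure is an assumption based on the behaviour described above.

```python
# Hypothetical sketch of a multi-step "deep research" loop. The three helpers
# are stand-ins for a web search API, a page fetcher, and a call to a
# reasoning model; OpenAI's actual agent internals are not public.

def search(query: str) -> list[str]:
    raise NotImplementedError("stand-in for a web search API")

def read_page(url: str) -> str:
    raise NotImplementedError("stand-in for fetching and cleaning a page")

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("stand-in for a call to a reasoning model")

def deep_research(question: str, max_steps: int = 5) -> str:
    notes: list[str] = []
    sources: list[str] = []
    query = question
    for _ in range(max_steps):
        for url in search(query)[:3]:
            page = read_page(url)
            notes.append(ask_llm(f"Extract facts relevant to {question!r}:\n{page}"))
            sources.append(url)
        # The agent decides whether it has enough evidence or needs another pass.
        verdict = ask_llm(
            f"Question: {question}\nNotes so far: {notes}\n"
            "Reply DONE if the notes suffice; otherwise give the next search query."
        )
        if verdict.strip() == "DONE":
            break
        query = verdict
    # Synthesise a report and, like deep research, cite every source consulted.
    report = ask_llm(f"Answer {question!r} using these notes:\n{notes}")
    return report + "\n\nSources:\n" + "\n".join(sources)
```

The key design point in this sketch is the self-termination check: the agent keeps refining its query and gathering evidence until the model itself judges the notes sufficient, then produces a fully cited report.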

Related Story: OpenAI Launches o3-mini Amid AI Cost Wars and Market Pressures

Reasoning models like DeepSeek-R1 and OpenAI o1 suffer from 'underthinking', study finds

[Figure: Tokens generated and number of "thoughts" (solution approaches) per model. On average, o1-like LLMs use 225 percent more tokens for incorrect answers than for correct ones, driven by 418 percent more frequent thought changes. | Image: Wang et al.]

A recent study by Tencent AI Lab, Soochow University, and Shanghai Jiao Tong University reveals that reasoning models such as DeepSeek-R1 and OpenAI’s o1 fall victim to "underthinking": they prematurely discard viable solutions, wasting compute and reducing accuracy. These models frequently switch problem-solving approaches, especially on more complex tasks, generating 225% more tokens and changing strategy 418% more often when they deliver incorrect answers.

Strikingly, more than 70% of these wrong answers contained at least one abandoned line of reasoning that could have led to the correct solution. To mitigate this, the researchers introduced a "thought switching penalty" (TIP) applied during decoding, which discourages premature shifts and improves accuracy and consistency on maths and science challenges without requiring significant modifications to the models. Read more.
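For the technically curious, here is a simplified sketch of how a decode-time thought-switching penalty could work: subtract a fixed amount from the logits of tokens that typically open a new solution path, nudging the model to finish exploring its current one. The trigger-token IDs, penalty strength, and window length below are illustrative assumptions, not the paper's exact formulation.

```python
# Simplified sketch of a thought-switching penalty (TIP) at decode time.
# The idea: suppress tokens that tend to start a new solution approach
# (e.g. "Alternatively", "Wait") early in generation, so the model keeps
# developing its current line of reasoning instead of abandoning it.

import numpy as np

SWITCH_TOKEN_IDS = {1042, 2871}  # hypothetical ids for "Alternatively", "Wait"
PENALTY = 3.0                    # illustrative penalty strength
PENALTY_WINDOW = 512             # only penalise switches this early in decoding

def apply_tip(logits: np.ndarray, tokens_generated: int) -> np.ndarray:
    """Subtract a fixed penalty from thought-switch tokens early in decoding."""
    if tokens_generated < PENALTY_WINDOW:
        logits = logits.copy()
        for tok in SWITCH_TOKEN_IDS:
            logits[tok] -= PENALTY
    return logits
```

Because the penalty only adjusts logits during sampling, it needs no fine-tuning, which is why it can be applied to existing models as-is.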

3 new AI-powered tools from around the web

Latest AI research papers

arXiv is a free online library where researchers share pre-publication papers.

Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.

Interested in reaching smart readers like you? To become an AI Breakfast sponsor, reply to this email or DM us on 𝕏!