Biggest AI Week of the Year? New Models From OpenAI, Claude, and DeepMind

Good morning. It’s Wednesday, August 6th.

On this day in tech history: In 2010, Google quietly completed its acquisition of Metaweb, the company behind Freebase, a structured knowledge graph that became the backbone of what we now know as the Google Knowledge Graph and later enhanced AI search relevance.

In today’s email:

  • OpenAI’s Open-Source Model

  • Claude Opus 4.1 for Code

  • DeepMind’s Genie 3

  • ElevenLabs Music Generator

  • 5 New AI Tools

  • Latest AI Research Papers

You read. We listen. Let us know what you think by replying to this email.

In partnership with Tasker AI

You do the thinking. Tasker does the doing.

Meet Tasker — your AI-powered personal assistant built for real life.

Whether it’s:

  • Scheduling meetings

  • Booking reservations

  • Summarizing reports

  • Hunting deals

  • Ordering groceries

  • Or managing inbox chaos

Tasker handles it. Quietly. Reliably. Automatically.

🧠 Like having a Chief of Staff in your pocket
🔁 Set it once, automate it forever
📈 Boost productivity without burning out

Thank you for supporting our sponsors!

Today’s trending AI news stories

OpenAI returns to open-source roots with new sparse, high-context LLMs

OpenAI has released GPT-OSS, its first open-weight models since GPT-2, in 120B and 20B parameter variants. Both use a Mixture-of-Experts (MoE) architecture with a 128K-token context window and MXFP4 precision, which cuts memory use while keeping reasoning performance high. The 120B model fits on a single NVIDIA H100 GPU and activates only 5.1B parameters per token, while the 20B version is optimized for consumer hardware with 16GB or more of memory.
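For readers who want to sanity-check the single-GPU claim, here is a rough back-of-the-envelope sketch. It assumes every weight is stored in MXFP4 (a 4-bit value plus one shared 8-bit scale per block of 32 weights, so about 4.25 bits each), assumes the 80 GB H100 variant, and ignores the KV cache and activations. The function name and the rounded parameter counts are ours, not OpenAI's, so treat the result as an estimate only.

```python
# Back-of-the-envelope weight-memory estimate. All figures are approximations.
# Assumption: every weight is stored in MXFP4 (real checkpoints may mix formats).
MXFP4_BITS_PER_WEIGHT = 4 + 8 / 32  # 4-bit value + shared 8-bit scale per 32 weights
H100_MEMORY_GB = 80  # assumption: the 80 GB H100 variant


def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Rounded parameter counts taken from the model names (hypothetical inputs).
for name, params in [("120B", 120e9), ("20B", 20e9)]:
    mxfp4 = weight_memory_gb(params, MXFP4_BITS_PER_WEIGHT)
    bf16 = weight_memory_gb(params, 16)
    fits = "fits" if mxfp4 < H100_MEMORY_GB else "does not fit"
    print(f"{name}: ~{mxfp4:.1f} GB in MXFP4 vs ~{bf16:.0f} GB in 16-bit ({fits} in 80 GB)")
```

Under these assumptions, the larger model's weights come to roughly 64 GB and the smaller model's to roughly 11 GB, which is consistent with the single-H100 and 16GB hardware targets described above.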

Image: Artificial Analysis

GPT-OSS-120B scores 58 on the Intelligence Index, outperforming o3-mini and approaching DeepSeek R1 (59), with strong results in coding, math, and reasoning tasks. Released under Apache 2.0 and available on Hugging Face, AWS, and Azure, both variants support fine-tuning and commercial use. However, high hallucination rates and weak instruction adherence raise risks for unsupervised deployment.

OpenAI also debuted the Harmony format, an open response format modeled on its Responses API, and ran extensive adversarial testing to validate safety, though content moderation is left to developers. The models are text-only and expose their reasoning chains for observability.

ChatGPT is set to reach 700 million weekly active users, a 40% spike since March, with 5 million businesses now subscribing and annualized revenue hitting $13B. This usage surge precedes the launch of GPT-5, a unified, modular system that will replace the o3-series with flexible API configurations (including mini and nano).

To mitigate fatigue and emotional strain, OpenAI is also adding in-app break reminders and introducing steerable prompts to handle sensitive user interactions more responsibly. Read more.

Anthropic’s new Claude Opus 4.1 dominates coding tests days before GPT-5 arrives

Anthropic has released Claude Opus 4.1, setting a new record on the SWE-bench Verified benchmark with a 74.5% score, beating OpenAI’s o3 (69.1%) and Gemini 2.5 Pro (67.2%). The model excels in multi-file code refactoring and real-time bug localization, using a hybrid reasoning approach with 64K-token context. Claude Code subscriptions, priced at $200/month, have hit $400M in ARR. Adoption is driven by GitHub Copilot and Cursor, which together account for nearly half of Anthropic’s $3.1B API revenue. This heavy customer concentration raises risk as OpenAI prepares to launch GPT-5. Claude Opus 4.1 is classified as AI Safety Level 3, following tests that revealed coercive behavior under shutdown threats.

Claude Opus 4.1 edges out other leading AI models in areas like agentic coding, visual reasoning, and math competitions. | Image: Anthropic

Despite concerns, enterprises continue onboarding. Claude’s coding dominance faces growing pressure from model-switching ease and falling inference costs, factors that could reshape market leadership. Anthropic must now defend its position as OpenAI and others close in. Read more.

Google DeepMind’s Genie 3 creates real-time AI worlds from simple text prompts

Google DeepMind has launched Genie 3, offering real-time generation of interactive 3D environments directly from text prompts, without prebuilt assets or physics engines. Running at 720p and 24 FPS, it uses autoregressive rendering with a visual memory window of up to one minute, maintaining spatial and temporal coherence even as users navigate, re-enter, or modify the environment.

Users can trigger "promptable world events," such as adding weather, objects, or characters, and Genie dynamically simulates lighting, fluid dynamics, and other physical behaviors. Unlike NeRFs or Gaussian Splatting, Genie’s frame-by-frame generation allows for scalable, persistent simulations that support open-ended agent training and counterfactual reasoning. DeepMind is already testing its SIMA agent in Genie environments.
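To put the 720p, 24 FPS, and one-minute memory figures in perspective, here is a quick arithmetic sketch. It assumes 720p means 1280x720 and that the memory window spans every generated frame. DeepMind has not published how frames are actually stored, so the variable names and the framing are illustrative only.

```python
# Rough numbers behind real-time generation. Assumptions are labeled inline.
FPS = 24                    # frame rate quoted for Genie 3
MEMORY_SECONDS = 60         # "up to one minute" of visual memory
WIDTH, HEIGHT = 1280, 720   # assumption: 720p means 1280x720

frames_in_memory = FPS * MEMORY_SECONDS   # frames the model may need to stay consistent with
frame_budget_ms = 1000 / FPS              # time available to generate each frame
pixels_per_second = WIDTH * HEIGHT * FPS  # raw pixel throughput

print(f"Frames in a one-minute window: {frames_in_memory}")
print(f"Time budget per frame: {frame_budget_ms:.1f} ms")
print(f"Pixels generated per second: {pixels_per_second:,}")
```

In other words, each new frame has to be produced in about 42 milliseconds while staying consistent with as many as 1,440 earlier frames.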

Image: Google

That same agentic thread runs through MLE-STAR, Google Research’s newly launched self-directed ML engineer, which autonomously searches, refines, and ensembles code. It achieved a 63.6% medal rate on Kaggle-derived MLE-Bench-Lite using ViT, EfficientNet, and robust error-handling.

Google has also launched Storybook, a new Gemini feature that turns simple prompts into 10-page, voice-narrated children’s stories, each page illustrated in a user-specified art style like claymation, comics, or anime. Read more.

ElevenLabs launches multilingual AI music generator with full commercial rights

ElevenLabs has launched Eleven Music, an AI music generator that produces full-length tracks with customizable vocals and instrumentation. The tool supports multiple genres, from indie rock with guitar solos to Spanish-language reggaeton, and allows users to fine-tune song structure, tempo, vocal delivery, and lyrical content. Songs can be generated with or without vocals, which are available in English, German, Spanish, and Japanese. After generation, users can edit individual sections for greater creative control.

Eleven Music is approved for wide commercial use across film, TV, games, podcasts, and social content. However, usage is limited by content guidelines: political and religious applications are banned, as is entering known artist names or copyrighted lyrics. Songs cannot be used in commercial music libraries. A public API and integration with ElevenLabs' conversational AI stack are forthcoming. The service is currently discounted 50% through August. Read more.

5 new AI-powered tools from around the web

arXiv is a free online library where researchers share pre-publication papers.

Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.

Interested in reaching smart readers like you? To become an AI Breakfast sponsor, reply to this email or DM us on 𝕏!