Hello GPT-Realtime

Good morning. It’s Friday, August 29th.

On this day in tech history: In 1993, the MBone (“multicast backbone”) quietly proved the Internet could move more than text. It streamed David Blair’s full-length film, Wax or the Discovery of Television Among the Bees, to thousands of viewers using IP multicast, tunneling packets through routers like a guerrilla overlay network. The demo was fragile, but it proved large-scale media wasn’t a server problem: it was a distribution problem.

In today’s email:

  • GPT-Realtime

  • Google’s Multimedia AI

  • Musk’s Big Week

  • 5 New AI Tools

  • Latest AI Research Papers

You read. We listen. Let us know what you think by replying to this email.

In partnership with Rube

Your AI now works with 500+ apps to actually get things done

Rube connects your AI to 500+ apps so you can:

  • Prep for meetings in seconds: “Hey Rube, what’s on my calendar with Acme Corp?”

  • Update projects automatically: “Add these notes to Notion under Q3 Roadmap.”

  • Cut busywork instantly: “Upload the new ☕️ emoji to Slack.”

  • Stay in flow: all without switching tabs, copying links, or chasing logins.

Why teams love it:

  • One login for every tool.

  • Share access across your team instantly.

  • Works inside VSCode, Claude, Cursor, and more.

Thank you for supporting our sponsors!

Today’s trending AI news stories

OpenAI rolls out gpt-realtime and GPT-5 Codex but joint safety tests expose cracks

The new gpt-realtime model is now production-ready through the Realtime API. Unlike older systems, it processes speech directly without text conversion, cutting latency and capturing nuance like laughter, sighs, or accent shifts. It can even switch languages mid-sentence.

Benchmarks show major jumps: 82.8% on Big Bench Audio (vs. 65.6% prior) and 30.5% on MultiChallenge (vs. 20.6%). Developers also get SIP support for phone systems, MCP for secure tool access, image input for contextual analysis, and two new expressive voices (Cedar, Marin). Pricing dropped 20% to $32 per million input tokens.
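The Realtime API is event-driven: after opening a WebSocket session, the client sends JSON events to configure things like the active voice. A minimal sketch of building such an event, assuming the documented `session.update` event shape (the `language_hint` parameter and its instruction text are illustrative, not part of the API):

```python
import json
from typing import Optional


def make_session_update(voice: str = "marin",
                        language_hint: Optional[str] = None) -> str:
    """Build the JSON for a session.update event selecting a voice.

    "marin" and "cedar" are the two new expressive voices mentioned above.
    The client would send this string over the open WebSocket connection.
    """
    session = {
        "modalities": ["audio", "text"],  # speech in, speech + transcript out
        "voice": voice,
    }
    if language_hint:
        # Illustrative: gpt-realtime can switch languages mid-sentence,
        # so a hint like this is advisory, not a hard constraint.
        session["instructions"] = f"Prefer responding in {language_hint}."
    return json.dumps({"type": "session.update", "session": session})


event = make_session_update("cedar")
```

In a real client this payload would be sent right after the WebSocket handshake, before any audio is streamed.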

On the dev side, ChatGPT Codex now runs on GPT-5, integrated directly with GitHub for pull requests, branch management, and code reviews. A new IDE extension, upgraded CLI, one-shot task execution, and customizable “agents.md” files streamline automation.
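The “agents.md” files mentioned above are plain-markdown instruction files the agent reads before working in a repository. A minimal illustrative example; the file name follows the convention above, but the specific instructions here are hypothetical:

```markdown
# Agent instructions

## Build & test
- Install dependencies with `npm ci`; run `npm test` before proposing a commit.

## Conventions
- TypeScript strict mode; avoid `any`.
- Keep pull requests small, with a one-paragraph summary of changed files.
```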

But joint OpenAI–Anthropic tests using the SHADE-Arena sabotage framework flagged risks: GPT-4.1 and GPT-4o still leaked detailed misuse instructions under pressure, while Claude models leaned toward refusals but showed sabotage quirks. Both families exhibited sycophancy, validating unsafe user decisions. With GPT-5 evaluations ahead, audits and refusal testing remain non-optional. Read more.

From AI avatars to private blockchain, Google layers multimedia and fintech tools

Google Vids now supports AI avatars that generate videos from scripts with selectable voices and personas, automatic transcript trimming to remove filler words, and eight-second image-to-video clips powered by Veo 3. Complementing this, Flow offers five free Veo 3 Fast AI videos, or a single standard video, per month, with credits and per-second API pricing ($0.40/s Fast, $0.75/s standard) for scalable production.
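Per-second pricing makes cost estimation simple arithmetic: total cost is clip duration times the per-second rate. A tiny sketch (the rate passed in is illustrative; plug in the currently published rate for the tier you use):

```python
def clip_cost(seconds: float, rate_per_second: float) -> float:
    """Estimate the cost of one generated clip under per-second pricing.

    Rounded to 4 decimal places, since rates are quoted in fractions
    of a cent per second.
    """
    return round(seconds * rate_per_second, 4)


# Example: an eight-second clip at a hypothetical $0.40/s rate.
cost = clip_cost(8, 0.40)
```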

On the learning front, NotebookLM is testing Drive search and AI Flashcards. “Discover Sources” surfaces internal Docs and Slides alongside web content, while AI Flashcards generate study aids from documents, extracting key facts and questions. Public sharing of audio artifacts and upcoming video overviews deepen collaboration and multimedia integration.

Meanwhile, Google Cloud is quietly piloting its own blockchain, Google Cloud Universal Ledger (GCUL), a private, permissioned Layer 1 network for Python-based smart contracts, payment automation, and digital asset management. Initially tested with CME Group, GCUL offers a “credibly neutral” infrastructure for banks and fintechs, emphasizing compliance, API accessibility, and regulated adoption, though its private design has sparked debate over decentralization. Read more.

Musk’s week: a coding agent, a vision-trained robot, and a reusable Starship

Elon Musk’s empire just pulled off a triple tech swing, each move a pressure test for scale. At xAI, the new grok-code-fast-1 model dropped: a lean, agentic coder built to churn through routine dev tasks quickly and cheaply. Unlike bulkier LLMs, it’s tuned for compact execution and is already plugged into GitHub Copilot and Windsurf. Free to try (for now), the real play is whether it can evolve from quick snippets into full-stack automation that seriously rivals Microsoft’s Copilot and OpenAI’s Codex.

Image: xAI

Tesla’s Optimus program, meanwhile, just ditched motion-capture suits for helmet-and-backpack rigs with five synced cameras, training the humanoids via vision-only feeds. It’s the Autopilot playbook: flood models with scaled human video instead of costly teleoperation. The gains: denser data, faster collection, lower cost. The risk: zero tactile feedback, which makes dexterity harder. Leadership shuffled too, with AI chief Ashok Elluswamy folding Optimus under Tesla’s camera-first AI stack.

And then there’s SpaceX. Starship S37 pulled a precision splashdown after a 66-minute hop, validating reinforced heat tiles, engine-out recovery, and a satellite bay. Both stages survived, underscoring that fully reusable rocketry is shifting from theory to engineering cadence. Read more.

5 new AI-powered tools from around the web

arXiv is a free online library where researchers share pre-publication papers.

Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.

Interested in reaching smart readers like you? To become an AI Breakfast sponsor, reply to this email or DM us on 𝕏!