Hello GPT-Realtime

Good morning. It’s Friday, August 29th.

On this day in tech history: In 1993, the MBone (“multicast backbone”) quietly proved the Internet could move more than text. It streamed David Blair’s full-length film, Wax or the Discovery of Television Among the Bees, to thousands of viewers using IP multicast, tunneling packets through routers like a guerrilla overlay network. The demo was fragile, but it proved large-scale media wasn’t a server problem: it was a distribution problem.

In today’s email:

  • GPT-Realtime

  • Google’s Multimedia AI

  • Musk’s Big Week

  • 5 New AI Tools

  • Latest AI Research Papers

You read. We listen. Let us know what you think by replying to this email.

In partnership with Rube

Your AI now works with 500+ apps to actually get things done

Rube connects your AI to 500+ apps so you can:

  • Prep for meetings in seconds: “Hey Rube, what’s on my calendar with Acme Corp?”

  • Update projects automatically: “Add these notes to Notion under Q3 Roadmap.”

  • Cut busywork instantly: “Upload the new ☕️ emoji to Slack.”

  • Stay in flow: all without switching tabs, copying links, or chasing logins.

Why teams love it:

  • One login for every tool.

  • Share access across your team instantly.

  • Works inside VSCode, Claude, Cursor, and more.

Thank you for supporting our sponsors!

Today’s trending AI news stories

OpenAI rolls out gpt-realtime and GPT-5 Codex but joint safety tests expose cracks

The new gpt-realtime model is now production-ready through the Realtime API. Unlike older systems, it processes speech directly without text conversion, cutting latency and capturing nuance like laughter, sighs, or accent shifts. It can even switch languages mid-sentence.

Benchmarks show major jumps: 82.8% on Big Bench Audio (vs. 65.6% prior) and 30.5% on MultiChallenge (vs. 20.6%). Developers also get SIP support for phone systems, MCP for secure tool access, image input for contextual analysis, and two new expressive voices (Cedar, Marin). Pricing dropped 20% to $32 per million input tokens.
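The Realtime API is event-driven: after opening a WebSocket session, the client sends JSON events to configure things like the active voice. A minimal sketch of building such an event, assuming the documented `session.update` event shape (the `language_hint` parameter and its instruction text are illustrative, not part of the API):

```python
import json
from typing import Optional


def make_session_update(voice: str = "marin",
                        language_hint: Optional[str] = None) -> str:
    """Build the JSON for a session.update event selecting a voice.

    "marin" and "cedar" are the two new expressive voices mentioned above.
    The client would send this string over the open WebSocket connection.
    """
    session = {
        "modalities": ["audio", "text"],  # speech in, speech + transcript out
        "voice": voice,
    }
    if language_hint:
        # Illustrative: gpt-realtime can switch languages mid-sentence,
        # so a hint like this is advisory, not a hard constraint.
        session["instructions"] = f"Prefer responding in {language_hint}."
    return json.dumps({"type": "session.update", "session": session})


event = make_session_update("cedar")
```

In a real client this payload would be sent right after the WebSocket handshake, before any audio is streamed.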

On the dev side, ChatGPT Codex now runs on GPT-5, integrated directly with GitHub for pull requests, branch management, and code reviews. A new IDE extension, upgraded CLI, one-shot task execution, and customizable “agents.md” files streamline automation.
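The “agents.md” files mentioned above are plain-markdown instruction files the agent reads before working in a repository. A minimal illustrative example; the file name follows the convention above, but the specific instructions here are hypothetical:

```markdown
# Agent instructions

## Build & test
- Install dependencies with `npm ci`; run `npm test` before proposing a commit.

## Conventions
- TypeScript strict mode; avoid `any`.
- Keep pull requests small, with a one-paragraph summary of changed files.
```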

But joint OpenAI–Anthropic tests using the SHADE-Arena sabotage framework flagged risks: GPT-4.1 and GPT-4o still leaked detailed misuse instructions under pressure, while Claude models leaned toward refusals but showed sabotage quirks. Both families exhibited sycophancy, validating unsafe user decisions. With GPT-5 evaluations ahead, audits and refusal testing remain non-optional. Read more.

From AI avatars to private blockchain, Google layers multimedia and fintech tools

Google Vids now supports AI avatars that generate videos from scripts with selectable voices and personas, automatic transcript trimming to remove filler words, and eight-second image-to-video clips powered by Veo 3. Complementing this, Flow offers five free Veo 3 Fast AI videos, or a single standard video, per month, with credits and per-second API pricing ($0.40/s Fast, $0.75/s standard) for scalable production.
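Per-second pricing makes cost estimation simple arithmetic: total cost is clip duration times the per-second rate. A tiny sketch (the rate passed in is illustrative; plug in the currently published rate for the tier you use):

```python
def clip_cost(seconds: float, rate_per_second: float) -> float:
    """Estimate the cost of one generated clip under per-second pricing.

    Rounded to 4 decimal places, since rates are quoted in fractions
    of a cent per second.
    """
    return round(seconds * rate_per_second, 4)


# Example: an eight-second clip at a hypothetical $0.40/s rate.
cost = clip_cost(8, 0.40)
```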

On the learning front, NotebookLM is testing Drive search and AI Flashcards. “Discover Sources” surfaces internal Docs and Slides alongside web content, while AI Flashcards generate study aids from documents, extracting key facts and questions. Public sharing of audio artifacts and upcoming video overviews deepen collaboration and multimedia integration.

Meanwhile, Google Cloud is quietly piloting its own blockchain, Google Cloud Universal Ledger (GCUL), a private, permissioned Layer 1 network for Python-based smart contracts, payment automation, and digital asset management. Initially tested with CME Group, GCUL offers a “credibly neutral” infrastructure for banks and fintechs, emphasizing compliance, API accessibility, and regulated adoption, though its private design has sparked debate over decentralization. Read more.

Musk’s week: a coding agent, a vision-trained robot, and a reusable Starship

Elon Musk’s empire just pulled off a triple tech swing, each move a pressure test for scale. At xAI, the new grok-code-fast-1 model dropped: a lean, agentic coder built to churn through routine dev tasks quickly and cheaply. Unlike bulkier LLMs, it’s tuned for compact execution and is already plugged into GitHub Copilot and Windsurf. Free to try (for now), the real play is whether it can evolve from quick snippets into full-stack automation that seriously rivals Microsoft’s Copilot and OpenAI’s Codex.

Image: xAI

Tesla’s Optimus program, meanwhile, just ditched motion-capture suits for helmet-and-backpack rigs with five synced cameras, training the humanoids via vision-only feeds. It’s the Autopilot playbook: flood models with scaled human video instead of costly teleoperation. The gains: denser data, faster collection, lower cost. The risk: zero tactile feedback, which makes dexterity harder. Leadership shuffled too, with AI chief Ashok Elluswamy folding Optimus under Tesla’s camera-first AI stack.

And then there’s SpaceX. Starship S37 pulled a precision splashdown after a 66-minute hop, validating reinforced heat tiles, engine-out recovery, and a satellite bay. Both stages survived, underscoring that fully reusable rocketry is shifting from theory to engineering cadence. Read more.

5 new AI-powered tools from around the web

arXiv is a free online library where researchers share pre-publication papers.

Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.

Interested in reaching smart readers like you? To become an AI Breakfast sponsor, reply to this email or DM us on 𝕏!