OpenAI's New Voice Mode
Good morning. It’s Friday, March 21st.
Today in tech history: On this day in 2006, Jack Dorsey sent the first tweet, marking the launch of Twitter.
In today’s email:
OpenAI’s New Voice Mode
Oracle’s No-Code Agent Tool
Pika Labs’ AI Video Editing
4 New AI Tools
Latest AI Research Papers
You read. We listen. Let us know what you think by replying to this email.

Today’s trending AI news stories
OpenAI’s New AI Voice Models Turn Any Text App into a Voice-Powered AI in Seconds
OpenAI has launched three new voice AI models built for high-fidelity transcription and customizable speech synthesis: gpt-4o-transcribe, gpt-4o-mini-transcribe, and gpt-4o-mini-tts. The models are available through OpenAI’s API and at OpenAI.fm. For real-time transcription, OpenAI reports a 2.46% English word error rate, along with improved noise cancellation and semantic voice activity detection. The models do not support speaker diarization, but OpenAI reports higher accuracy than Whisper across more than 100 languages.
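The 2.46% figure is a word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn a model’s transcript into the reference transcript, divided by the number of words in the reference. The sketch below computes it with a standard word-level edit distance. It only illustrates the metric; it is not OpenAI’s evaluation code, and the function name and sample sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and the
    # first j hypothesis words (we keep only two rows of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# Hypothetical example: two substitutions ("jumps"->"jumped", "the"->"a") in 9 words.
ref = "the quick brown fox jumps over the lazy dog"
hyp = "the quick brown fox jumped over a lazy dog"
print(f"WER = {word_error_rate(ref, hyp):.2%}")  # WER = 22.22%
```

Read the other way, a 2.46% WER means roughly 2.5 word errors per 100 reference words. Production benchmarks usually normalize text (casing, punctuation) before scoring, which this sketch does not do.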
Three new state-of-the-art audio models in the API:
🗣️ Two speech-to-text models—outperforming Whisper
💬 A new TTS model—you can instruct it *how* to speak
🤖 And the Agents SDK now supports audio, making it easy to build voice agents.
Try TTS now at .
— OpenAI Developers (@OpenAIDevs)
5:25 PM • Mar 20, 2025
At $6 per million audio input tokens for gpt-4o-transcribe, OpenAI enters a competitive field that includes ElevenLabs’ Scribe and Hume AI’s Octave TTS. Developers can add voice functionality with minimal code through OpenAI’s Agents SDK.
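Because the rate is quoted per token, cost scales linearly with usage. Here is a minimal sketch of the arithmetic, assuming the $6 rate covers audio input tokens only. The constant and function names are hypothetical, and any charges for output tokens are left out.

```python
# Assumption: $6 per 1,000,000 audio *input* tokens; output tokens are
# billed separately and are not modeled here.
PRICE_PER_MILLION_AUDIO_INPUT_TOKENS_USD = 6.00


def audio_input_cost_usd(audio_input_tokens: int) -> float:
    """Estimated charge in USD for a given number of audio input tokens."""
    if audio_input_tokens < 0:
        raise ValueError("token count cannot be negative")
    return audio_input_tokens / 1_000_000 * PRICE_PER_MILLION_AUDIO_INPUT_TOKENS_USD


for tokens in (10_000, 250_000, 1_000_000):
    print(f"{tokens:>9,} audio input tokens -> ${audio_input_cost_usd(tokens):.4f}")
```

Estimating cost per minute of audio would also require the number of tokens per minute of audio, which is not given here.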
Some critics argue OpenAI is deprioritizing real-time conversational AI. Others see a bigger play: one that extends beyond transcription into full multimodal intelligence.
Oracle Lets Customers Build AI Agents with No-Code Studio
Oracle just put enterprise AI on autopilot with AI Agent Studio, a no-cost tool that lets users build and refine AI agents inside its Fusion Cloud Application Suite. With drag-and-drop customization, API-level access, and a library of prebuilt templates, the platform keeps automation aligned with existing business logic. Users can modify more than 50 preconfigured agents, connect third-party APIs, and switch between Llama, Cohere, OpenAI’s GPT, or other LLMs, all without starting from scratch.
The platform supports multi-agent orchestration, allowing agents to collaborate on tasks with checkpoints and approvals. Fusion security policies extend to new agents, though connecting to third-party APIs may require additional coding. Oracle uses REST APIs for external integration, so automation can be added without disrupting existing systems.
Pika previews precision video editing—move objects without disrupting scenes
Pika has released a behind-the-scenes preview of its latest AI-powered video editing tool, which lets users manipulate characters and objects within a scene while leaving the rest of the footage untouched. This precision editing opens new creative possibilities, promising greater control without the artifacts or distortions that AI video edits often introduce.
Behind the scenes sneak peek 👀
Manipulate any character or object in your video, while keeping the rest perfectly intact.
Become a Pika Creative Partner to get exclusive early access.
— Pika (@pika_labs)
4:02 PM • Mar 20, 2025
The feature is currently available only to Pika Creative Partners through exclusive early access, which suggests a broader rollout is planned.

Nvidia’s Cosmos-Transfer1 pushes robot training AI simulations past the uncanny valley
Nvidia and xAI lock arms with Microsoft and BlackRock to bankroll AI’s infrastructure
Hugging Face submits open-source blueprint, challenging Big Tech in White House AI policy fight
Google and UC Berkeley hype inference-time search—skeptics push back
Perplexity Expands AI Ambitions with DeepResearch Overhaul and $1B Funding Talks
Apple reshuffles AI leadership, moving Siri under Vision Pro architect
Open-Sora 2.0 pushes open-source video AI to near-commercial precision
Speed is King: How Google’s $32B Wiz play rewrites DevOps security rules
A New Metric to Quantify Capabilities of AI systems in Terms of Human Capabilities
Topaz Labs Launches Gigapixel 8.3 with Fastest High-Res Image Recovery Model
Hugging Face's new iOS app taps AI to describe what you're looking at

4 new AI-powered tools from around the web

arXiv is a free online library where researchers share pre-publication papers.
📄 ϕ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation


Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.
Interested in reaching smart readers like you? To become an AI Breakfast sponsor, reply to this email or DM us on 𝕏!