AI Breakfast
OpenAI's New Voice Mode

AI Breakfast
March 21, 2025

Good morning. It’s Friday, March 21st.

Today in tech history: On this day in 2006, Jack Dorsey sent the first tweet, marking the birth of Twitter.

In today’s email:

OpenAI’s New Voice Mode
Oracle’s No-Code Agent Tool
Pika Labs’ AI Video Editing
4 New AI Tools
Latest AI Research Papers

You read. We listen. Let us know what you think by replying to this email.

Today’s trending AI news stories

OpenAI’s New AI Voice Models Turn Any Text App into a Voice-Powered AI in Seconds

OpenAI has launched three voice AI models, gpt-4o-transcribe, gpt-4o-mini-transcribe, and gpt-4o-mini-tts, built for high-fidelity transcription and customizable speech synthesis. The models are available in OpenAI’s API, with an interactive demo at OpenAI.fm. For real-time transcription, OpenAI reports a 2.46% English word error rate, along with stronger noise cancellation and semantic voice activity detection. The new models do not support speaker diarization (identifying who is speaking), but OpenAI says they are more accurate than Whisper across 100+ languages.
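For readers new to the metric: word error rate (WER) counts the word substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of reference words, so 2.46% means about 2.5 errors for every 100 reference words. The Python sketch below is a minimal illustration of that standard definition using a word-level edit distance. It assumes simple lowercase, whitespace tokenization; it is not OpenAI’s evaluation pipeline, which may normalize text differently, and `word_error_rate` is a hypothetical helper name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Assumes simple normalization: lowercase and split on whitespace.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the first (i - 1) reference words
    # and the first j hypothesis words; row 0 is j insertions.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # column 0: i deletions
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete a reference word
                curr[j - 1] + 1,     # insert a hypothesis word
                prev[j - 1] + cost,  # substitute (or match when cost is 0)
            )
        prev = curr
    return prev[-1] / len(ref)


ref = "the quick brown fox jumps over the lazy dog"
hyp = "the quick brown fox jumped over a lazy dog"
print(f"WER: {word_error_rate(ref, hyp):.1%}")  # prints WER: 22.2%
```

Here two of nine reference words are substituted ("jumps" and "the"), giving 2/9. Real benchmarks typically also strip punctuation and normalize numbers before scoring, which this sketch does not attempt.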

Three new state-of-the-art audio models in the API:
🗣️ Two speech-to-text models—outperforming Whisper
💬 A new TTS model—you can instruct it *how* to speak
🤖 And the Agents SDK now supports audio, making it easy to build voice agents.
Try TTS now at .
— OpenAI Developers (@OpenAIDevs)
5:25 PM • Mar 20, 2025

With gpt-4o-transcribe priced at $6 per million audio input tokens, OpenAI enters a competitive field that includes ElevenLabs’ Scribe and Hume AI’s Octave TTS. Developers can add voice to their apps with minimal code through OpenAI’s Agents SDK, which now supports audio.
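Because the rate is quoted per million audio input tokens, estimating the audio-input part of a bill is simple arithmetic: tokens ÷ 1,000,000 × $6. The Python sketch below is a minimal illustration under that assumption. `transcription_cost_usd` is a hypothetical helper, not part of any OpenAI SDK; it takes a token count rather than minutes because the number of tokens a minute of audio produces isn’t given here, and it leaves out any other billed components.

```python
# Assumed rate from the figure quoted above, in USD per 1M audio input tokens.
AUDIO_INPUT_USD_PER_MILLION_TOKENS = 6.0


def transcription_cost_usd(
    audio_input_tokens: int,
    usd_per_million: float = AUDIO_INPUT_USD_PER_MILLION_TOKENS,
) -> float:
    """Estimate the audio-input cost only (hypothetical helper)."""
    if audio_input_tokens < 0:
        raise ValueError("token count cannot be negative")
    return audio_input_tokens / 1_000_000 * usd_per_million


print(transcription_cost_usd(250_000))  # prints 1.5
```

Keeping the rate as a parameter makes it easy to rerun the estimate if pricing changes or if a different model’s rate applies.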

Some critics argue that OpenAI is deprioritizing real-time conversational AI, while others see this trajectory as part of a bigger play, one that extends beyond transcription into full-spectrum multimodal intelligence. Read more.

Oracle Lets Customers Build AI Agents with No-Code Studio

Oracle just put enterprise AI on autopilot with AI Agent Studio, a no-cost tool that lets users build and refine AI agents inside its Fusion Cloud Applications Suite. Featuring drag-and-drop customization, API-level access, and a library of prebuilt templates, the platform keeps automations aligned with existing business logic. Users can tweak more than 50 preconfigured agents, wire in third-party APIs, and swap between Llama, Cohere, OpenAI’s GPT, or other LLMs, all without starting from scratch.

The platform supports multi-agent orchestration, allowing agents to collaborate on tasks with checkpoints and approvals. Fusion security policies extend to new agents, although connecting to third-party APIs may require additional coding. Oracle uses REST APIs for external integration, extending automation without disrupting existing systems. Read more.
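To make “multi-agent orchestration with checkpoints and approvals” concrete, here is a minimal, hypothetical Python sketch of the general pattern: agents run in sequence over shared task state, and a step flagged for approval only commits its output if an approver signs off. Every name here (`Step`, `RunResult`, `run_pipeline`, the approver callback, the invoice example) is invented for illustration; this is not Oracle AI Agent Studio’s API and makes no claim about how Oracle implements the feature.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical types for illustration only; not Oracle AI Agent Studio's API.
State = Dict[str, object]
Agent = Callable[[State], State]         # reads shared state, returns updates
Approver = Callable[[str, State], bool]  # approves or rejects a step's updates


@dataclass
class Step:
    name: str
    agent: Agent
    needs_approval: bool = False  # checkpoint: output must be approved before it is applied


@dataclass
class RunResult:
    state: State
    completed: List[str] = field(default_factory=list)
    halted_at: Optional[str] = None  # step whose approval was rejected, if any


def run_pipeline(steps: List[Step], initial: State, approve: Approver) -> RunResult:
    """Run agents in order over shared state, pausing at approval checkpoints."""
    result = RunResult(state=dict(initial))  # copy so the caller's dict is untouched
    for step in steps:
        updates = step.agent(result.state)
        if step.needs_approval and not approve(step.name, updates):
            # Rejected checkpoint: stop without applying this step's updates.
            result.halted_at = step.name
            return result
        result.state.update(updates)
        result.completed.append(step.name)
    return result


# Example: an invoice flow where payments over 1,000 need sign-off.
steps = [
    Step("extract_invoice", lambda s: {"amount": 1200}),
    Step("draft_payment", lambda s: {"payment": s["amount"]}, needs_approval=True),
    Step("notify_vendor", lambda s: {"notified": True}),
]
result = run_pipeline(steps, {}, approve=lambda name, upd: upd.get("payment", 0) <= 1000)
print(result.completed, result.halted_at)  # prints ['extract_invoice'] draft_payment
```

Rejected output is never merged into the shared state, so a human or policy check at the checkpoint can stop a run before later agents act on it.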

Pika Previews Precision Video Editing: Move Objects Without Disrupting Scenes

Pika has released a behind-the-scenes preview of its latest AI-powered video editing tool, which lets users manipulate characters and objects within a scene while leaving the rest of the footage untouched. This precision-editing capability opens new creative possibilities, promising finer control without the artifacts and distortions that such edits usually introduce.

Behind the scenes sneak peek 👀
Manipulate any character or object in your video, while keeping the rest perfectly intact.
Become a Pika Creative Partner to get exclusive early access.
— Pika (@pika_labs)
4:02 PM • Mar 20, 2025

For now, the feature is available only to Pika Creative Partners through exclusive early access, which hints at a broader rollout later. Read more.

4 new AI-powered tools from around the web

Epiphany | Voice-to-Action

The fastest, most frictionless way to capture your ideas by voice and turn them into actions in tools like Notion, Asana, Todoist, ClickUp, Obsidian, and more.

epiphanyvoice.io

Credal | The Secure AI Agent Platform

The secure AI agent platform for enterprises. Credal powers multi-agent workflows and enterprise AI search across your data, tools, and expertise.

www.credal.ai

Pacdora | Free AI Background Generator

Upload your product image, and Pacdora’s free AI background generator will create realistic backgrounds. Enhance your product image in just a few minutes!

www.pacdora.com/tools/ai-background-generator