- AI Breakfast
- Posts
- Google To Launch Vision Model This Month
Google To Launch Vision Model This Month
Good morning. It’s Wednesday, March 5th.
On this day in tech history: In 1924, The Computing-Tabulating-Recording Corporation (CTR) officially rebranded as International Business Machines Corporation (IBM).
In today’s email:
Google’s Gemini w/ Vision Coming Soon
The Most Realistic AI Voice Yet
Altman Hints At Image Gen Upgrade
6 New AI Tools
Latest AI Research Papers
You read. We listen. Let us know what you think by replying to this email.

Today’s trending AI news stories
Google to launch Gemini with Vision in March for AI-powered live video analysis
Google is rolling out live video analysis and screen-sharing features for Gemini as part of the Google One AI Premium Plan. The update allows users to stream video from their smartphone cameras or share their screens for real-time AI-powered insights. Initially exclusive to Android devices, the features support multiple languages and enhance Gemini’s ability to interpret visual content.
This expansion aligns with Google’s broader vision for multimodal AI, leading up to "Project Astra," an assistant designed to process text, video, and audio in real-time. While Astra’s full rollout remains uncertain, these incremental updates suggest Google is steadily embedding multimodal AI into everyday interactions.
Google has also added lockscreen widgets to its Gemini AI assistant on iOS and iPadOS, enabling instant access to key features like text prompts, live conversations, voice commands, and image analysis. With a full Siri overhaul still years away, Google is capitalizing on the gap. Read more.
Related Story: Google upgrades Colab with an AI agent tool
Sesame’s AI Voice Demo Stuns With Realism
Sesame AI’s Conversational Speech Model (CSM) delivers strikingly human-like voices, mimicking breath sounds, chuckles, and self-corrections. Built on Meta’s Llama architecture, it processes text and audio in a single-stage transformer model, enhancing realism beyond traditional text-to-speech.
The new AI voice from Sesame really is a powerful illustration of where AI is going.
This is all real-time, from my browser. Excellent use of disfluencies, pauses, even intakes of breathe really make this seem like a human, though bits of uncanniness remain, at least for now.
— Ethan Mollick (@emollick)
2:59 AM • Mar 4, 2025
The demo, featuring voices “Miles” and “Maya,” has impressed users while raising concerns over emotional attachment and deepfake risks. Blind tests show CSM’s speech rivals human recordings, though real voices still hold an edge in context. Its ability to roleplay dynamic personalities, including aggressive tones, sets it apart from competitors.
Sesame plans to expand language support, scale its models, and open-source key components. Read more.
Altman Hints at Major Image Generation Upgrade
OpenAI CEO Sam Altman announced that GPT-4.5 will roll out gradually to Plus-tier users over several days. He stated that an immediate full release would have required stricter rate limits, and the team expects high usage.
you will relatively soon be wild with joy
— Sam Altman (@sama)
11:25 PM • Mar 4, 2025
In a separate response, Altman hinted at significant improvements to ChatGPT’s image generation. When a user complained about declining quality, he replied that they would soon be "wild with joy," suggesting an upcoming upgrade. OpenAI has not provided a timeline for these enhancements. Read more.

Stability AI and Arm bring offline AI audio generation to smartphones
Amazon expands AI across its ecosystem, strengthens AGI and compute strategy
OpenAI’s NextGenAI funds AI research at Harvard, MIT, and Oxford
Cortical Labs unveils CL1, the first biological-silicon hybrid computer
Claude 3.7 Sonnet joins ElevenLabs AI for advanced reasoning and voice AI
Microsoft's Dragon Copilot AI listens in on doctor-patient talks to automate medical paperwork
Salesforce launches an AI agent skills marketplace with 200+ initial partners
T-Mobile’s parent company and Perplexity team up for AI-first smartphone under $1K
Nvidia CEO Jensen Huang says its US AI chips are around "60 times" faster than Chinese counterparts
ASLP Labs debuted DiffRhythm, an open-weights model capable of generating 4-minute songs with vocals
Nvidia-backed cloud firm CoreWeave to acquire AI developer platform Weights & Biases
Anthropic raises $3.5 billion in new funding, valuing the AI company at over $60 billion
LlamaIndex launches a cloud service for building unstructured data agents
Meta researchers generate photorealistic avatars from just four selfies
US' light-powered AI camera processes images 99.4% faster without electricity
China: Interactive humanoid robots take over police patrol duties in streets

6 new AI-powered tools from around the web

arXiv is a free online library where researchers share pre-publication papers.


Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.
Interested in reaching smart readers like you? To become an AI Breakfast sponsor, reply to this email or DM us on 𝕏!