Google To Launch Vision Model This Month

Good morning. It’s Wednesday, March 5th.

On this day in tech history: In 1924, the Computing-Tabulating-Recording Corporation (CTR) officially rebranded as International Business Machines Corporation (IBM).

In today’s email:

  • Google’s Gemini w/ Vision Coming Soon

  • The Most Realistic AI Voice Yet

  • Altman Hints At Image Gen Upgrade

  • 6 New AI Tools

  • Latest AI Research Papers

You read. We listen. Let us know what you think by replying to this email.

Today’s trending AI news stories

Google to launch Gemini with Vision in March for AI-powered live video analysis

Google is rolling out live video analysis and screen-sharing features for Gemini as part of the Google One AI Premium Plan. The update allows users to stream video from their smartphone cameras or share their screens for real-time AI-powered insights. Initially exclusive to Android devices, the features support multiple languages and enhance Gemini’s ability to interpret visual content.

This expansion aligns with Google’s broader vision for multimodal AI, leading up to "Project Astra," an assistant designed to process text, video, and audio in real time. While Astra’s full rollout remains uncertain, these incremental updates suggest Google is steadily embedding multimodal AI into everyday interactions.

Google has also added lock screen widgets to its Gemini AI assistant on iOS and iPadOS, enabling instant access to key features like text prompts, live conversations, voice commands, and image analysis. With a full Siri overhaul still years away, Google is capitalizing on the gap. Read more.

Sesame’s AI Voice Demo Stuns With Realism

Sesame AI’s Conversational Speech Model (CSM) delivers strikingly human-like voices, mimicking breath sounds, chuckles, and self-corrections. Built on Meta’s Llama architecture, it processes text and audio in a single-stage transformer model, enhancing realism beyond traditional text-to-speech.

The demo, featuring voices “Miles” and “Maya,” has impressed users while raising concerns over emotional attachment and deepfake risks. Blind tests show CSM’s speech rivals human recordings, though real voices still hold an edge when conversational context is taken into account. Its ability to roleplay dynamic personalities, including aggressive tones, sets it apart from competitors.

Sesame plans to expand language support, scale its models, and open-source key components. Read more.

Altman Hints at Major Image Generation Upgrade

OpenAI CEO Sam Altman announced that GPT-4.5 will roll out gradually to Plus-tier users over several days. He said that releasing it to everyone at once would have required stricter rate limits, because the team expects heavy usage.

In a separate response, Altman hinted at significant improvements to ChatGPT’s image generation. When a user complained about declining quality, he replied that the user would soon be "wild with joy," suggesting an upgrade is on the way. OpenAI has not provided a timeline for these enhancements. Read more.

6 new AI-powered tools from around the web

arXiv is a free online library where researchers share pre-publication papers.

Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.

Interested in reaching smart readers like you? To become an AI Breakfast sponsor, reply to this email or DM us on 𝕏!