What is Nano-banana?
Good morning. It’s Friday, August 22nd.
On this day in tech history: In 2019, a Russian humanoid robot named FEDOR (Final Experimental Demonstration Object Research) was launched to the International Space Station to operate autonomously and test remote manipulator tasks in orbit. It was a quirky yet meaningful precursor to today’s discussions around robotics, autonomy, and microgravity operations.
In today’s email:
Nano-banana
ElevenLabs v3
China’s Z.ai agents
5 New AI Tools
Latest AI Research Papers
You read. We listen. Let us know what you think by replying to this email.

Today’s trending AI news stories
Google turns AI into infrastructure: Nano-banana, Search, and Pixel get system-level upgrades
Google is moving fast to make AI less of a bolt-on feature and more of the operating system for its ecosystem with three major product pushes this week: Flow, Search, and Gemini hardware.
In Flow, the nano-banana model takes text-to-image beyond novelty filters. With reference control, creators can upload an image and generate stylistically consistent variations ready for direct use in video timelines. That eliminates the need to bounce through third-party tools. Google is pairing this with vertical aspect ratio support for TikTok and Shorts, prompt preambles that auto-extend simple queries into cinematic or vlogging styles, and social hooks like QR sharing. Flow starts looking like a full-stack studio with built-in distribution.
In Search, AI Mode is now rolling out to 180+ countries, trading static “AI Overviews” for agentic behaviors that act on intent. For now, these task-execution features remain gated behind the Google AI Ultra subscription in the US.
The Pixel 10 launch shows Google wants Gemini embedded directly into devices. The Tensor G5 chip runs Gemini Nano entirely on-device, eliminating latency and cloud dependency. That means instant context-sensitive features like Magic Cue, which pulls from Gmail and Calendar before a query is even typed, or live 11-language voice translation without an internet connection. Pixel’s camera stack now includes “Camera Coach” for real-time framing guidance and 100× Super Res zoom stitched by AI inference.
Even Google Photos is now conversational: users can simply ask for edits, from cleanup to creative background swaps, while C2PA Content Credentials, IPTC data, and SynthID tags signal when AI is involved.
Health is also a core theme. Fitbit has been rebuilt as a Gemini-powered coach that adapts to your recovery, sleep, or travel schedule. It doesn’t just spit out metrics; it explains, adjusts, and plans. Gemini Live expands conversational AI with visual annotations, while Gemini for Home replaces Google Assistant on Nest devices, a more serious play for the living room.
Google is collapsing workflows across media, search, and devices. Flow creates, Search executes, Pixel embeds, and Gemini ties it all together. Read more.
ElevenLabs releases its v3 model with new expression controls and support for unlimited speakers
ElevenLabs has rolled out Eleven v3 (alpha), a major upgrade to its text-to-speech API that makes AI voices more expressive and versatile. The model now supports unlimited speakers in dialog mode, letting developers build multi-character conversations without workarounds. It also adds fine-grained audio controls for emotion, pitch, and style, so voices can laugh, whisper, or convey subtle nuance naturally.
Eleven v3 covers over 70 languages, expanding global reach, and is accessible via a free API account, with some advanced features behind a paywall. Documentation provides full examples for integrating expressive speech into apps, media, or virtual assistants. Read more.
Chinese unicorn Z.ai unifies mobile and desktop automation with next-gen AI agents
Z.ai is pushing agentic AI further by merging mobile and desktop automation into a single ecosystem. Its smartphone agent, powered by GLM-Z1-Air and GLM-4-Air-0414, can autonomously handle complex multi-step tasks without forcing users to switch apps. The cloud-backed AutoGLM engine interprets natural-language instructions in real time, supporting “life assistant” and “office assistant” modes and letting users hand their devices over to the agent for full workflow execution.
Introducing ComputerRL, a framework for autonomous desktop intelligence that enables agents to operate complex digital workspaces skillfully.
arxiv.org/abs/2508.14040
ComputerRL features the API-GUI paradigm, which unifies programmatic API calls and direct GUI interaction to…
— Z.ai (@Zai_org)
2:31 PM • Aug 20, 2025
On the desktop, Z.ai’s ComputerRL framework bridges the gap between programmatic reasoning and human-centric interfaces. By combining API calls with direct GUI interactions, AutoGLM-OS-9B, built on GLM-4-9B-0414, achieves state-of-the-art accuracy on the OSWorld benchmark, excelling in multi-step reasoning, tool use, and general-purpose automation. With $1.4 billion in funding and an IPO targeted for 2026, Z.ai is positioning itself as one of the top global competitors in human-aligned AI. Read more.
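For readers who want a feel for the API-GUI idea, here is a toy sketch in Python. It is not Z.ai's code: the `ToyDesktop` class, the `set_cell` shortcut, and the action types are all hypothetical names invented for illustration. It only shows the general shape of an agent whose plan can mix direct function calls with screen clicks.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union


@dataclass
class ApiCall:
    """A programmatic action: call a named function with keyword arguments."""
    name: str
    kwargs: Dict[str, object] = field(default_factory=dict)


@dataclass
class GuiClick:
    """A GUI action: click at screen coordinates (x, y)."""
    x: int
    y: int


Action = Union[ApiCall, GuiClick]


class ToyDesktop:
    """Hypothetical stand-in for a desktop environment; it only records actions."""

    def __init__(self) -> None:
        self.log: List[str] = []
        # Registry of programmatic shortcuts the agent may call directly.
        # "set_cell" is an invented example, not a real ComputerRL API.
        self.apis: Dict[str, Callable[..., str]] = {
            "set_cell": lambda sheet, cell, value: f"{sheet}!{cell}={value}",
        }

    def step(self, action: Action) -> str:
        """Route one action through the API path or the GUI path."""
        if isinstance(action, ApiCall):
            if action.name not in self.apis:
                raise KeyError(f"unknown API: {action.name}")
            result = "api:" + self.apis[action.name](**action.kwargs)
        else:
            result = f"gui:click({action.x},{action.y})"
        self.log.append(result)
        return result


desktop = ToyDesktop()
plan: List[Action] = [
    ApiCall("set_cell", {"sheet": "Q3", "cell": "B2", "value": 42}),
    GuiClick(640, 360),  # no shortcut exists for this step, so use the GUI
]
for action in plan:
    print(desktop.step(action))
```

The point of the single `step` method is that the agent's plan is one list of actions, whichever path each action takes. A real system would replace the recorded strings with actual application calls and input events.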

AI now matches prediction markets in forecasting real events, study finds
Demis Hassabis: Genie 3 shows how simulations could shape the future of understanding the universe
OpenAI researcher claims GPT-5 Pro can do original mathematics
CFOs, PE sponsors diverge on AI adoption approach: Accordion
Microsoft AI chief says it’s ‘dangerous’ to study AI consciousness
OpenAI lawyers question Meta's role in Elon Musk's $97B takeover bid
LTX Studio launches camera motion controls for LTXV and LTXV Turbo
China firm plans world's first pregnancy humanoid robot using artificial womb
MIT report misunderstood: Shadow AI economy booms while headlines cry failure
Chan Zuckerberg Initiative's rBio uses virtual cells to train AI, bypassing lab work
NASA’s new AI model can predict when a solar storm may strike
Perplexity rolls out $200/mo Max Assistant with advanced reasoning, teases SuperMemory launch
MindJourney enables AI to explore simulated 3D worlds to improve spatial interpretation
AI persuades best by overwhelming people with information instead of using psychological tricks
Richard Sutton says the AI industry has "lost its way" by ignoring core principles of intelligence
Watch: Unitree G1, the winner of solo dance at WHRG, wears an AGI t-shirt while performing
Anthropic rolls out Claude Code to Team and Enterprise with new admin controls
New from Liquid AI: LFM2-VL, super-fast open models for real-time vision and language
TikTok parent company ByteDance releases new open source Seed-OSS-36B model with 512K token context

5 new AI-powered tools from around the web

arXiv is a free online library where researchers share pre-publication papers.


Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.
Interested in reaching smart readers like you? To become an AI Breakfast sponsor, reply to this email or DM us on 𝕏!