What is Nano-banana?

Good morning. It’s Friday, August 22nd.

On this day in tech history: In 2019, Russia’s humanoid robot FEDOR (Final Experimental Demonstration Object Research) was launched to the International Space Station to test autonomous operation and remote-manipulation tasks in orbit. A quirky yet meaningful precursor to today’s discussions around robotics, autonomy, and microgravity operations.

In today’s email:

  • Nano-banana

  • ElevenLabs v3

  • China’s Z.ai agents

  • 5 New AI Tools

  • Latest AI Research Papers

You read. We listen. Let us know what you think by replying to this email.

Turn AI Into Your Income Stream

The AI economy is booming, and smart entrepreneurs are already profiting. Subscribe to Mindstream and get instant access to 200+ proven strategies to monetize AI tools like ChatGPT, Midjourney, and more. From content creation to automation services, discover actionable ways to build your AI-powered income. No coding required, just practical strategies that work.

Today’s trending AI news stories

Google turns AI into infrastructure: Nano-banana, Search, and Pixel get system-level upgrades

Google is moving fast to make AI less of a bolt-on feature and more of the operating system for its ecosystem with three major product pushes this week: Flow, Search, and Gemini hardware.

In Flow, the nano-banana model takes text-to-image beyond novelty filters. With reference control, creators can upload an image and generate stylistically consistent variations ready for direct use in video timelines. That eliminates the need to bounce through third-party tools. Google is pairing this with vertical aspect ratio support for TikTok and Shorts, prompt preambles that auto-extend simple queries into cinematic or vlogging styles, and social hooks like QR sharing. Flow starts looking like a full-stack studio with built-in distribution.
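
Reference-controlled generation like this is the kind of capability Google exposes through the Gemini API. Below is a minimal sketch using the google-genai Python SDK and the gemini-2.5-flash-image-preview model id widely associated with nano-banana; the model id, prompt, and filenames are illustrative assumptions, not details confirmed in this story.

    # pip install google-genai pillow
    from io import BytesIO
    from PIL import Image
    from google import genai

    client = genai.Client()  # reads the Gemini API key from the environment

    reference = Image.open("reference.png")  # the style/character reference
    response = client.models.generate_content(
        model="gemini-2.5-flash-image-preview",  # assumed nano-banana model id
        contents=[
            "Generate the same character in the same art style, "
            "reframed for a vertical 9:16 video shot.",
            reference,
        ],
    )

    # The response can mix text and image parts; save the first image returned.
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            Image.open(BytesIO(part.inline_data.data)).save("variation.png")
            break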

In Search, AI Mode is now rolling out to 180+ countries and trades static “AI Overviews” for agentic behaviors that act on intent. For now, these task-execution features remain gated behind the Google AI Ultra subscription in the US.

The Pixel 10 launch shows Google wants Gemini embedded directly into devices. The Tensor G5 chip runs Gemini Nano entirely on-device, cutting latency and removing the dependence on the cloud. That means instant, context-sensitive features like Magic Cue, which pulls from Gmail and Calendar before a query is even typed, or live voice translation across 11 languages without an internet connection. Pixel’s camera stack now includes “Camera Coach” for real-time framing guidance and AI-stitched 100× Super Res zoom.

Even Google Photos is now conversational: users can simply ask for edits, from cleanup to creative background swaps, while C2PA Content Credentials, IPTC data, and SynthID tags signal when AI is involved.
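
Of those three signals, the IPTC marker is the easiest to inspect yourself: AI-generated media is supposed to carry the digital-source-type value trainedAlgorithmicMedia in its XMP metadata. Here is a rough sketch with Pillow (the filename is a placeholder; SynthID is an invisible watermark and C2PA manifests need their own toolkit, so this only catches the IPTC tag):

    # pip install pillow defusedxml
    from PIL import Image

    # IPTC digital-source-type value that marks fully AI-generated media
    AI_SOURCE_TYPE = "trainedAlgorithmicMedia"

    def looks_ai_labeled(path: str) -> bool:
        """Best-effort check of the image's XMP packet for the IPTC AI marker."""
        xmp = Image.open(path).getxmp()  # parsed XMP dict, {} if absent
        return AI_SOURCE_TYPE in str(xmp)

    print(looks_ai_labeled("edited_photo.jpg"))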

Health is also a core theme. Fitbit has been rebuilt as a Gemini-powered coach that adapts to your recovery, sleep, or travel schedule. It doesn’t just spit out metrics - it explains, adjusts, and plans. Gemini Live expands conversational AI with visual annotations, while Gemini for Home replaces Google Assistant on Nest devices, a more serious living-room play.

Google is collapsing workflows across media, search, and devices. Flow creates, Search executes, Pixel embeds - and Gemini ties it all together. Read more.

ElevenLabs releases its v3 model with new expression controls and support for unlimited speakers

ElevenLabs has rolled out Eleven v3 (alpha), a major upgrade to its text-to-speech API that makes AI voices more expressive and versatile. The model now supports unlimited speakers in dialogue mode, letting developers build multi-character conversations without workarounds. It also adds fine-grained audio controls for emotion, pitch, and style, so voices can laugh, whisper, or convey subtle nuance naturally.

Eleven v3 covers over 70 languages, expanding global reach, and is accessible via a free API account, with some advanced features behind a paywall. Documentation provides full examples for integrating expressive speech into apps, media, or virtual assistants. Read more.
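
To make that concrete, here is a minimal sketch against ElevenLabs’ text-to-speech REST endpoint. The eleven_v3 model id, the placeholder voice id, and the inline audio tags are assumptions drawn from the v3 docs rather than details quoted in this story:

    # pip install requests
    import os
    import requests

    API_KEY = os.environ["ELEVENLABS_API_KEY"]
    VOICE_ID = "your-voice-id"  # placeholder: any voice id from your account

    # Inline audio tags steer v3's expressive delivery
    payload = {
        "text": "[whispers] I have a secret... [laughs] just kidding!",
        "model_id": "eleven_v3",
    }

    resp = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
        headers={"xi-api-key": API_KEY},
        json=payload,
        timeout=60,
    )
    resp.raise_for_status()

    with open("clip.mp3", "wb") as f:
        f.write(resp.content)  # default output is MP3 audio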

Chinese unicorn Z.ai unifies mobile and desktop automation with next-gen AI agents

Z.ai is taking agentic AI to the next level, merging mobile and desktop automation into a single ecosystem. Its smartphone agent, powered by GLM-Z1-Air and GLM-4-Air-0414, can autonomously handle complex multi-step tasks without forcing users to switch apps. The cloud-backed AutoGLM engine interprets natural-language instructions in real time, supporting “life assistant” and “office assistant” modes and letting users hand their device over to the agent for full workflow execution.

On the desktop, Z.ai’s ComputerRL framework bridges the gap between programmatic reasoning and human-centric interfaces. By combining API calls with direct GUI interactions, AutoGLM-OS-9B, built on GLM-4-9B-0414, achieves SoTA accuracy on the OSWorld benchmark, excelling in multi-step reasoning, tool use, and general-purpose automation. Z.ai is positioning itself as one of the top global competitors in human-aligned AI with $1.4 billion in funding and an IPO targeted for 2026. Read more.
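
ComputerRL’s hybrid action space (structured API calls plus raw GUI events) can be pictured as an agent loop like the one below. Everything here is illustrative: the action schema, the stub policy, and the use of pyautogui are assumptions, not Z.ai’s actual interface.

    # pip install pyautogui
    # Illustrative agent loop over a hybrid API + GUI action space.
    import pyautogui

    def call_model(observation: dict) -> dict:
        # Stand-in for a hosted GLM policy; returns a canned GUI action here.
        return {"type": "gui", "event": "click", "x": 100, "y": 200}

    def run_api_action(name: str, args: dict) -> None:
        # Dispatch to an application API (calendar, mail, ...); stubbed out.
        print(f"API call: {name}({args})")

    def step(observation: dict) -> bool:
        """Run one agent step; returns True when the model signals completion."""
        action = call_model(observation)
        if action["type"] == "api":        # programmatic path: structured call
            run_api_action(action["name"], action["args"])
        elif action["type"] == "gui":      # human-centric path: synthetic input
            if action["event"] == "click":
                pyautogui.click(action["x"], action["y"])
            elif action["event"] == "type":
                pyautogui.typewrite(action["text"])
        return action["type"] == "done"

    step({"screen": "placeholder screenshot", "goal": "open calendar"})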

5 new AI-powered tools from around the web

Latest AI Research Papers

arXiv is a free online library where researchers share pre-publication papers.

Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.

Interested in reaching smart readers like you? To become an AI Breakfast sponsor, reply to this email or DM us on 𝕏!