AI Breakfast
What is Nano-banana?

August 22, 2025

Good morning. It’s Friday, August 22nd.

On this day in tech history: In 2019, Russia launched a humanoid robot named FEDOR (Final Experimental Demonstration Object Research) to the International Space Station to operate autonomously and test remote manipulator tasks in orbit. It was a quirky yet meaningful precursor to today’s discussions of robotics, autonomy, and microgravity operations.

In today’s email:

Nano-banana
ElevenLabs v3
China’s Z.ai agents
5 New AI Tools
Latest AI Research Papers

You read. We listen. Let us know what you think by replying to this email.

Turn AI Into Your Income Stream

The AI economy is booming, and smart entrepreneurs are already profiting. Subscribe to Mindstream and get instant access to 200+ proven strategies to monetize AI tools like ChatGPT, Midjourney, and more. From content creation to automation services, discover actionable ways to build your AI-powered income. No coding required, just practical strategies that work.

Today’s trending AI news stories

Google turns AI into infrastructure: Nano-banana, Search, and Pixel get system-level upgrades

Google is moving fast to make AI less of a bolt-on feature and more of the operating system for its ecosystem with three major product pushes this week: Flow, Search, and Gemini hardware.

In Flow, the nano-banana model takes text-to-image beyond novelty filters. With reference control, creators can upload an image and generate stylistically consistent variations ready for direct use in video timelines. That eliminates the need to bounce through third-party tools. Google is pairing this with vertical aspect ratio support for TikTok and Shorts, prompt preambles that auto-extend simple queries into cinematic or vlogging styles, and social hooks like QR sharing. Flow starts looking like a full-stack studio with built-in distribution.

In Search, AI Mode is now rolling out to 180+ countries and trades static “AI Overviews” for agentic behaviors that act on intent. For now, these task-execution features remain gated behind the Google AI Ultra subscription in the US.

The Pixel 10 launch shows Google wants Gemini embedded directly into devices. The Tensor G5 chip runs Gemini Nano entirely on-device, eliminating latency and cloud dependency. That means instant context-sensitive features like Magic Cue, which pulls from Gmail and Calendar before a query is even typed, or live 11-language voice translation without an internet connection. Pixel’s camera stack now includes “Camera Coach” for real-time framing guidance and 100× Super Res zoom stitched by AI inference.

Even Google Photos is now conversational: users can simply ask for edits, from cleanup to creative background swaps, while C2PA Content Credentials, IPTC data, and SynthID tags signal when AI is involved.

Health is also a core theme. Fitbit has been rebuilt as a Gemini-powered coach that adapts to your recovery, sleep, or travel schedule. It doesn’t just spit out metrics; it explains, adjusts, and plans. Gemini Live expands conversational AI with visual annotations, while Gemini for Home replaces Google Assistant on Nest for a more serious living room play.

Google is collapsing workflows across media, search, and devices. Flow creates, Search executes, Pixel embeds, and Gemini ties it all together. Read more.

ElevenLabs releases its v3 model with new expression controls and support for unlimited speakers

ElevenLabs has rolled out Eleven v3 (alpha), a major upgrade to its text-to-speech API that makes AI voices more expressive and versatile. The model now supports unlimited speakers in dialog mode, letting developers build multi-character conversations without workarounds. It also adds fine-grained audio controls for emotion, pitch, and style, so voices can laugh, whisper, or convey subtle nuance naturally.

Eleven v3 covers over 70 languages, expanding global reach, and is accessible via a free API account, with some advanced features behind a paywall. Documentation provides full examples for integrating expressive speech into apps, media, or virtual assistants. Read more.

Chinese unicorn Z.ai unifies mobile and desktop automation with next-gen AI agents

Z.ai is merging mobile and desktop automation into a single agentic ecosystem. Its smartphone agent, powered by GLM-Z1-Air and GLM-4-Air-0414, can autonomously handle complex multi-step tasks without forcing users to switch apps. The cloud-backed AutoGLM engine interprets natural-language instructions in real time, supports “life assistant” and “office assistant” modes, and lets users hand their devices over to the agent for full workflow execution.

Introducing ComputerRL, a framework for autonomous desktop intelligence that enables agents to operate complex digital workspaces skillfully.
arxiv.org/abs/2508.14040
ComputerRL features the API-GUI paradigm, which unifies programmatic API calls and direct GUI interaction to...
— Z.ai (@Zai_org)
2:31 PM • Aug 20, 2025

On the desktop, Z.ai’s ComputerRL framework bridges the gap between programmatic reasoning and human-centric interfaces. By combining API calls with direct GUI interactions, AutoGLM-OS-9B, built on GLM-4-9B-0414, achieves state-of-the-art accuracy on the OSWorld benchmark, excelling in multi-step reasoning, tool use, and general-purpose automation. With $1.4 billion in funding and an IPO targeted for 2026, Z.ai is positioning itself as one of the top global competitors in human-aligned AI. Read more.
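
For readers who want a feel for what "combining API calls with direct GUI interactions" could look like, here is a minimal Python sketch. It is not ComputerRL's code: the `ApiCall`, `GuiAction`, `Desktop`, and `plan_step` names, the toy `set_cell` API, and the "use an API if one exists, otherwise fall back to clicks" rule are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union


@dataclass(frozen=True)
class ApiCall:
    """A programmatic action (hypothetical type, not from ComputerRL)."""
    name: str
    args: dict = field(default_factory=dict)


@dataclass(frozen=True)
class GuiAction:
    """A direct interface action such as a click (hypothetical type)."""
    kind: str
    target: str


# One action space that covers both styles of interaction.
Action = Union[ApiCall, GuiAction]


class Desktop:
    """Toy environment that executes either kind of action and logs it."""

    def __init__(self, apis: Dict[str, Callable[..., str]]):
        self.apis = apis
        self.log: List[str] = []

    def execute(self, action: Action) -> str:
        if isinstance(action, ApiCall):
            self.log.append(f"api:{action.name}")
            return self.apis[action.name](**action.args)
        self.log.append(f"gui:{action.kind}:{action.target}")
        return f"{action.kind} {action.target}"


def plan_step(env: Desktop, intent: str, args: dict,
              gui_fallback: List[GuiAction]) -> List[Action]:
    """Assumed policy: use an API when one is registered, else use the GUI."""
    if intent in env.apis:
        return [ApiCall(intent, args)]
    return list(gui_fallback)


def run_demo():
    env = Desktop(apis={"set_cell": lambda cell, value: f"{cell}={value}"})
    steps = plan_step(env, "set_cell", {"cell": "A1", "value": 42}, [])
    steps += plan_step(
        env, "export_pdf", {},
        [GuiAction("click", "File"), GuiAction("click", "Export as PDF")],
    )
    results = [env.execute(step) for step in steps]
    return results, env.log


if __name__ == "__main__":
    print(run_demo())
```

In this toy run, the spreadsheet edit goes through the registered API, while the PDF export, which has no API here, falls back to two clicks. A trained agent would learn when to pick each route rather than follow a fixed rule like this one.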

5 new AI-powered tools from around the web

Enjo

Enjo learns from your company knowledge base, past chat conversations, and other knowledge sources to efficiently answer support requests directly in platforms like Jira, Slack, and more.

www.enjo.ai

Broxi AI

Create powerful AI agents without coding. Broxi's no-code platform lets you build, deploy, and manage intelligent agents for business automation, customer support, and workflow optimization.

broxi.ai

ChartDB v2

A free, open-source database diagram editor: visualize and design your database with a single query. ChartDB v2 brings real-time collaboration, custom embeddable views, huge-schema performance, and an open-source core for self-hosting or extending.

chartdb.io

ReadyBase

Transform your intelligence from raw to ready with ReadyBase. It handles the final output of any intelligence workflow, turning AI-generated outputs, data, research, and insights into polished, presentation-ready documents.

readybase.ai

Disco.dev

Plug-and-Play Open Source MCP Servers. Connect new tools to AI agents via MCP servers. No coding required. Browse integrations, connect to Claude, VSCode, and other AI clients.

disco.dev