OpenAI rewrites AI infrastructure playbook

In partnership with

Good morning. It’s Wednesday, September 24th.

On this day in tech history: In 1980, Harold Cohen presented AARON, his long-running generative art system. Instead of imitating scenes, AARON used rule sets, figure-ground separation, and its own visual building blocks. It became an early example of AI probing how structured rules could generate drawings that felt intentional and human-like.

In today’s email:

  • OpenAI rewrites AI infrastructure playbook

  • Alibaba’s Qwen3 stack: trillion-scale models meet sub-second multimodal AI

  • Gemini Live API brings real-time reliability; Play and Photos go conversational

  • Microsoft is killing tech debt, scaling Windows ML for devs

  • 5 New AI Tools

  • Latest AI Research Papers

You read. We listen. Let us know what you think by replying to this email.

Looking for unbiased, fact-based news? Join 1440 today.

Join over 4 million Americans who start their day with 1440 – your daily digest for unbiased, fact-centric news. From politics to sports, we cover it all by analyzing over 100 sources. Our concise, 5-minute read lands in your inbox each morning at no cost. Experience news without the noise; let 1440 help you make up your own mind. Sign up now and invite your friends and family to be part of the informed.

Today’s trending AI news stories

Oracle to build, Nvidia to lease: OpenAI rewrites AI infrastructure playbook

OpenAI, Oracle, SoftBank, and Nvidia are fusing money, hardware, and energy into what could be the most ambitious AI infrastructure project to date. Five new U.S. data centers are in the pipeline across Texas, New Mexico, Ohio, and another Midwest site, bringing total planned capacity to nearly 7 gigawatts, roughly the output of seven nuclear reactors. Abilene, Texas, already hosts tens of thousands of Nvidia GPUs, but the next wave will dwarf even the largest hyperscaler campuses.

Construction underway on a Project Stargate AI infrastructure site in Abilene, Texas, in April 2025. | Image: Daniel Cole / Reuters

Oracle is paying for and managing three of the new sites, selling compute back to OpenAI via Oracle Cloud. SoftBank is backing “fast-build” gigawatt-scale campuses. Nvidia’s $100 billion deal introduces a new model: chip leasing. Instead of OpenAI buying millions of GPUs outright, Nvidia provides hardware under a usage-based structure, turning a capital expense into cloud-style economics.

(L to R): OpenAI President Greg Brockman, NVIDIA Founder and CEO Jensen Huang, and OpenAI CEO Sam Altman | Image: Nvidia

The closed loop has Nvidia taking non-voting equity while OpenAI commits spend back into 4–5 million GPUs, targeting 10 gigawatts of compute. Nvidia CEO Jensen Huang insists this won’t squeeze supply for other customers, though the scale could redefine how hyperscalers finance AI infrastructure.

Sam Altman has framed the effort as “a factory producing a gigawatt of AI infrastructure every week,” describing scaling compute as the literal key to OpenAI’s revenue growth. Read more.

Alibaba’s Qwen3 stack: trillion-scale models meet sub-second multimodal AI 

Alibaba is on an aggressive run of AI rollouts this week. Qwen3-Next is a faster MoE architecture that expands to 512 experts while activating only 10, plus a shared expert, per step. With stability fixes like normalized router initialization and attention gating, it delivers over 10x the throughput of Qwen3-32B on long sequences, handling up to 256K tokens natively, with experimental paths to 1M. Two 80B variants lead the lineup: Instruct for assistants and Thinking for reasoning, with FP8 releases cutting latency and energy overhead.
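For readers curious how “512 experts, 10 active” works in practice: an MoE layer scores every expert per token, keeps only the top-k, and mixes their outputs, with a shared expert always on. Here’s a minimal numpy sketch of that routing step (the dimensions and weights are toy values, not Qwen3-Next’s actual architecture):

```python
import numpy as np

def moe_forward(x, experts, shared_expert, router_w, k=10):
    """Route one token through the top-k experts plus an always-on shared expert.

    x: (d,) token vector; experts: list of (d, d) weight matrices;
    shared_expert: (d, d); router_w: (num_experts, d).
    """
    logits = router_w @ x                       # one routing score per expert
    top = np.argsort(logits)[-k:]               # indices of the k best-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                    # softmax over the selected k only
    out = sum(w * (experts[i] @ x) for w, i in zip(weights, top))
    return out + shared_expert @ x              # shared expert fires on every token

# Toy setup mirroring the article's numbers: 512 experts, 10 active per step.
rng = np.random.default_rng(0)
d, n_exp = 8, 512
experts = [rng.standard_normal((d, d)) * 0.01 for _ in range(n_exp)]
shared = rng.standard_normal((d, d)) * 0.01
router = rng.standard_normal((n_exp, d))
y = moe_forward(rng.standard_normal(d), experts, shared, router)
print(y.shape)  # (8,)
```

The payoff is the throughput figure above: per token, only 10 of 512 expert matmuls actually run, so compute scales with k, not with total parameter count.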

Qwen3-Omni pushes into full multimodality with 30B parameters and just 3B active per inference. Its split Thinker-Talker system enables streaming speech generation with sub-second response, benchmark wins across 32 of 36 audio/video tasks, and support for 119 written and 19 spoken languages. Open-source Instruct, Thinking, and Captioner variants extend its reach.

On raw scale, Qwen3-Max pushes past 1T parameters, trained on 36T tokens, with ChunkFlow boosting 1M-token context training. Benchmarks put Instruct in the global top three, beating Claude Opus 4 and DeepSeek V3.1 on coding and agent tasks. The Thinking variant, still in training, has already hit perfect scores on AIME 25 and HMMT with test-time scaling and code execution.

Qwen3-LiveTranslate-Flash makes all this tangible, delivering real-time interpretation across 18 languages at about 3 seconds of latency and integrating lip-reading, gesture recognition, and semantic unit prediction for near-offline quality. It edges out GPT-4o-Audio-Preview and Gemini-2.5-Flash on speech tasks, while producing expressive dialect-specific voices. Read more.

Gemini Live API brings real-time reliability; Play and Photos go conversational

Google’s upgraded Gemini Live API now runs on a native audio model built for real-time reliability. Function calls, the pipes that let agents pull live data or execute services, are now up to 2x more accurate, even in messy multi-function scenarios. Add tighter audio handling and conversations flow like they should: pauses, side chatter, and interruptions no longer break the thread. Next up is “thinking mode,” where developers set a reasoning budget, trading speed for depth with transparent traces of the model’s process.
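Under the hood, function calling is a loop: the model emits structured call requests, the host app matches each one to a real handler and sends results back. A minimal dispatch sketch of that pattern (illustrative only; the tool names, handlers, and call format below are hypothetical, not the Gemini SDK’s actual types):

```python
# Generic dispatch loop for model-emitted function calls. Tool names and
# handlers here are made-up stubs, not real Gemini API surface.

def get_weather(city: str) -> dict:
    return {"city": city, "temp_c": 21}         # stub data for the sketch

def set_timer(seconds: int) -> dict:
    return {"status": "set", "seconds": seconds}

TOOLS = {"get_weather": get_weather, "set_timer": set_timer}

def dispatch(calls):
    """Resolve each model-requested call. In multi-function turns the model
    may emit several calls at once; each result is reported back by name."""
    results = []
    for call in calls:
        fn = TOOLS.get(call["name"])
        if fn is None:                          # unknown tool: report, don't crash
            results.append({"name": call["name"], "error": "unknown function"})
        else:
            results.append({"name": call["name"], "result": fn(**call["args"])})
    return results

# A "messy multi-function" turn: two valid calls plus one the app never declared.
turn = [
    {"name": "get_weather", "args": {"city": "Austin"}},
    {"name": "set_timer", "args": {"seconds": 90}},
    {"name": "play_music", "args": {"genre": "jazz"}},
]
print(dispatch(turn))
```

The accuracy gain Google is claiming lives in the step this sketch takes for granted: picking the right name and arguments for each call in the first place.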

On the consumer side, Google is flexing the same tech. A new Gemini overlay in Google Play interprets on-screen context so gamers can ask for hints without breaking flow. A redesigned “You” tab turns the store into a personalized hub for progress, rewards, and cross-app recommendations.

‘You’ can interact with Gemini Live using your voice. | GIF: Google

Google Photos’ conversational editor is expanding beyond Pixel 10: say “remove glare” or “add clouds” and the edits happen in seconds, watermarked for provenance.

In Google Labs, Mixboard reimagines mood boards with generative AI, letting users combine images, regenerate styles, and riff on ideas via natural prompts. Read more.

Microsoft is killing tech debt, scaling Windows ML for devs, and cooling chips from the inside out

Microsoft is going after the $85B technical debt problem with autonomous GitHub Copilot agents and new Azure migration tooling. These agents don’t just flag .NET and Java breaking changes; they generate fixes, refactor dependencies, patch security gaps, spin up tests, and repackage workloads into containers. In pilots, Xbox cut migration effort by 88%, while Ford reported a 70% reduction modernizing middleware.

On the client side, Windows ML is now generally available in Windows 11, embedding a production-ready ONNX runtime that automatically routes workloads across CPUs, GPUs, and NPUs via execution providers from AMD, Intel, NVIDIA, and Qualcomm. Adobe, McAfee, and Wondershare are already building on it, running semantic video search, real-time deepfake detection, and other edge workloads.
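The execution-provider idea is essentially an ordered fallback list: the runtime walks the app’s preferred backends and binds work to the first one the device actually has. A toy sketch of that selection logic (this is not the Windows ML API; the provider names only mirror ONNX Runtime’s naming convention):

```python
# Toy sketch of execution-provider fallback: pick the first preferred backend
# present on this machine. Names echo ONNX Runtime conventions but the logic
# is illustrative, not the real Windows ML routing.

PREFERENCE = ["NPUExecutionProvider", "GPUExecutionProvider", "CPUExecutionProvider"]

def pick_provider(available, preference=PREFERENCE):
    """Return the first preferred provider this device supports."""
    for provider in preference:
        if provider in available:
            return provider
    raise RuntimeError("no usable execution provider")

# A laptop with no NPU falls back to its GPU.
print(pick_provider({"GPUExecutionProvider", "CPUExecutionProvider"}))  # GPUExecutionProvider
```

The appeal for ISVs like Adobe or McAfee is that the same model package runs everywhere: the runtime, not the app, decides whether inference lands on an AMD, Intel, NVIDIA, or Qualcomm accelerator.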

The company is also now cooling GPUs from the inside out. Its new in-chip microfluidics carves hairline channels directly into silicon, pushing liquid coolant across hotspots. Early tests show a 65% drop in GPU temperature rise and up to 3x efficiency over cold plates. Co-developed with Swiss startup Corintis, the design uses bio-inspired channels modeled after leaf veins, with AI rerouting coolant in real time. Read more.

5 new AI-powered tools from around the web


Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.

Interested in reaching smart readers like you? To become an AI Breakfast sponsor, reply to this email or DM us on 𝕏!