OpenAI rewrites AI infrastructure playbook

In partnership with

Good morning. It’s Wednesday, September 24th.

On this day in tech history: In 1980, Harold Cohen presented AARON, his long-running generative art system. Instead of imitating scenes, AARON used rule sets, figure-ground separation, and its own visual building blocks. It became an early example of AI probing how structured rules could generate drawings that felt intentional and human-like.

In today’s email:

  • OpenAI rewrites AI infrastructure playbook

  • Alibaba’s Qwen3 stack: trillion-scale models meet sub-second multimodal AI

  • Gemini Live API brings real-time reliability; Play and Photos go conversational

  • Microsoft is killing tech debt, scaling Windows ML for devs

  • 5 New AI Tools

  • Latest AI Research Papers

You read. We listen. Let us know what you think by replying to this email.

Looking for unbiased, fact-based news? Join 1440 today.

Join over 4 million Americans who start their day with 1440 – your daily digest for unbiased, fact-centric news. From politics to sports, we cover it all by analyzing over 100 sources. Our concise, 5-minute read lands in your inbox each morning at no cost. Experience news without the noise; let 1440 help you make up your own mind. Sign up now and invite your friends and family to be part of the informed.

Today’s trending AI news stories

Oracle to build, Nvidia to lease: OpenAI rewrites AI infrastructure playbook

OpenAI, Oracle, SoftBank, and Nvidia are fusing money, hardware, and energy into what could be the most ambitious AI infrastructure project to date. Five new U.S. data centers are in the pipeline across Texas, New Mexico, Ohio, and another Midwest site, bringing total planned capacity to nearly 7 gigawatts, roughly the output of seven nuclear reactors. Abilene, Texas, already hosts tens of thousands of Nvidia GPUs, but the next wave will dwarf even the largest hyperscaler campuses.

Construction underway on a Project Stargate AI infrastructure site in Abilene, Texas, in April 2025. | Image: Daniel Cole / Reuters

Oracle is paying for and managing three of the new sites, selling compute back to OpenAI via Oracle Cloud. SoftBank is backing “fast-build” gigawatt-scale campuses. Nvidia’s $100 billion deal introduces a new model: chip leasing. Instead of OpenAI buying millions of GPUs outright, Nvidia provides hardware under a usage-based structure, turning a capital expense into cloud-style economics.

(L to R): OpenAI President Greg Brockman, NVIDIA Founder and CEO Jensen Huang, and OpenAI CEO Sam Altman | Image: Nvidia

The closed loop has Nvidia taking non-voting equity while OpenAI commits spend back into 4–5 million GPUs, targeting 10 gigawatts of compute. Nvidia CEO Jensen Huang insists this won’t squeeze supply for other customers, though the scale could redefine how hyperscalers finance AI infrastructure.

Sam Altman has framed the effort as “a factory producing a gigawatt of AI infrastructure every week,” describing scaling compute as the literal key to OpenAI’s revenue growth. Read more.

Alibaba’s Qwen3 stack: trillion-scale models meet sub-second multimodal AI 

Alibaba is on an aggressive run of AI rollouts this week. Qwen3-Next is a faster MoE architecture that expands to 512 experts while activating only 10, plus a shared expert, per step. With stability fixes like normalized router initialization and attention gating, it delivers over 10x the throughput of Qwen3-32B on long sequences, handling up to 256K tokens natively, with experimental paths to 1M. Two 80B variants lead the lineup: Instruct for assistants and Thinking for reasoning, with FP8 releases cutting latency and energy overhead.
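For readers curious how “512 experts, 10 active” works in practice: an MoE layer scores every expert per token, keeps only the top-k, and mixes their outputs, with a shared expert always on. Here’s a minimal numpy sketch of that routing step (the dimensions and weights are toy values, not Qwen3-Next’s actual architecture):

```python
import numpy as np

def moe_forward(x, experts, shared_expert, router_w, k=10):
    """Route one token through the top-k experts plus an always-on shared expert.

    x: (d,) token vector; experts: list of (d, d) weight matrices;
    shared_expert: (d, d); router_w: (num_experts, d).
    """
    logits = router_w @ x                       # one routing score per expert
    top = np.argsort(logits)[-k:]               # indices of the k best-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                    # softmax over the selected k only
    out = sum(w * (experts[i] @ x) for w, i in zip(weights, top))
    return out + shared_expert @ x              # shared expert fires on every token

# Toy setup mirroring the article's numbers: 512 experts, 10 active per step.
rng = np.random.default_rng(0)
d, n_exp = 8, 512
experts = [rng.standard_normal((d, d)) * 0.01 for _ in range(n_exp)]
shared = rng.standard_normal((d, d)) * 0.01
router = rng.standard_normal((n_exp, d))
y = moe_forward(rng.standard_normal(d), experts, shared, router)
print(y.shape)  # (8,)
```

The payoff is the throughput figure above: per token, only 10 of 512 expert matmuls actually run, so compute scales with k, not with total parameter count.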

Qwen3-Omni pushes into full multimodality with 30B parameters and just 3B active per inference. Its split Thinker-Talker system enables streaming speech generation with sub-second response, benchmark wins across 32 of 36 audio/video tasks, and support for 119 written and 19 spoken languages. Open-source Instruct, Thinking, and Captioner variants extend its reach.

On raw scale, Qwen3-Max pushes past 1T parameters, trained on 36T tokens, with ChunkFlow boosting 1M-token context training. Benchmarks put Instruct in the global top three, beating Claude Opus 4 and DeepSeek V3.1 on coding and agent tasks. The Thinking variant, still in training, has already hit perfect scores on AIME 25 and HMMT with test-time scaling and code execution.

Qwen3-LiveTranslate-Flash makes all this tangible, delivering real-time interpretation across 18 languages at about 3 seconds of latency and integrating lip-reading, gesture recognition, and semantic unit prediction for near-offline quality. It edges out GPT-4o-Audio-Preview and Gemini-2.5-Flash on speech tasks, while producing expressive dialect-specific voices. Read more.

Gemini Live API brings real-time reliability; Play and Photos go conversational

Google’s upgraded Gemini Live API now runs on a native audio model built for real-time reliability. Function calls, the pipes that let agents pull live data or execute services, are now up to 2x more accurate, even in messy multi-function scenarios. Add tighter audio handling and conversations flow like they should: pauses, side chatter, and interruptions no longer break the thread. Next up is “thinking mode,” where developers set a reasoning budget, trading speed for depth with transparent traces of the model’s process.
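Under the hood, function calling is a loop: the model emits structured call requests, the host app matches each one to a real handler and sends results back. A minimal dispatch sketch of that pattern (illustrative only; the tool names, handlers, and call format below are hypothetical, not the Gemini SDK’s actual types):

```python
# Generic dispatch loop for model-emitted function calls. Tool names and
# handlers here are made-up stubs, not real Gemini API surface.

def get_weather(city: str) -> dict:
    return {"city": city, "temp_c": 21}         # stub data for the sketch

def set_timer(seconds: int) -> dict:
    return {"status": "set", "seconds": seconds}

TOOLS = {"get_weather": get_weather, "set_timer": set_timer}

def dispatch(calls):
    """Resolve each model-requested call. In multi-function turns the model
    may emit several calls at once; each result is reported back by name."""
    results = []
    for call in calls:
        fn = TOOLS.get(call["name"])
        if fn is None:                          # unknown tool: report, don't crash
            results.append({"name": call["name"], "error": "unknown function"})
        else:
            results.append({"name": call["name"], "result": fn(**call["args"])})
    return results

# A "messy multi-function" turn: two valid calls plus one the app never declared.
turn = [
    {"name": "get_weather", "args": {"city": "Austin"}},
    {"name": "set_timer", "args": {"seconds": 90}},
    {"name": "play_music", "args": {"genre": "jazz"}},
]
print(dispatch(turn))
```

The accuracy gain Google is claiming lives in the step this sketch takes for granted: picking the right name and arguments for each call in the first place.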

On the consumer side, Google is flexing the same tech. A new Gemini overlay in Google Play interprets on-screen context so gamers can ask for hints without breaking flow. A redesigned “You” tab turns the store into a personalized hub for progress, rewards, and cross-app recommendations.

‘You’ can interact with Gemini Live using your voice. | GIF: Google

Google Photos’ conversational editor is expanding beyond Pixel 10: say “remove glare” or “add clouds” and the edits happen in seconds, watermarked for provenance.

In Google Labs, Mixboard reimagines mood boards with generative AI, letting users combine images, regenerate styles, and riff on ideas via natural prompts. Read more.

Microsoft is killing tech debt, scaling Windows ML for devs, and cooling chips from the inside out

Microsoft is going after the $85B technical debt problem with autonomous GitHub Copilot agents and new Azure migration tooling. These agents don’t just flag .NET and Java breaking changes; they generate fixes, refactor dependencies, patch security gaps, spin up tests, and repackage workloads into containers. In pilots, Xbox cut migration effort by 88%, while Ford reported a 70% reduction modernizing middleware.

On the client side, Windows ML is now generally available in Windows 11, embedding a production-ready ONNX runtime that automatically routes workloads across CPUs, GPUs, and NPUs via execution providers from AMD, Intel, NVIDIA, and Qualcomm. Adobe, McAfee, and Wondershare are already building on it, running semantic video search, real-time deepfake detection, and other edge workloads.
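The execution-provider idea is essentially an ordered fallback list: the runtime walks the app’s preferred backends and binds work to the first one the device actually has. A toy sketch of that selection logic (this is not the Windows ML API; the provider names only mirror ONNX Runtime’s naming convention):

```python
# Toy sketch of execution-provider fallback: pick the first preferred backend
# present on this machine. Names echo ONNX Runtime conventions but the logic
# is illustrative, not the real Windows ML routing.

PREFERENCE = ["NPUExecutionProvider", "GPUExecutionProvider", "CPUExecutionProvider"]

def pick_provider(available, preference=PREFERENCE):
    """Return the first preferred provider this device supports."""
    for provider in preference:
        if provider in available:
            return provider
    raise RuntimeError("no usable execution provider")

# A laptop with no NPU falls back to its GPU.
print(pick_provider({"GPUExecutionProvider", "CPUExecutionProvider"}))  # GPUExecutionProvider
```

The appeal for ISVs like Adobe or McAfee is that the same model package runs everywhere: the runtime, not the app, decides whether inference lands on an AMD, Intel, NVIDIA, or Qualcomm accelerator.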

The company is also now cooling GPUs from the inside out. Its new in-chip microfluidics carves hairline channels directly into silicon, pushing liquid coolant across hotspots. Early tests show a 65% drop in GPU temperature rise and up to 3x efficiency over cold plates. Co-developed with Swiss startup Corintis, the design uses bio-inspired channels modeled after leaf veins, with AI rerouting coolant in real time. Read more.

5 new AI-powered tools from around the web


Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.

Interested in reaching smart readers like you? To become an AI Breakfast sponsor, reply to this email or DM us on 𝕏!