
This open-source model tops GPT-5, Claude Sonnet 4.5

Good morning. It’s Friday, November 7th.

On this day in tech history: In 2004, Cyc reached its 20th year. Created in 1984, it set out to encode common-sense reasoning in symbolic logic long before neural networks dominated AI. Though niche, Cyc influenced knowledge graphs, the semantic web, and neuro-symbolic research, revealing how difficult it is to formalize human common sense.

In today’s email:

  • Kimi K2 Model Beats GPT-5 and Sonnet 4.5

  • OpenAI hits 1M enterprise users

  • Google drops AlphaEvolve, Ironwood, and Gemini Tools

  • 5 New AI Tools

  • Latest AI Research Papers

You read. We listen. Let us know what you think by replying to this email.

How can AI power your income?

Ready to transform artificial intelligence from a buzzword into your personal revenue generator?

HubSpot’s groundbreaking guide "200+ AI-Powered Income Ideas" is your gateway to financial innovation in the digital age.

Inside you'll discover:

  • A curated collection of 200+ profitable opportunities spanning content creation, e-commerce, gaming, and emerging digital markets—each vetted for real-world potential

  • Step-by-step implementation guides designed for beginners, making AI accessible regardless of your technical background

  • Cutting-edge strategies aligned with current market trends, ensuring your ventures stay ahead of the curve

Download your guide today and unlock a future where artificial intelligence powers your success. Your next income stream is waiting.

Today’s trending AI news stories

Open-weight “Kimi K2 Thinking” tops GPT-5 and Claude Sonnet 4.5 on AI benchmarks

Moonshot AI just shook up the AI landscape with Kimi K2 Thinking. This open-weight model now beats GPT-5, Claude Sonnet 4.5, and MiniMax-M2 on reasoning, coding, and agentic benchmarks. It’s a trillion-parameter Mixture-of-Experts system that activates 32 billion parameters per inference, runs 200–300 sequential tool calls, and handles 256k-token context windows with INT4 quantization. That means it can sustain long multi-step workflows like autonomous coding loops or structured analysis without breaking a sweat.
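For a sense of scale, the figures above can be turned into rough arithmetic. This is a back-of-the-envelope sketch, not Moonshot's published sizing: it assumes exactly 1 trillion total parameters, treats INT4 as 4 bits per weight, and ignores the KV cache, activations, quantization scales, and any layers kept at higher precision. The function names are ours, for illustration only.

```python
# Back-of-the-envelope sizing for a Mixture-of-Experts model.
# Assumptions (ours, not Moonshot's): exactly 1e12 total parameters,
# 4 bits per weight under INT4, no overhead for scales or KV cache.

TOTAL_PARAMS = 1_000_000_000_000   # "trillion-parameter"
ACTIVE_PARAMS = 32_000_000_000     # activated per inference
BITS_PER_WEIGHT_INT4 = 4


def active_fraction(total: int, active: int) -> float:
    """Share of the weights that participate in one forward pass."""
    return active / total


def weight_bytes(params: int, bits_per_weight: int) -> float:
    """Raw storage for the weights alone, in bytes."""
    return params * bits_per_weight / 8


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    total_gb = weight_bytes(TOTAL_PARAMS, BITS_PER_WEIGHT_INT4) / 1e9
    active_gb = weight_bytes(ACTIVE_PARAMS, BITS_PER_WEIGHT_INT4) / 1e9
    print(f"active share: {frac:.1%}")
    print(f"all weights at INT4: {total_gb:.0f} GB")
    print(f"active weights at INT4: {active_gb:.0f} GB")
```

Under these assumptions, about 3.2% of the weights are active on any one pass, and the full set of weights takes roughly 500 GB at INT4. Real deployments will differ once caches and higher-precision layers are counted.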

Benchmark highlights: 60.2% on BrowseComp, 71.3% on SWE-Bench Verified, 83.1% on LiveCodeBench v6, and 56.3% on Seal-0. K2 Thinking outputs its reasoning traces, giving transparency into multi-step, agentic reasoning. Runtime costs are tiny at $0.15 per million tokens for cache hits, far cheaper than GPT-5. Released under a modified MIT license, it’s open for research and commercial use, with a light-touch attribution requirement for high-scale deployments. Read more.

OpenAI hits 1M enterprise users as Altman says “market, not government” will decide

OpenAI just cleared a major milestone with over 1 million paying business customers, making it the fastest-growing enterprise AI platform in history. Adoption is fueled by the 800M weekly ChatGPT users, shortening pilot cycles and speeding org-wide rollouts. ChatGPT for Work now sits at 7M seats (+40% in two months), and ChatGPT Enterprise is up 9× year-over-year.

Users can now interrupt long-running queries and inject new context mid-stream, a subtle but big upgrade for deep research and GPT-5 Pro workflows. The company also continues to expand globally: a 50/50 joint venture with SoftBank will deploy a localized enterprise stack, “Crystal intelligence,” with SoftBank as the launch customer.

Altman projects $20B+ in annual revenue for 2025, climbing toward hundreds of billions by 2030. OpenAI is committing $1.4T over eight years to expand AI Cloud, enterprise tools, consumer devices, and robotics. No government bailouts will be requested. Any public support, Altman says, should fund broad AI infrastructure or chip production for national security. Current capacity limits are already delaying new features, particularly in scientific and medical workflows.

On consumer traction, Sora launched on Android with ~470K installs on day one across seven markets, four times the adjusted iOS debut. Plus, new ChatGPT apps, including Peloton and Tripadvisor, expand the platform’s ecosystem. Read more.

Google drops AlphaEvolve, Ironwood, and Gemini Tools across AI stack

Google rolls out DeepMind’s AlphaEvolve, an AI system attacking classical and unsolved math problems. It blends evolutionary coding with LLMs, iteratively generating, testing, and refining solutions. Gemini Deep Think verifies outputs, while AlphaProof formalizes proofs in Lean. Across 67 combinatorics, geometry, and number theory challenges, AlphaEvolve rediscovered known results and occasionally produced formulas that outperformed human solutions. Mathematicians including Terence Tao validated outputs, proving hybrid human-AI discovery isn’t just hype.
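The generate, test, and refine loop described above can be sketched in a few lines. This is a toy illustration of evolutionary search in general, not DeepMind's implementation: random Gaussian mutation stands in for the LLM that proposes candidates, a simple scoring function stands in for the evaluator, and every name here (`evolve`, `mutate`, `score`) is hypothetical.

```python
import random
from typing import Callable, List, Tuple

Candidate = List[float]


def evolve(
    score: Callable[[Candidate], float],
    mutate: Callable[[Candidate, random.Random], Candidate],
    seed_candidate: Candidate,
    generations: int = 200,
    population_size: int = 8,
    rng_seed: int = 0,
) -> Tuple[Candidate, float]:
    """Toy evolutionary loop: propose variants, score them, keep the best."""
    rng = random.Random(rng_seed)
    population = [seed_candidate]
    for _ in range(generations):
        # Generate: propose variants of the current survivors.
        # (In an LLM-driven system, a model would write these proposals.)
        children = [
            mutate(rng.choice(population), rng) for _ in range(population_size)
        ]
        # Test and refine: score parents and children together and keep
        # the top half, so the best score never gets worse.
        pool = sorted(population + children, key=score, reverse=True)
        population = pool[: population_size // 2]
    best = population[0]
    return best, score(best)


def score(c: Candidate) -> float:
    # Toy objective with a known optimum at (3, -1); higher is better.
    return -((c[0] - 3.0) ** 2) - ((c[1] + 1.0) ** 2)


def mutate(c: Candidate, rng: random.Random) -> Candidate:
    # Small random perturbation of every coordinate.
    return [x + rng.gauss(0.0, 0.3) for x in c]


if __name__ == "__main__":
    best, best_score = evolve(score, mutate, seed_candidate=[0.0, 0.0])
    print(best, best_score)
```

Because survivors are carried forward each round, the best score can only hold steady or improve. Swapping the mutation step for a smarter proposer is what makes the approach interesting on real problems.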

Google’s seventh-generation Ironwood TPU goes live for large-scale AI training and inference. Expect 4× gains for training and inference, with superpods scaling to 9,216 chips, 9.6 Tbps interconnects, and 1.77 PB high-bandwidth memory. Complementing Ironwood, Arm-based Axion VMs handle preprocessing and pipelines with up to 2× better price-performance versus x86. Early adopters like Anthropic committed billions for up to a million TPUs, ensuring low-latency, enterprise-scale inference.
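The superpod numbers above imply a per-chip memory figure. The quick calculation below is a rough sketch that assumes "PB" means decimal petabytes (10^15 bytes) and that memory is split evenly across chips; neither assumption is stated in the announcement as summarized here.

```python
# Rough per-chip memory for an Ironwood superpod.
# Assumptions (ours): decimal petabytes, memory split evenly across chips.

SUPERPOD_CHIPS = 9_216
SUPERPOD_HBM_BYTES = 1.77e15  # 1.77 PB


def hbm_per_chip_gb(total_bytes: float, chips: int) -> float:
    """Average high-bandwidth memory per chip, in decimal gigabytes."""
    return total_bytes / chips / 1e9


if __name__ == "__main__":
    per_chip = hbm_per_chip_gb(SUPERPOD_HBM_BYTES, SUPERPOD_CHIPS)
    print(f"{per_chip:.0f} GB per chip")
```

That works out to roughly 192 GB of high-bandwidth memory per chip.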

Opal, Google’s no-code AI mini-app builder, is now in 160+ countries, letting users automate workflows, generate content, and prototype apps in minutes. File Search within the Gemini API simplifies RAG pipelines, automatically managing storage, chunking, embeddings, and context injection. Early adopters like Phaser Studio cut multi-hour research to seconds using semantic vector search with built-in citations.
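To make the RAG steps that File Search automates more concrete (chunking, embeddings, retrieval, and context injection), here is a minimal standard-library sketch. It does not call the Gemini API and says nothing about how Google implements File Search: word-count vectors stand in for learned embeddings, and every function name is hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def chunk(text: str, max_words: int = 12) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i : i + max_words]) for i in range(0, len(words), max_words)
    ]


def embed(text: str) -> Counter:
    """Stand-in embedding: lowercase word counts instead of a learned vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: List[str], k: int = 1) -> List[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: List[str]) -> str:
    """Inject retrieved chunks as numbered sources so answers can cite them."""
    sources = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context))
    return (
        "Answer using only the sources below and cite them by number.\n"
        f"{sources}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    docs = [
        "Ironwood superpods scale to 9,216 chips.",
        "Opal is a no-code mini-app builder.",
    ]
    question = "How many chips in an Ironwood superpod?"
    print(build_prompt(question, retrieve(question, docs, k=1)))
```

A managed service would replace each of these pieces with production-grade equivalents, which is the work the paragraph above says File Search takes off developers' hands.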

Maps becomes Gemini’s “all-knowing copilot,” turning navigation into conversational, task-driven guidance with Street View analysis across 250 million places. Multi-step voice workflows cover stops, parking, and Calendar events, while Lens adds on-device scene understanding.

Gemini Deep Research now blends Gmail, Drive, and Chat with public sources, outputting to Docs or audio. Chrome adds AI Mode for conversational queries and actions. Vertex AI Agent Builder now supports agent creation in under 100 lines of code, advanced context memory, self-healing plugins, Go language, and governance dashboards including Model Armor and Security Command Center. Consistency training on Gemma and Gemini 2.5 Flash improves robustness, while nine legacy models retire on November 18.

Google Finance integrates Gemini Deep Research with prediction market data from Kalshi and Polymarket, plus AI-powered earnings insights. Read more.

OpenEnv links to major RL ecosystems including TorchForge, verl, TRL, and SkyRL, supporting composable, scalable agent development. Meta and Hugging Face are inviting RFC feedback and contributions, positioning OpenEnv as a standard framework for safe, production-ready autonomous AI workflows. Read more.

5 new AI-powered tools from around the web

arXiv is a free online library where researchers share pre-publication papers.

Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.

Interested in reaching smart readers like you? To become an AI Breakfast sponsor, reply to this email or DM us on 𝕏!