AI Breakfast
Posts
Gemini 3 Leak?

Gemini 3 Leak?

AI Breakfast
October 13, 2025

Good morning. It’s Monday, October 13th.

On this day in tech history: In 2010, Google announced its acquisition of BlindType, a tiny startup working on machine-learning–based text-entry prediction for touchscreens. The tech became part of what evolved into Gboard’s autocorrection and swipe-typing models. Instead of forcing users to tap perfectly on tiny keys, BlindType’s system guessed the intended letters based on patterns and context, even if the touches were way off.

In today’s email:

Gemini 3 leak
OpenAI Takes Heat
Anthropic’s battle with poison data
5 New AI Tools
Latest AI Research Papers

You read. We listen. Let us know what you think by replying to this email.

^{In partnership with Wall Street Prep}

8 Weeks. Actionable AI Skills. MBA-Style Networking.

The 8-week AI for Business & Finance Certificate Program helps you:

Build AI confidence with role-specific use cases
Learn how leaders are implementing AI strategies at top financial firms
Secure a lasting network that supports your career growth

Earn your certificate from Columbia Business School Executive Education—program starts November 10.

Enroll by Oct. 13 to get $200 off tuition + use code AIBREAKFAST for an additional $300 off.

^{Thank you for supporting our sponsors!}

Today’s trending AI news stories

Gemini 3 leak gains steam with caveats, as DeepMind drops ‘Vibe Checker’

A leaked memo circling Reddit and 𝕏 now pegs Gemini 3 for an October 22 launch, bumping past the earlier October 9 rumor. The doc claims upgrades in multimodal reasoning, latency, inference cost, and even original music generation. Early testers say it is already edging out Gemini 2.5 and Anthropic’s Sonnet 4.5 on coding and SVG work. The drop may also bundle Veo 3.1 and a Nano Banana variant of Gemini 3 Pro. On the interface side, Google is testing a “My Stuff” asset hub, browser-level Agent Mode, and a refreshed take on “connected apps.” None of it is confirmed, so take it with a grain of salt.

Gemini 3 can compose original music!
listen to it here - audio on ofc 🔊
— ʟᴇɢɪᴛ (@legit_api)
3:29 PM • Oct 11, 2025

Meanwhile, DeepMind and several U.S. researchers are pushing back on pass@k benchmarks that only verify whether code runs and ignore what developers actually scrutinize: style, docstrings, API limits, and error handling. Their response is Vibe Checker, powered by VeriCode, a curated set of 30 rule types pulled from more than 800 Ruff linter checks and paired with deterministic verifiers. They also turned BigCodeBench and LiveCodeBench into BigVibeBench and LiveVibeBench, now covering more than 2,100 tasks.

Both methods test for functional correctness and instruction following. | Image: Zhong et al.

When they ran 31 leading models, pass@1 scores fell about 6 percent with only five added instructions. Once three or more constraints were introduced, none of the models broke 50 percent. Comparing results with more than 800,000 human ratings confirmed what no benchmark had quantified until now: accuracy plus instruction-following tracks real developer preference far better than any single metric in circulation.

Image: Google DeepMind

Google is also touting a new flex: 1.3 quadrillion tokens processed per month. The total is up 320 trillion since June, but the spike traces back to heavier models, not user growth. Gemini 2.5 Flash alone consumes 17 times more tokens per query and can cost 150 times more to run. Even a trivial prompt can spin into multiple reasoning passes, and multimodal inference inflates the count further.

The number signals infrastructure strain, not adoption. It also clashes with Google’s sustainability pitch of 0.24 watt-hours per Gemini request, a figure that only fits lightweight text use and ignores video workloads, agent chaining, and long-context reasoning. Impressive on a slide, less so on a utility bill. Read more.

OpenAI slammed on copyright, subpoenas, and bias testing simultaneously

OpenAI is getting squeezed from multiple fronts. In New York, the company is staring down a potential multibillion-dollar copyright fight after authors and publishers uncovered internal emails about scrubbing a dataset packed with pirated books. They’re now asking the court to force disclosure of OpenAI’s legal communications, arguing the company knew what it was doing and may have destroyed evidence. If a judge agrees, damages could explode past a billion dollars, especially after Anthropic already paid $1.5 billion to make a similar lawsuit go away. Insurers are reportedly balking at underwriting either company.

One Tuesday night, as my wife and I sat down for dinner, a sheriff’s deputy knocked on the door to serve me a subpoena from OpenAI.
I held back on talking about it because I didn't want to distract from SB 53, but Newsom just signed the bill so... here's what happened:
🧵
— Nathan Calvin (@_NathanCalvin)
2:00 PM • Oct 10, 2025

At the same time, OpenAI is alienating the very policy advocates it claims to collaborate with. Encode and The Midas Project, both tiny nonprofits that backed California’s new AI transparency law, SB 53, say OpenAI sent sheriffs to serve subpoenas demanding their private emails with lawmakers, journalists, students, and former OpenAI staff. The company insists it’s all part of its lawsuit against Elon Musk and aimed at sniffing out undisclosed backing. Both groups say they’ve never taken Musk’s money and view the move as legal intimidation timed to ongoing reviews of OpenAI’s $500 billion reorganization.

I, too, made the mistake of *checks notes* taking OpenAI's charitable mission seriously and literally.
In return, got a knock at my door in Oklahoma with a demand for every text/email/document that, in the "broadest sense permitted," relates to OpenAI's governance and investors.
— Tyler Johnston (@TylerJnstn)
4:28 PM • Oct 10, 2025

To blunt growing distrust, the company is also trying to show progress on AI bias. In a new internal audit, OpenAI stress-tested GPT-4o, OpenAI o3, and the newer GPT-5 models with 500 prompts across hot-button political topics, from immigration to reproductive rights. Another model judged the responses using rules against emotive escalation, one-sided framing, personal opinions, and dismissive language.

OpenAI tested ChatGPT’s objectivity in responding to prompts about divisive topics from varying political perspectives. Image screenshot: OpenAI

The company claims GPT-5 instant and GPT-5 thinking cut biased replies by 30 percent compared to older models and were harder to push off balance with slanted prompts. Most failures still emerged under aggressively liberal framing. Read more.

Anthropic battles poisoned data, branding blitz, and legal bills

In collaboration with the UK AI Security Institute and the Alan Turing Institute, the company showed that just 250 poisoned documents, 0.00016 percent of a training corpus, can reliably backdoor large language models from 600 million to 13 billion parameters. Across 72 models, a trigger word, “SUDO,” caused the model to output gibberish. Fewer samples failed; more offered no additional effect, revealing a threshold effect rather than proportional scaling. While low-risk, the results underscore how even minimal data contamination can silently alter model behavior.

Image: Anthropic

Concurrently, Anthropic is accelerating its consumer push. Its New York “Zero Slop Zone” pop-up, a screen-free newsstand offering coffee, books, and “thinking” caps, drew 5,000 visitors and 10 million social impressions. Access required the Claude app, reinforcing product adoption. This anchors the multimillion-dollar “Keep Thinking” campaign, spanning streaming, sports, and print media. Anthropic, now valued at $18.3 billion, projects $5 billion revenue in 2025, primarily from Claude Code, while launching its strongest code model yet, Claude 4.5 Sonnet.

Both Anthropic and OpenAI are now confronting escalating legal and financial exposure. Insurers are shying away from AI-related coverage, forcing firms to consider using investor funds as self-insurance. OpenAI has $300 million in coverage, far short of potential multibillion-dollar liabilities, while Anthropic has already tapped internal capital for a $1.5 billion settlement. These developments reveal a new reality. As AI scales, the industry faces inseparable technical, legal, and financial pressures that test both innovation and resilience. Read more.

Quash

AI platform that converts requirements into runnable visual tests, executes them like a real user with no scripts or selectors, reports failures with context, and catches issues pre-production in one end-to-end workspace.

quashbugs.com

Meku

A web app and site builder engineered for developers. Generate, customize and deploy full-stack web apps and sites from simple AI prompts. Comes with essential integrations and deployment tools and hosting to launch your MVP fast.

meku.dev

Intryc

Intryc scores and gives feedback on real support conversations. Now with AI simulations, agents can role play real past tickets and get scored instantly, cutting onboarding time by 50% and saving CX leads 12 to 15 hours of manual training each week.

www.intryc.com

OpenLIT

Elevate APM with OpenLIT, the open-source platform built on OpenTelemetry. Simplify observability with unified traces and metrics in one powerful interface.

openlit.io

Androidify

The classic Androidify app is back, now powered by Gemini. Turn your selfie or a simple prompt into a unique Android bot avatar. Use AI to generate custom backgrounds with image editing, create shareable stickers, and bring your bot to life.

androidify.com