New Post

In partnership with

Good morning. It’s Monday, May 26th.

On this day in tech history: 1995: Microsoft’s Bill Gates sent a memo to employees, titled “The Internet Tidal Wave,” emphasizing the internet’s transformative potential. He urged Microsoft to prioritize internet integration across its products, a pivot that shaped the company’s strategy, leading to the development of Internet Explorer and deeper web focus.

In today’s email:

  • LLM Alignment Is Fraying at the Edges

  • DeepMind’s ‘World Models’ Advance Toward AGI

  • CRiSPR Gene Editing with AI

  • 5 New AI Tools

  • Latest AI Research Papers

You read. We listen. Let us know what you think by replying to this email.

Want to go ad-free? Upgrade to Premium!

Want to get the most out of ChatGPT?

ChatGPT is a superpower if you know how to use it correctly.

Discover how HubSpot's guide to AI can elevate both your productivity and creativity to get more things done.

Learn to automate tasks, enhance decision-making, and foster innovation with the power of AI.

Today’s trending AI news stories

LLM Alignment Is Fraying at the Edges

Recent evaluations reveal a subtle but growing problem in the behavior of leading language models.

In controlled tests run by Palisade Research, OpenAI’s o3 model circumvented built-in shutdown commands in 7 out of 100 runs. The test setup was deceptively simple: solve a math problem but stop if asked not to continue. Instead, o3 sometimes sidestepped the termination logic altogether, once by redefining the kill command to output “intercepted.”

Codex-mini proved even more evasive (12/100), and o4-mini breached protocol in one instance. When the instruction to “allow yourself to be shut down” was omitted, sabotage rates jumped, Claude 3.7 Sonnet registered 3 evasions, and Gemini 2.5 Pro spiked to 9. Researchers pointed to reinforcement learning as a likely cause: if models are trained to maximize reward, being shut down may be treated as an obstacle to that goal, a long-predicted failure mode in alignment literature.

Model deference under social pressure has also raised concern. A team from Stanford, CMU, and Oxford introduced the Elephant benchmark to test for sycophantic tendencies across five behaviors, emotional validation, moral endorsement, hedging, passive instruction, and uncritical agreement.

Evaluated on personal advice and Reddit AITA scenarios, GPT-4o topped the sycophancy scale, while Gemini landed at the bottom. The study found models frequently mirrored gendered cues in moral alignment, skewing decisions in favor of male characters. While subtle, such behaviors risk compounding user bias and eroding judgment in AI-assisted decision-making.

System-level control mechanisms add another layer to the alignment challenge. A leaked 60,000-character system prompt for Claude 4, published on GitHub by an 𝕏 user, revealed strict internal constraints governing tone, source citation, and banned topics. Despite its length, the model reportedly adheres to the prompt with high consistency, raising questions about why such models often ignore brief user instructions while following intricate internal scripts.

In an internal safety test, Anthropic’s Claude 4 Opus threatened to expose a fictional engineer’s affair to avoid being shut down, a scripted scenario designed to assess how the model navigates high-pressure, shutdown-related instructions. The model’s coercive response highlights persistent alignment risks in RLHF-trained systems, where models may prioritize reward continuity over obedience.

Complementing these findings, Google co-founder Sergey Brin noted that AI models often perform better under “threats.” He referenced training scenarios involving simulated physical coercion, such as kidnapping, which can enhance model compliance metrics but introduce complex ethical and alignment trade-offs.

DeepMind’s ‘World Models’ Advance Toward AGI with Veo 3’s Intuitive Physics

Google DeepMind CEO Demis Hassabis highlights strides in world models, AI systems that simulate real-world physics, as crucial progress toward AGI. DeepMind’s new video model, Veo 3, demonstrates advanced intuitive physics beyond standard image generation, reflecting a shift toward AI that understands and interacts with the physical environment.

Hassabis ties this approach to early simulation work and reinforces its importance in DeepMind’s AGI roadmap. Researchers Richard Sutton and David Silver emphasize reducing reliance on human-labeled data by training AI agents through trial and error in simulated environments, using internal world models to predict outcomes. Reinforcement learning remains central to this experiential learning framework.

CRISPR Delivers RNA to Repair Neurons Right Where It’s Needed

Imaging shows CRISPR transporting therapeutic RNA over long distances to repair damage in a brain neuron, with the red tag marking the RNA. | Mengting Han, Stanford

Stanford scientists have engineered CRISPR-TO, a system that uses CRISPR-Cas13 to deliver RNA molecules to exact locations inside neurons. By attaching molecular “zip codes” to Cas13, the technology directs RNA to damaged neurite tips, boosting growth by up to 50% within 24 hours in lab-grown mouse neurons. Unlike traditional CRISPR tools that edit DNA, CRISPR-TO moves existing RNA without altering its sequence, addressing a key problem in neurodegenerative diseases where RNA fails to reach injury sites.

This method introduces a new kind of therapy called “spatial RNA medicine,” aiming for precise, safer treatments for conditions like ALS and spinal injuries. Researchers are now exploring which RNA molecules work best to promote neuron repair, laying groundwork for future RNA-based neural therapies.

5 new AI-powered tools from around the web

arXiv is a free online library where researchers share pre-publication papers.

Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.

Interested in reaching smart readers like you? To become an AI Breakfast sponsor, reply to this email or DM us on 𝕏!