
China unveils LLaVA-o1 to challenge OpenAI's o1 model

Good morning. It’s Friday, November 29th.

Did you know? xAI could soon have its own app.

In today’s email:

  • Amazon’s Olympus AI

  • Google’s GenChess

  • China’s LLaVA-o1

  • ElevenLabs GenFM

  • Optimus Robot Update

  • 5 New AI Tools

  • Latest AI Research Papers

You read. We listen. Let us know what you think by replying to this email.

Today’s trending AI news stories

Amazon Develops Olympus AI, Reducing Dependence on Anthropic

Amazon is reportedly gearing up to showcase its Olympus AI model at AWS re:Invent. Designed as a multimodal large language model (LLM), Olympus can parse images, video, and text, enabling users to pinpoint, say, a pivotal basketball play through a simple prompt.

With its foray into generative AI, Amazon seems intent on lessening reliance on Anthropic’s Claude, following its substantial backing of the startup. This move signals Amazon’s recalibration in the AI arms race, where it’s often framed as playing catch-up to Google and Microsoft.

A key player in this effort is AWS' Annapurna Labs in Austin, a hub for developing chips such as the Trainium AI accelerator and the Graviton general-purpose processor. By closely integrating hardware and software teams, the lab accelerates development and prototyping in a collaborative environment. Its work ranges from creating energy-efficient chips to refining full-stack server systems. Read more.

Google’s Latest AI Experiment Turns Chess into a Creative Playground

Google's experimental arm, Google Labs, has launched GenChess, a web-based game integrating AI-driven image generation through Gemini Imagen 3. Players can customize their chess pieces by entering text prompts and choosing either a traditional or an abstract design.

Once the set is generated, users can fine-tune individual pieces to their preference. After crafting their ideal set, players can compete against a bot across three difficulty levels. This project highlights the synergy between AI, design, and gaming.

Additionally, Google’s collaboration with FIDE introduces coding challenges for AI chess engines, and the upcoming Chess Gem feature will allow users to play against a Gemini language model, though access will be limited to Gemini Advanced subscribers. Read more.

Chinese researchers unveil LLaVA-o1 to challenge OpenAI's o1 model

LLaVA-o1, developed by Chinese researchers, brings a structured approach to multimodal reasoning in vision-language models (VLMs), inspired by OpenAI's o1 model. It reasons in four stages: Summary, Caption, Reasoning, and Conclusion, handling each stage independently.

Keeping the stages separate helps the model maintain a logical flow and sidestep reasoning errors common in earlier VLMs. LLaVA-o1 also debuts a "stage-level beam search," an inference-time scaling technique that generates multiple candidate outputs at each stage and selects the best one before moving on to the next stage.
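For readers who want a feel for the idea, here is a minimal sketch of stage-level selection in Python. It is an illustration under stated assumptions, not the researchers' implementation: the `generate` and `score` callables, the `n_candidates` parameter, and the toy stand-ins are all hypothetical, and a real system would call a VLM to produce and judge candidates.

```python
from typing import Callable, List

# The four stages named in the story, in the order they are produced.
STAGES = ["summary", "caption", "reasoning", "conclusion"]


def stage_level_search(
    question: str,
    generate: Callable[[str, str, int], List[str]],  # hypothetical candidate sampler
    score: Callable[[str, str], float],              # hypothetical candidate judge
    n_candidates: int = 4,
) -> List[str]:
    """Return the chosen output for each stage, in stage order."""
    context = question
    chosen: List[str] = []
    for stage in STAGES:
        # Sample several candidates for this stage only.
        candidates = generate(context, stage, n_candidates)
        # Keep the single best candidate, judged against the current context.
        best = max(candidates, key=lambda c: score(context, c))
        chosen.append(best)
        # The winner becomes part of the context for the next stage.
        context = context + "\n" + best
    return chosen


# Toy stand-ins so the sketch runs without a model (assumptions, not real APIs).
def toy_generate(context: str, stage: str, n: int) -> List[str]:
    return [f"<{stage}> draft {i}" for i in range(n)]


def toy_score(context: str, candidate: str) -> float:
    # Pretend that higher-numbered drafts are better.
    return float(candidate.rsplit(" ", 1)[1])


result = stage_level_search(
    "What is in the image?", toy_generate, toy_score, n_candidates=3
)
for line in result:
    print(line)
```

The design point is that selection happens after every stage rather than once at the end, so a weak summary or caption is discarded before it can affect later reasoning.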

Trained on a curated dataset of 100,000 image-question pairs annotated by GPT-4o, the model reportedly outperforms other open-source models and some closed-source ones, with a 6.9% average increase in benchmark scores over its base model. The result sets a new bar for multimodal reasoning and suggests that structured, staged reasoning could reshape how VLMs are built. Read more.

ElevenLabs Launches GenFM to Convert Text into AI-Generated Audio

ElevenLabs has upgraded its ElevenReader app, now integrating GenFM to generate personalized podcasts from a variety of text sources, including PDFs, articles, and ebooks. This feature, available on iOS, employs AI co-hosts in 32 languages to produce dynamic, contextually relevant podcasts.

Using ElevenLabs' AI audio models, GenFM produces detailed summaries, book reviews, and explanations of study material, letting users take in information while multitasking, for example during commutes or workouts.

The app’s enhanced capabilities transform static text into engaging audio, supporting diverse learning and productivity needs. Android support for GenFM is forthcoming, further extending the app's reach. Read more.

Tesla Optimus Gets a New Hand with 22 Degrees of Freedom

Tesla has upgraded its Optimus humanoid robot with a redesigned hand, now featuring 22 degrees of freedom and an additional three in the forearm. The hand is coated with a soft, protective layer that preserves its tactile sensing abilities while enabling it to handle delicate objects with precision. All actuators are now embedded within the forearm, streamlining its design.

Tesla aims to complete the integration of tactile sensors, implement tendon-based fine control, and reduce the forearm's weight by year-end. This enhanced hand design will be standard across all future Optimus robots. Read more.

5 new AI-powered tools from around the web

arXiv is a free online library where researchers share pre-publication papers.

Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.

Interested in reaching smart readers like you? To become an AI Breakfast sponsor, reply to this email or DM us on X!