China unveils LLaVA-o1 to challenge OpenAI's o1 model
Good morning. It’s Friday, November 29th.
Did you know: xAI could soon have its own app?
In today’s email:
Amazon’s Olympus AI
Google’s GenChess
China’s LLaVA-o1
ElevenLabs GenFM
Optimus Robot Update
5 New AI Tools
Latest AI Research Papers
You read. We listen. Let us know what you think by replying to this email.
Today’s trending AI news stories
Amazon Develops Olympus AI, Undercutting Dependence on Anthropic
Amazon is reportedly gearing up to showcase its Olympus AI model at AWS re:Invent. Designed as a multimodal large language model (LLM), Olympus can parse images, video, and text, enabling users to pinpoint, say, a pivotal basketball play through a simple prompt.
With its foray into generative AI, Amazon seems intent on lessening reliance on Anthropic’s Claude, following its substantial backing of the startup. This move signals Amazon’s recalibration in the AI arms race, where it’s often framed as playing catch-up to Google and Microsoft.
A key player in this effort is AWS' Annapurna Labs in Austin, a hub for developing chips such as the Trainium AI accelerator and the Graviton general-purpose CPU. By closely integrating hardware and software teams, the lab accelerates development and prototyping. Its work ranges from creating energy-efficient chips to refining full-stack server systems. Read more.
Google’s Latest AI Experiment Turns Chess into a Creative Playground
Google's experimental arm, Google Labs, has launched GenChess, a web-based game that uses Google's Imagen 3 model for AI-driven image generation. Players customize their chess pieces by entering text prompts and choosing between a traditional and an abstract design.
Our latest Google Labs Experiment is here! #GenChess turns your ideas into playable art pieces using Google’s Imagen 3 model. Create and play today → labs.google/genchess
— labs.google (@labsdotgoogle)
4:31 PM • Nov 26, 2024
Once the set is generated, users can fine-tune individual pieces to their preference. After crafting their ideal set, players can compete against a bot across three difficulty levels. This project highlights the synergy between AI, design, and gaming.
Additionally, Google’s collaboration with FIDE introduces coding challenges for AI chess engines, and the upcoming Chess Gem feature will allow users to play against a Gemini language model, though access will be limited to Gemini Advanced subscribers. Read more.
Chinese researchers unveil LLaVA-o1 to challenge OpenAI's o1 model
LLaVA-o1, developed by Chinese researchers, brings structured, step-by-step reasoning to vision-language models (VLMs), taking inspiration from OpenAI's o1 model. It works through four stages in order: Summary, Caption, Reasoning, and Conclusion.
Handling each stage separately keeps the model's reasoning on track and sidesteps common errors of earlier VLMs. LLaVA-o1 also debuts a "stage-level beam search," an inference-time scaling technique that generates multiple output candidates at each stage and keeps the best one before moving on to the next.
Trained on a curated dataset of 100,000 image-question pairs annotated by GPT-4o, it’s already outperforming both open-source and some closed-source models, showing a 6.9% increase in benchmark scores. The model’s success sets a new bar for multimodal reasoning, signaling a future where structured logic could redefine VLMs. Read more.
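For the curious, here is a minimal Python sketch of the stage-level beam search idea described above: generate several candidates per stage, keep the best, and feed it into the next stage. It is an illustration under stated assumptions, not the researchers' implementation. The names `generate`, `score`, `make_toy_generator`, and `toy_score` are hypothetical stand-ins; a real system would call the VLM to produce candidates and would use its own method for judging them.

```python
import itertools
from typing import Callable, List

# The four reasoning stages named in the story, in order.
STAGES = ["summary", "caption", "reasoning", "conclusion"]


def stage_level_beam_search(
    question: str,
    generate: Callable[[str, str], str],   # hypothetical: (context, stage) -> candidate text
    score: Callable[[str, str], float],    # hypothetical: (stage, candidate) -> quality score
    num_candidates: int = 3,
) -> List[str]:
    """Sample several candidates per stage and keep only the best one."""
    context = question
    chosen: List[str] = []
    for stage in STAGES:
        # Generate multiple candidate outputs for this stage only.
        candidates = [generate(context, stage) for _ in range(num_candidates)]
        # Keep the highest-scoring candidate and discard the rest.
        best = max(candidates, key=lambda c: score(stage, c))
        chosen.append(best)
        # The winner becomes part of the context for the next stage.
        context = context + "\n" + best
    return chosen


def make_toy_generator() -> Callable[[str, str], str]:
    """Toy stand-in for a model: returns numbered drafts (assumption for the demo)."""
    counter = itertools.count(1)

    def generate(context: str, stage: str) -> str:
        return f"<{stage}> draft {next(counter)}"

    return generate


def toy_score(stage: str, candidate: str) -> float:
    """Toy scorer that prefers the highest draft number (assumption for the demo)."""
    return float(candidate.rsplit(" ", 1)[1])


if __name__ == "__main__":
    picks = stage_level_beam_search("What is in the image?", make_toy_generator(), toy_score)
    for pick in picks:
        print(pick)
```

Selecting per stage, rather than scoring only complete answers, means a weak summary or caption is filtered out before it can derail the later reasoning steps.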
ElevenLabs Launches GenFM to Convert Text into AI-Generated Audio
ElevenLabs has upgraded its ElevenReader app, now integrating GenFM to generate personalized podcasts from a variety of text sources, including PDFs, articles, and ebooks. This feature, available on iOS, employs AI co-hosts in 32 languages to produce dynamic, contextually relevant podcasts.
Using ElevenLabs' AI audio models, GenFM generates detailed summaries, book reviews, and explanations of study material, letting users take in information while multitasking, for example during commutes or workouts.
The app’s enhanced capabilities transform static text into engaging audio, supporting diverse learning and productivity needs. Android support for GenFM is forthcoming, further extending the app's reach. Read more.
Tesla Optimus Gets a New Hand with 22 Degrees of Freedom
Tesla has upgraded its Optimus humanoid robot with a redesigned hand, now featuring 22 degrees of freedom and an additional three in the forearm. The hand is coated with a soft, protective layer that preserves its tactile sensing abilities while enabling it to handle delicate objects with precision. All actuators are now embedded within the forearm, streamlining its design.
Our new hand is much closer to a human hand capability. It's even faster and has much more degrees of freedom which will allow us to do many more tasks such as catching a ball which was almost impossible with the previous hand. The form factor is great too, so many motors that… x.com/i/web/status/1…
— Julian Ibarz (@julianibarz)
3:52 PM • Nov 28, 2024
Tesla aims to complete the integration of tactile sensors, implement tendon-based fine control, and reduce the forearm's weight by year-end. This enhanced hand design will be standard across all future Optimus robots. Read more.
Record-breaking diamond storage can save data for millions of years
Microsoft's TinyTroupe library creates AI-powered virtual focus groups for product testing
Google DeepMind CAT4D Advances 4D Scene Creation with Multi-View Diffusion
Large language models surpass human experts in predicting neuroscience results
Microsoft AI Launches LazyGraphRAG: Graph-Enabled RAG Without Pre-Summarization
Meta Launches Sparsh, First General-Purpose Encoder for Vision-Based Tactile Sensing
Andrew Ng’s Team Tackles Gen AI Fragmentation with aisuite Python Library
When asked to build web pages, LLMs found to include manipulative design practices
AI can analyze a decomposing body to help pinpoint the time of death
5 new AI-powered tools from around the web
arXiv is a free online library where researchers share pre-publication papers.
Thank you for reading today’s edition.
Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.
Interested in reaching smart readers like you? To become an AI Breakfast sponsor, reply to this email or DM us on X!