Google's AI Video Gen Destroys 'Sora'
Good morning. It’s Wednesday, December 18th.
Did you know: Most people still haven’t heard of ElevenLabs Conversational AI Agents?
In today’s email:
Google’s Veo 2 Video Generator
Day 8 & 9 of OpenAI’s Releases
“Whisk” From Google
NVIDIA’s $249 Supercomputer
3 New AI Tools
Latest AI Research Papers
You read. We listen. Let us know what you think by replying to this email.
Today’s trending AI news stories
Google's Veo 2 edges out OpenAI's Sora Turbo in AI video generation tests
Veo 2 can produce videos at up to 4K resolution and responds to detailed filmmaking prompts, including lens specifications and camera effects, while reducing common AI flaws such as visual “hallucinations” and unrealistic physics.
Today, we’re announcing Veo 2: our state-of-the-art video generation model which produces realistic, high-quality clips from text or image prompts. 🎥
We’re also releasing an improved version of our text-to-image model, Imagen 3 - available to use in ImageFX through… x.com/i/web/status/1…
— Google DeepMind (@GoogleDeepMind)
5:03 PM • Dec 16, 2024
In tests on Meta’s MovieGenBench dataset, human evaluators compared eight-second clips at 720p and rated Veo 2 above its competitors for both overall quality and prompt adherence. However, Google concedes the model still struggles with intricate motion dynamics and complex scene composition.
Veo 2 is initially available only in VideoFX, YouTube, and Vertex AI, with expansion to YouTube Shorts planned for 2025. Its output carries SynthID watermarks that mark the content as AI-generated.
Meanwhile, Imagen 3 pushes the envelope for AI image generation, serving up vibrant colour balance, precise texturing, and stylistic versatility. Imagen 3 and Veo 2 will be available via ImageFX and VideoFX, with API and Google AI Studio access rolling out early next year. Read more.
Day 8 and 9 of OpenAI’s 12-Day Rollout Present Free ChatGPT Search and Premium o1 Model for Select Devs
Day 8 of OpenAI’s 12-Day rollout lifts the paywall for ChatGPT’s search, unlocking real-time, web-sourced results for all registered users. The update prioritizes speed and reliability, especially on mobile, and adds integrated maps, voice search, and the option to set ChatGPT as the default search engine. Results now combine text, visuals, videos, and interactive maps, with demos showcasing practical uses like finding local events, planning trips, and choosing restaurants.
Day 9 brings the full o1 reasoning model to the API, with access limited to Tier 5 developers: those with at least one month of account history and a $1,000 monthly spend. Pricing is $15 per 750k words analyzed and $60 per 750k words generated. Key features include a “reasoning_effort” parameter for controlling how much processing the model applies to a request, function calling, image analysis, and 60% fewer reasoning tokens than o1-preview on average, which reduces latency.
The Realtime API now supports WebRTC for low-latency voice AI with noise suppression and dynamic congestion control. Developers also gain “direct preference optimization,” a fine-tuning method that learns from ranked pairs of outputs (a preferred and a non-preferred response) rather than from fixed input/output pairs.
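To put the o1 prices quoted above in perspective, here is a rough back-of-the-envelope sketch in Python. The function name and the example word counts are our own, purely for illustration. It uses only the per-750k-word figures from this story and counts only the words you pass in, so treat the result as an estimate rather than a bill.

```python
# Prices as quoted in this story, in US dollars per 750,000 words.
WORDS_PER_PRICE_UNIT = 750_000
ANALYZED_PRICE_USD = 15.0   # words the model reads
GENERATED_PRICE_USD = 60.0  # words the model writes


def estimate_o1_cost(words_analyzed: int, words_generated: int) -> float:
    """Return an approximate cost in USD for a given volume of words.

    Hypothetical helper for illustration; not part of any OpenAI SDK.
    """
    analyzed_cost = words_analyzed / WORDS_PER_PRICE_UNIT * ANALYZED_PRICE_USD
    generated_cost = words_generated / WORDS_PER_PRICE_UNIT * GENERATED_PRICE_USD
    return analyzed_cost + generated_cost


if __name__ == "__main__":
    # Example: a 7,500-word document summarized in 1,500 words.
    print(f"${estimate_o1_cost(7_500, 1_500):.2f}")
```

At these rates, generated words cost four times as much as analyzed words, so long outputs dominate the total.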
Google’s new AI Tool Uses Images As Prompts
Google Labs has dropped Whisk, a generative AI tool that reimagines image creation by focusing on visual inputs instead of the usual text prompts. Using Google’s Gemini 2.0 Flash model, Whisk generates detailed descriptions of your images, which are then fed into Imagen 3 to capture the subject’s essence without exact replication.
Meet Whisk! 🎉 Our new experiment that lets you use images as prompts to visualize your ideas and tell your story. Try it now: labs.google/whisk
— labs.google (@labsdotgoogle)
5:26 PM • Dec 16, 2024
The tool is designed for rapid creative exploration, letting users experiment with subjects, scenes, and styles—ideal for brainstorming sessions rather than pixel-perfect results. Early testers, especially in creative fields, find it a refreshing change from standard editing tools.
Users outside the US are locked out for now, and delays in image generation have been reported, likely due to the influx of users. Currently available in Google Labs, Whisk could be the precursor to something bigger. Read more.
Nvidia launches most affordable generative AI supercomputer at $249
Nvidia has launched the Jetson Orin Nano Super Developer Kit, a compact AI computer now available for $249, reduced from $499. This iteration delivers a 1.7x (70%) performance increase over its predecessor, reaching 67 INT8 TOPS, and raises memory bandwidth to 102 GB/s.
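The 1.7x multiplier and the 70% gain describe the same improvement. A small Python sketch makes the arithmetic explicit. The helper names are ours, and the baseline it prints is simply what the quoted figures imply, not an official Nvidia specification.

```python
def speedup_to_percent_gain(speedup: float) -> float:
    """Convert a speedup multiplier (e.g. 1.7x) to a percentage gain."""
    return (speedup - 1.0) * 100.0


def implied_baseline(new_value: float, speedup: float) -> float:
    """Back out the earlier figure implied by a new value and a speedup."""
    return new_value / speedup


if __name__ == "__main__":
    gain = speedup_to_percent_gain(1.7)
    baseline_tops = implied_baseline(67.0, 1.7)
    print(f"1.7x speedup = {gain:.0f}% gain")
    print(f"Implied predecessor throughput: about {baseline_tops:.1f} INT8 TOPS")
```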
With Nvidia's Ampere GPU and Arm CPU, the system supports concurrent AI pipelines for demanding applications, such as generative AI, robotics, and computer vision. The kit integrates with Nvidia’s comprehensive AI software suite, offering tools for vision AI and edge computing. Existing Jetson Orin Nano users will receive software updates to unlock these advancements without the need for new hardware. Read more.
Google DeepMind launches new AI fact-checking benchmark with Gemini in the lead
Midjourney Adds ‘Moodboards’ and Support for Multiple Custom AI Image Models
Meta Upgrades its Smart Glasses with AI-powered Real-Time Video Capabilities
New AI Chip Export Rules Could Give Microsoft and Google Gatekeeping Power
Nearly one in four US workers use generative AI on a weekly basis, often without clear rules
YouTube will now let creators opt in to third-party AI training
Salesforce to launch Agentforce 2.0 to change digital labor for enterprise
Meta's new Apollo models aim to crack the video understanding problem
Figure 02: Coffee-making humanoid robot now lands a paid job to brew success at factories
Boltz-1 AI: MIT's open-source model speeds up drug discovery, biomedical research
Google upgrades its programming agent Code Assist with Gemini 2.0, adds source integrations
Writer’s new AI model aims to fix the 'sameness problem' in generative content
Watch: OpenAI DevDay 2024 | Virtual AMA with Sam Altman, moderated by Harry Stebbings
5 new AI-powered tools from around the web
arXiv is a free online library where researchers share pre-publication papers.
Thank you for reading today’s edition.
Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.
Interested in reaching smart readers like you? To become an AI Breakfast sponsor, reply to this email or DM us on X!