AI Breakfast
Google's AI Announcements You May Have Missed

AI Breakfast
May 15, 2024

Good morning. It’s Wednesday, May 15th.

Did you know: On this day in 2001, Apple announced plans to open the first 25 Apple Retail Stores by the end of the year. The Apple Store ended up being the highest grossing retail chain store per square foot in history.

In today’s email:

Google’s AI Updates
OpenAI’s GPT-4o
10 New AI Tools
Latest AI Research Papers

You read. We listen. Let us know what you think by replying to this email.

In partnership with

Hire a world class AI team

Engineers who understand AI are expensive and difficult to find, and it can be hard to figure out who to trust. On top of that, 85% of all AI projects fail.

But AE Studio succeeds.

We listen to your business challenge and help you craft and implement the optimal AI solution with our team of world class AI experts from Harvard, Stanford and Princeton.

Our development, design, and data science teams work closely with founders and executives to create custom software and AI solutions that get the job done. The secret to our success is treating your project as if it were our own startup.

Tell us all about your big AI Idea

Today’s trending AI news stories

Google I/O 2024 Doubles Down on AI

Google's I/O 2024 went all-in on AI, showcasing chatbots with a new spark, search with a sharper mind, and a hefty dose of machine smarts for Android.

Here are 20 announcements from I/O 2024:

Gemini 1.5 Flash: Real-Time Chat Champ: Google introduced Gemini 1.5 Flash, a new AI model suited to real-time chat. Lighter and faster than Pro but more capable than the on-device Nano, Flash handles text, images, video, and speech. It competes with OpenAI's GPT-4o and is available globally for developers to integrate into chat apps and other interactive experiences. This positions Flash as a cost-effective option for developers seeking speed and versatility, potentially fueling a wave of AI-powered chat innovations. Read more.

LearnLM: AI Enters the Classroom: Google introduces LearnLM, an AI model built on Gemini's foundation. Integrated across Search, Android, and YouTube, LearnLM empowers students with diverse learning materials, interactive experiences, and tailored assistance for educational queries. Focused on educational tasks, LearnLM ensures its responses stay relevant. Read more.

TalkBack Gets a Multimodal Makeover: Android's TalkBack now integrates Gemini Nano AI, offering a delightful upgrade for visually impaired users. This iteration expands TalkBack's capabilities beyond text, recognizing images, sounds, and spoken language. TalkBack now uses on-device AI to describe unlabeled images offline, enhancing user experience without an internet connection. Read more.

AR Goggles Resurface in Project Astra: A prototype goggle in Project Astra hints at Google's renewed interest in AR. These chunky spectacles, sporting a possible display and hands-free interaction, suggest an evolution from Project Iris. The curved nose bridge piques curiosity about further refinements. With Meta's Ray-Bans in the market, this revival could ignite a race for wearable AR dominance, leveraging Google's existing software ecosystem. Read more.

AI Jams on Demand: Google's keynote got a jolt with musician Marc Rebillet. He showcased MusicFX's new DJ mode, an AI tool that turns text prompts into music. Simple phrases produced diverse tracks on the fly, while a mixer interface let Rebillet adjust them live. This fusion of vocals and AI improvisation became a lively (and memorable) highlight for developers at I/O. Read more.

AI Studio Opens for Experimentation: Google AI Studio bolsters its offerings with two key features: video frame extraction and context caching. Frame extraction facilitates the isolation of individual video frames, enhancing AI model training on scene understanding. Context caching optimizes resource utilization by storing frequently used context for tasks like content analysis, streamlining workflows and reducing costs. Accessible via Vertex AI Studio, users can experiment with Google's models to see if they fit their projects. Read more.

SynthID Gets Tougher on Fakes: Google unveiled an upgrade to SynthID alongside new AI models. This expansion tackles AI-generated text and video forgeries, reflecting DeepMind's focus on combating misuse. SynthID's sharper tools aim to fight misinformation and unwanted content, aligning with calls for stronger AI safeguards. Read more.

Playful AI Buddies with Gemini Gems: Inspired by Character.AI, Google introduces Gemini Gems. Users can design chatbots like virtual trainers or chefs, defining their skills and personalities. Similar to OpenAI's GPT Store, Gems simplifies creating custom bots. Users simply dictate the bot's specialty and temperament, and Gemini conjures the perfect digital sidekick. Read more.

Android's Gemini Gets Smarter: The Android version of Gemini gets a power-up for multimedia sleuthing. Now it analyzes videos and PDFs (advanced tier only). It identifies videos, prompts questions based on captions, and offers PDF insights. Plus, a drag-and-drop feature lets you integrate Gemini's images into other apps, boosting productivity. Read more.

Chrome Gets a Brainy Boost: Gemini Nano, Google's lightweight AI, lands in Chrome 126. It powers on-device features like text generation. Optimized for speed, the model lets you draft social media posts or reviews directly within the browser. Plus, Gemini in DevTools offers developers error explanations and code-fixing suggestions. Read more.

Android Warns of Scam Calls: A new feature in development uses Gemini Nano to identify suspicious phone calls. It flags warning signs like requests for personal info or urgent financial transactions. Details such as the release date and wider device compatibility are still scarce, but the initial rollout might be for Pixel 8 Pro and Samsung S24 users. Read more.

AI Directs Your Vision: Meet Veo. Google unveiled Veo at I/O, an AI "auteur" that creates high-quality 1080p videos from text, image, or video prompts. It understands cinematic language and promises polished outputs. Filmmakers get early access, with a wider creator preview planned, including YouTube Shorts integration. This echoes OpenAI's Sora, potentially amping up AI's role in filmmaking. Read more.

Project Astra: Google's AI Eyes the Future. Google throws down the gauntlet to OpenAI's GPT-4o with Project Astra. This research assistant, led by DeepMind's Demis Hassabis, uses video to understand the world. Think explaining code, identifying objects, and witty banter – all hands-free. Cameras and microphones feed Astra's real-time processing, paving the way for integration into future wearables like smart glasses. While no release date is set, potential inclusion in existing tools like Gemini (boasting a 2M-token context window) hints at Google's progress in AI assistants. Read more.

Google's search gets a touch of "AI magic" with "Overviews." Led by Search chief Liz Reid, this feature leverages AI for summaries, video search via Lens, and trip planning tools. While Gemini AI tackles query understanding, answer generation, and result organization, Reid assures it's not all smoke and mirrors. AI aims to enhance complex searches, like local explorations, with an emphasis on informative summaries and data quality. Read more.

Gemini 1.5 Pro Boosts Workspace: Google's supercharged Gemini integrates with Docs, Drive, Gmail, and more. Championed by Aparna Pappu (Workspace VP), it unlocks data across apps. Need to find info in your emails or organize receipts? Gemini tackles it. This AI assistant aims to banish manual tasks and boost efficiency. Features like "Help me write" hint at its productivity potential. Early access is limited, but a wider rollout is planned to combat data retrieval and organization woes. Read more.

Talk It Out with Gemini Live (Limited Access): Google offers voice chat with Gemini Live, a premium feature that adapts to your speech patterns. Inspired by OpenAI's ChatGPT, it fosters natural conversation with Gemini. Real-time video interpretation unlocks hands-free tasks: update your calendar with an image or access travel info via voice. This multimodal experience aligns with Project Astra and echoes OpenAI's GPT-4o focus. Read more.

Lens Goes Live: Search with Video. Google Lens ditches static images. Now you can leverage video and audio for searches. Car trouble? No need for cryptic descriptions, just film it. This aligns with Google's AI push, simplifying complex queries with video search. Identify a mystery plant or decipher product details with a quick clip. Lens embraces "show, don't tell" for a more intuitive search experience. Read more.

Ask Photos: Your AI-Powered Memory Detective. Google I/O unveiled "Ask Photos," a smarter Photos powered by Gemini AI. Pichai showcased its prowess, finding a license plate and summarizing a child's swimming journey through years of photos. With 6 billion daily uploads, Photos' evolution continues, aiming to be your AI memory detective. Read more.

Firebase Unveils Open-Source AI Toolkit: Democratizing AI for developers, Google debuts Genkit, an open-source framework (JS/TS, Go coming soon) for building AI-powered apps. Seamlessly integrate text, image, and code-related AI features from Google and open-source projects (Ollama, Pinecone). Firebase's serverless tools (Cloud Functions, App Hosting) streamline development and deployment. Project IDX's UI integrates with Genkit for a unified experience. Read more.

Google's Trillium Chip Boosts AI Performance: Google's next-gen AI chip, Trillium, boasts a 4.7x increase in peak compute performance per chip over its predecessor. This 6th-generation TPU targets Nvidia's dominance in data centers and offers 67% higher energy efficiency. Optimized memory tackles performance bottlenecks, enabling scalability for demanding workloads. Read more.

OpenAI unleashes GPT-4o, a new flagship model with real-time multimodal capabilities

OpenAI has introduced GPT-4o, a new flagship multimodal model aimed at more natural human-computer interaction. The "o" stands for "omni" (from the Latin "omnis," meaning all): the model handles text, audio, and visual inputs. GPT-4o boasts human-like reaction times, responding to spoken prompts in an average of 320 milliseconds.

GPT-4o is rolling out to all ChatGPT users, including the free tier, and the launch includes a ChatGPT desktop app for macOS, with Windows soon to follow. The model enables real-time voice conversations with emotive responses and visual interpretation, such as working through math problems. Its multilingual support spans 50 languages, offering real-time translation.

Developers accessing GPT-4o via API benefit from faster, cheaper performance compared to GPT-4 Turbo. Notably, it excels in challenging tasks like coding, achieving +100 Elo over previous models, as demonstrated in tests on the LMSYS Chatbot Arena.

The launch of GPT-4o includes significant upgrades, such as character-consistent image generation, 3D object synthesis, and a 50% reduction in API costs. The update provides limited access to GPT-4o for free users, along with tools for data analysis, file uploads, and web browsing.

In a recent tweet, Tibor Blaho also drops hints on Sidekick, the ChatGPT desktop app. Currently, access is limited to a select group, with broader availability anticipated later this year. Initially exclusive to macOS, plans for expansion to other platforms are in the works.

The Alpha version has a debug mode similar to its web counterpart, along with Advanced Data Analysis v2 and search features. But the voice mode showcased in the recent demo does not yet appear to be fully working. Read more.

Related videos: Be My Eyes Accessibility with GPT-4o

GPT-4o - Full Breakdown + Bonus Details

OpenAI loses its biggest names in AI safety as Ilya Sutskever and Jan Leike walk away

Ilya Sutskever, co-founder of OpenAI and co-author of the groundbreaking AlexNet paper, resigns after nearly a decade, citing a personal project. Jakub Pachocki, a veteran of 7 years, succeeds him as Chief Scientist, praised by CEO Sam Altman as exceptionally brilliant. Altman expresses confidence in Pachocki's leadership, highlighting his pivotal role in major projects. Sutskever's departure follows his involvement in the temporary removal of Altman as CEO in November 2023 over concerns of rapid commercialization and security risks, which a subsequent investigation deemed unjustified. Sutskever aided Altman's reinstatement before stepping down from the board. Read more.

Etcetera: Stories you may have missed

Microsoft Earmarks $4.3 Billion for AI/Cloud Projects in France

Anthropic's Claude 3 now available in the EU and as an iOS app

US and China to hold AI safety talks to prevent "miscalculation and unintended conflict"

Aurora ranks as world’s fastest AI supercomputer in latest TOP500 benchmark

Big Tech sees neurotechnology as its next AI frontier

UAE's Technology Innovation Institute Launches AI Model With Vision-to-Language Capabilities

NASA appoints 1st AI chief to keep agency on 'the cutting edge'

TikTok is testing AI-generated search results

10 new AI-powered tools from around the web

Voicenotes is an AI-powered app using GPT-4-Turbo for accurate transcription, offering advanced querying, Amazon S3 storage, and integrations with Zapier and Readwise.

BoodleBox is a secure AI collaboration platform featuring top AI tools (ChatGPT, Claude, Gemini, Llama, DALL-E, and 1,000+ custom GPTs) for enhanced team productivity.

Stunning is an AI-powered platform enabling marketing agencies to quickly build websites, generate social media content and blogs, capture leads, and perform bulk SEO tasks.

comsensei.com is an AI-driven tool that generates creative, relevant, and available domain names, optimizing user input to enhance brand identity and digital marketing.

Elementor AI integrates AI with WordPress to generate custom images, high-quality text, custom code, and dynamic layouts, optimizing web development efficiency and design precision.

Wegic bills itself as the first AI web designer and developer, powered by GPT-4o. It simplifies website creation through multilingual conversations, offering free credits and diverse features.

fynk is an AI-driven contract management platform that streamlines contract workflows from creation to signature, consolidating contracts and enhancing collaboration.

Lune AI offers personalized AI-generated language lessons, quizzes, and flashcards tailored to individual learning levels, enhancing language acquisition through customized content and gamified experiences.

Canny uses AI to automate feedback management by detecting, deduplicating, replying to, and summarizing feedback from various communication channels.

Ploogins is an AI-powered WordPress plugin search engine that simplifies finding plugins by providing curated suggestions from both official and commercial sources.