OpenAI's Secret Upgrade

Good morning. It’s Wednesday, August 14th.

Did you know: On this day in 1995, Nintendo released the Virtual Boy in North America.

In today’s email:

  • GPT-4o Secret Upgrade

  • Grok-2

  • AI Agent “Q”

  • Gemini Live

  • 5 New AI Tools

  • Latest AI Research Papers

You read. We listen. Let us know what you think by replying to this email.

Today’s trending AI news stories

OpenAI Confirms ChatGPT Really Did Get a Secret Upgrade

OpenAI has introduced an updated version of GPT-4o for its ChatGPT model, enhancing its performance significantly. Announced on 𝕏, this new version provides faster response times and improved interaction quality. While OpenAI has not detailed the exact changes, it is distinct from the GPT-4o-2024-08-06 API update, which was designed to reduce costs for developers.

There is speculation that this update could involve a larger or differently configured model, potentially linked to Project Strawberry. Users can access the new model via the ChatGPT app, but it is unclear if GPT-4o mini has been updated as well. To fully utilize the new features, a ChatGPT Plus subscription is required. Read more.

xAI Releases Grok-2, Adds Image Generation on X

xAI has launched Grok-2 and Grok-2 mini in beta, advancing from the previous Grok-1.5 model with improved reasoning features. These models, available only to Premium and Premium+ users on 𝕏, now include image generation capabilities. Grok-2 is designed to enhance chat, coding, and reasoning, while Grok-2 mini offers similar benefits in a more compact form.

Early feedback indicates that Grok-2's image generation, powered by Black Forest Labs' FLUX.1, lacks restrictions on political figures, raising concerns about potential misuse. Although specific details about Grok-2’s capabilities are sparse, it is reportedly better at code generation and writing. xAI plans to offer these models through an enterprise API and integrate them into 𝕏’s features, such as advanced search and AI-driven replies.

xAI’s Grok 2 (sus-column-r) ranks #3 on the LMYSYS.org leaderboard with top marks in Coding, Hard Prompts, and Math. Read more.

MultiOn Announces New AI Agent called ‘Agent Q’

Agent Q marks a major advancement in autonomous web agents by addressing the limitations of current LLMs in handling dynamic, multi-step tasks. Traditional methods, which rely on static datasets and supervised fine-tuning, often struggle with complex decision-making due to limited exploration and compounding errors.

Agent Q improves on this by using guided Monte Carlo Tree Search (MCTS), AI self-critique, and reinforcement learning, including the Direct Preference Optimization (DPO) algorithm. This approach enables the model to explore various actions and web pages, learn from both successful and unsuccessful attempts, and refine its decision-making through iterative feedback.

In tests, such as booking experiments on Open Table, Agent Q significantly boosted the LLaMa-3 model's success rate from 18.6% to 95.4%, demonstrating its effectiveness in enhancing autonomous web agents' performance. Read more.

Gemini Live, Google's answer to ChatGPT's Advanced Voice Mode, Launches

Google's Gemini Live, announced at the Made by Google 2024 event, is designed to rival OpenAI’s ChatGPT Advanced Voice Mode. This new feature enables users to have detailed voice interactions with Gemini on their smartphones. It incorporates an advanced speech engine for more natural and responsive conversations, allowing users to interrupt and adjust interactions in real time. Gemini Live offers ten new, natural-sounding voices and supports hands-free operation, including background usage and the ability to pause and resume conversations.

Gemini Live benefits from the large context window of the Gemini 1.5 Pro and Gemini 1.5 Flash models, which enables it to process and retain extensive conversational data. Although multimodal input capabilities are not yet available, they are expected later this year, along with additional language support and iOS compatibility. Gemini Live is available through the $20 per month Google One AI Premium Plan and features new integrations with Google services. Read more.

Etcetera: Stories you may have missed

5 new AI-powered tools from around the web

Profundo automates multi-source data aggregation, performs advanced analytics, and facilitates precise essay composition with integrated academic citations and AI integration.

Conva.AI is the first platform offering AI Assistant as a Service. It streamlines creation and integration of AI Assistants for apps, leveraging platform-specific SDKs.

Postgres.new provides an in-browser Postgres sandbox with AI assistance, enabling instant database creation, data manipulation, and advanced querying capabilities.

Jupitrr AI auto-generates B-roll visuals, leveraging AI to produce stock footage, Google Images, and GIFs, with transcript-based content mapping and customizable animated subtitles.

Tusk is an AI-powered tool for automating UI changes, integrating with Jira and GitHub to streamline bug fixes, copy edits, and code reviews.

Augmeta is an AI-driven platform that enhances product and engineering management by offering adaptable AI peers for decision-making, task automation, and expertise augmentation.

arXiv is a free online library where researchers share pre-publication papers.

📄 Imagen 3

Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.

Interested in reaching smart readers like you? To become an AI Breakfast sponsor, reply to this email!