OpenAI Hints at Latest Model

Sponsored by

Good morning. It’s Monday, March 11th.

Did you know: 148 years ago yesterday, the world’s first telephone call was made?

In today’s email:

  • OpenAI reinstates CEO after investigation, adds new board members

  • Claude 3 Opus outperforms other models in study

  • Midjourney's fast "Turbo mode" sparks artist backlash

  • Huawei's efficient PixArt-Σ beats open-source image generation

  • Microsoft's NaturalSpeech 3 AI clones emotion-evoking voices

  • Pika Labs adds sound effects to AI video suite

  • Nvidia CEO: Our pricey GPUs still beat free rival chips for AI

  • RoseTTAFold AI enables complex biomolecule design

  • US Air Force to deploy 1,000 AI drones

  • China researching AI with "prior knowledge"

  • EU Occiglot project releases multilingual AI models

  • 5 New AI Tools

  • Latest AI Research Papers

You read. We listen. Over 50,000 subscribers trust AI Breakfast for concise, insightful analysis of the latest artificial intelligence news and releases. Our mission is to keep you up-to-date with straightforward, no-nonsense coverage of the AI industry. We value your input! Reply to this email with your feedback, and help us continue to refine and enhance your experience.

Artificial Intelligence online short course from MIT

Study artificial intelligence and gain the knowledge to support its integration into your organization. If you're looking to gain a competitive edge in today's business world, then this artificial intelligence online course may be the perfect option for you.

  • Key AI management and leadership insights to support informed, strategic decision making.

  • A practical grounding in AI and its business applications, helping you to transform your organization into a future-forward business.

  • A road map for the strategic implementation of AI technologies in a business context.

Today’s trending AI news stories

Altman Returns to OpenAI Board, Hints at Next Model

> OpenAI has reinstated CEO Sam Altman to its board following an independent investigation into his conduct. The investigation found that while Altman's actions "did not mandate removal", a breakdown in trust with the previous board led to his dismissal. Three new members - Sue Desmond-Hellmann, Nicole Seligman, and Fidji Simo - join Altman on the expanded board. Altman expressed regret over past misunderstandings but declined further comment. OpenAI plans to strengthen its conflict-of-interest policies and establish a whistleblower hotline.

In a separate development, OpenAI's video-generating AI tool, Sora, will remain in the research phase for the foreseeable future, focusing on user feedback and audio integration before a public release. Meanwhile, the company has streamlined DALL-E 3's user interface with predefined styles and aspect ratios for easier image generation. While competitors like Midjourney may still hold an edge in some areas, OpenAI is making progress. The company's implementation of the C2PA standard underscores its commitment to safe and responsible AI image generation.

Amid these changes, Altman teased a significant new AI model on social media, stating, "It will be worth the wait." This cryptic message, combined with recent organizational shifts, suggests OpenAI is preparing to unveil a major breakthrough that could redefine the field of artificial intelligence. Read more.

AI Model Innovations and Benchmarking  

> Claude 3 Opus blows other models out of the water as an AI research agent: A study led by Qian Huang suggests a potential breakthrough in using large language models as research agents. Huang's team found that Claude 3 Opus significantly outperformed GPT-4-turbo and other models at building effective machine learning algorithms across various tasks, achieving a 35.6% success rate on the MLAgentBench benchmark. That success rate varies widely, however: the model reaches nearly 90% on established tasks but drops to 10% or lower on recent Kaggle challenges and new research problems like BabyLM. GPT-4 and Claude also show distinct error patterns and efficiency trade-offs. While Huang's findings show promise, LLM-based research agents need further development before they can be relied on. Read more.

Additional note: Readers interested in exploring Claude 3 further can browse Anthropic's prompt library for optimized responses.

> Midjourney's "Turbo mode" speeds image creation, sparks artist complaints: Midjourney's "Turbo mode" dramatically speeds up image creation, but its controversial artist name suggestion feature has angered artists who fear their work may be imitated, leading to accusations of plagiarism and devaluation. The feature raises crucial questions about copyright, attribution, and the authenticity of AI-generated art, and the debate underscores the urgent need for clearer legal guidelines on AI's impact on creative industries while landmark rulings are still pending. Read more.

> Huawei’s PixArt-Σ outperforms open-source models in efficient image generation: Huawei's new PixArt-Σ image generation model surpasses open-source competitors in quality and efficiency, despite using fewer parameters. PixArt-Σ's success stems from its "weak-to-strong" training approach, advanced token compression, and a powerful variational autoencoder (VAE). These innovations optimize training and reduce computational demands. Careful data selection and refined image descriptions further enhance its performance. PixArt-Σ highlights significant advancements in AI-generated content, offering superior results with broader accessibility. Read more.

> Microsoft's new AI can clone voices, evoking emotions: Developed by Microsoft Research Asia and partners, NaturalSpeech 3 employs a neural codec to dissect speech into sub-units, enhancing control over content, prosody, and acoustic details. Outperforming predecessors, NaturalSpeech 3 excels in speech quality, similarity, and prosody, rivaling real speech recordings. Users can manipulate speech attributes to evoke desired emotions, setting a new standard for synthetic speech quality. Despite not being publicly released due to security concerns, Microsoft emphasizes responsible usage. Read more.

> Pika Labs adds sound effects to AI video suite: Pika Labs, a leading AI video generation platform, has unveiled a new Sound Effects feature. This addition enables users to easily add custom or AI-generated sound effects and music to their videos, simplifying the post-production process. The feature covers a wide range of sounds, from bustling cityscapes to peaceful natural scenes. This comprehensive sound integration positions Pika Labs as a compelling option in the competitive AI video generation space. Read more.

> Nvidia CEO: Our GPUs are worth the price, even against free competitors: Jensen Huang boldly claims that Nvidia GPUs are so superior for data center AI that even “free competitor chips” wouldn't offer a more cost-effective solution. He dismisses GPU pricing as a major factor in total cost of ownership (TCO), emphasizing the importance of programmability and standardization across cloud platforms. Huang asserts that Nvidia GPUs deliver unmatched benefits in deployment speed, performance, utilization, and flexibility, regardless of competitor pricing. His strong statements are likely to spark controversy and challenge rivals like AMD and Intel as Nvidia maintains its dominance in the AI hardware market. Read more.

Latest AI Industry Applications

> New AI tool aids design of complex biomolecules: Researchers at Baker Lab have developed RoseTTAFold All-Atom, a groundbreaking AI tool for biomolecular design. Unlike previous tools focused solely on proteins, it can model a wide range of biomolecules that interact with proteins, including small molecules, metal ions such as iron, and DNA/RNA. By learning from sequences and structures, RoseTTAFold All-Atom maps complex molecular structures in extreme detail, aiding in the design of advanced therapies. The tool can even collaborate with generative AI to create proteins that interact with medications or control essential molecules, significantly expanding its potential for drug development. The tool is now available to scientists. Read more.

> US Air Force plans to deploy 1,000 AI drones: Collaborative Combat Aircraft (CCA) will support manned aircraft in combat missions, aiming to enhance tactical capabilities and reduce costs. The AI-driven drones will conduct reconnaissance and engage targets, operating in coordination with piloted fighter jets. Inspired by the success of drones in Ukraine, this initiative signals a major strategic shift with potential to redefine aerial combat. Read more.

Around the Globe

> Chinese researchers aim to make AI smarter by incorporating "prior knowledge": Researchers in China are working to equip AI models with "prior knowledge," such as physics laws or mathematical logic. This "informed machine learning" approach, published in a Cell Press journal, aims to make AI more accurate and applicable in real-world scenarios. Unlike traditional deep learning methods that rely solely on data, this framework integrates data with fundamental principles. Scientists from Peking University and the Eastern Institute of Technology developed a method to assess and choose relevant rules when training models. Their goal is for AI to independently discover knowledge from data, leading to "real AI scientists." This new approach holds promise for scientific discovery, engineering, and the development of more robust and efficient AI systems overall. Read more.
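To make the idea concrete, here is a minimal sketch (ours, not from the paper) of how prior knowledge can enter training as an extra loss term: we fit a line to noisy data while penalizing violations of a known rule, here that the true law passes through the origin. All names, values, and the penalty weight are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=100)
y = 2.0 * x + 0.05 * rng.normal(size=100)  # true law: y = 2x, which passes through 0

def fit(lam):
    """Fit y ~ w*x + b by gradient descent; lam penalizes a nonzero bias,
    encoding the prior knowledge that f(0) = 0."""
    w, b = 0.0, 1.0                      # start with a deliberately wrong bias
    for _ in range(3000):
        pred = w * x + b
        grad_w = 2.0 * np.mean((pred - y) * x)
        grad_b = 2.0 * np.mean(pred - y) + 2.0 * lam * b  # extra "knowledge" term
        w -= 0.05 * grad_w
        b -= 0.05 * grad_b
    return w, b

w_plain, b_plain = fit(lam=0.0)   # data only
w_prior, b_prior = fit(lam=5.0)   # data + prior knowledge
print(f"bias without prior: {b_plain:.4f}, with prior: {b_prior:.4f}")
```

The penalty pulls the learned bias toward zero, so the model respects the known rule even when noisy data alone would not enforce it.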

> European project unveils AI models to promote language diversity: Occiglot, an academic research collective, has released ten intermediary AI models for major European languages, aiming to preserve linguistic diversity and digital sovereignty in Europe. Based on the Mistral 7B model and optimized through bilingual pre-training and instruction tuning, these open-source models are available on Hugging Face. Occiglot seeks to expand coverage to all 24 official EU languages while partnering to create datasets and evaluation methods. Backed by prominent research institutions, Occiglot aims to boost European competitiveness in AI while safeguarding its diverse linguistic and cultural heritage. Read more.

5 new AI-powered tools from around the web

IdeaApe is an AI market research tool for validating business ideas, offered as an advanced yet user-friendly SaaS product.

ChatDesigner simplifies image generation and editing with conversational AI. Users effortlessly create stock photos, edit precisely, generate AI portraits, and design logos.

Salt AI streamlines AI workflow creation, sharing, and scaling by offering a platform with free GPU resources, one-click deployment for various applications, and autoscaling infrastructure.

Invoke AI is an AI image generator that swiftly transforms text, image prompts, or sketches into high-quality visuals. Users enjoy full IP control, making it ideal for immersive worlds, brand imagery, architectural renderings, and product designs.

Rounding out the list is an AI-driven learning tool offering personalized revision schedules and adaptive learning techniques. Utilizing spaced repetition algorithms, it enhances information retention.
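Spaced-repetition schedulers of this kind grow the gap between reviews after each successful recall. Here is a minimal sketch in the spirit of the classic SM-2 algorithm; it is illustrative only, not the tool's actual implementation.

```python
def next_interval(prev_interval_days: float, ease: float, quality: int) -> tuple[float, float]:
    """Return (new_interval, new_ease) after a review graded 0-5 (SM-2 style)."""
    if quality < 3:                       # failed recall: restart the schedule
        return 1.0, ease
    # Ease factor rises for easy recalls, falls for hard ones, floored at 1.3
    ease = max(1.3, ease + 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    if prev_interval_days < 1:
        return 1.0, ease                  # first successful review: 1 day
    if prev_interval_days < 6:
        return 6.0, ease                  # second successful review: 6 days
    return round(prev_interval_days * ease, 1), ease

interval, ease = 0.0, 2.5
for grade in (5, 5, 4):                   # three successful reviews in a row
    interval, ease = next_interval(interval, ease, grade)
print(interval)                           # 16.2 -- gaps grow: 1, then 6, then 16.2 days
```

Each success multiplies the next gap by the ease factor, which is why well-learned cards quickly stop appearing every day.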

Latest AI Research Papers

arXiv is a free online library where researchers share pre-publication papers.

Gemini 1.5 Pro offers a breakthrough in multimodal AI capabilities, achieving near-perfect recall on long-context tasks across text, video, and audio. It surpasses previous models in benchmarks, handling contexts of up to 10M tokens. This means Gemini 1.5 Pro can process vast inputs like lengthy documents, hours of video, or days of audio recordings. Its sparse mixture-of-experts architecture enables efficient computation, delivering impressive performance without excessive training demands. This powerful long-context understanding allows for advanced tasks like code analysis, language translation, and complex scene recognition from large text prompts. While expanding its capabilities, Gemini 1.5 Pro maintains high-quality core functions, outperforming its predecessors in multiple evaluations. The report highlights the need for innovative testing methods to thoroughly assess models with such long-context handling, pushing the boundaries of multimodal AI.

DeepSeek-VL is an open-source Vision-Language (VL) model built for practical applications and designed to overcome limitations faced by similar models. It trains on a diverse and scalable dataset including web screenshots, PDFs, charts, and expert knowledge, making it adaptable to various tasks. DeepSeek-VL features a unique vision encoder to efficiently process high-resolution images, ensuring detailed visual information capture. Its balanced training strategy prioritizes language capabilities while integrating vision training, leading to robust performance in both areas. As a vision-language chatbot, DeepSeek-VL delivers a superior user experience, rivaling closed-source models in real-world scenarios. This open-source model aims to foster innovation in multimodal data handling, with plans for scaling and integration of advanced MoE (Mixture of Experts) technology.

ELLA (Efficient Large Language Model Adapter) brings significant improvements to text-to-image diffusion models by effectively incorporating powerful Large Language Models (LLMs). Unlike existing approaches, ELLA doesn't require training of U-Net or LLMs, resulting in a lightweight and efficient solution. Its core innovation, the Timestep-Aware Semantic Connector (TSC), dynamically extracts information from the LLM, helping the diffusion model understand complex prompts accurately. This leads to superior results, especially with prompts involving multiple objects, attributes, and intricate relationships. ELLA can be easily integrated into community models and tools, offering improved text-to-image alignment. Additionally, the new Dense Prompt Graph Benchmark (DPGBench) allows for automated evaluation of models on long, complex prompts. Researchers aim to further explore the integration of LLMs into diffusion models, addressing limitations in training captions and image quality.

Spotify researchers developed 2T-HGNN, a new architecture for personalized audiobook recommendations. This model addresses challenges like content diversity, data sparsity, and scalability, ultimately leading to a more engaging user experience. 2T-HGNN combines a Heterogeneous Graph Neural Network (HGNN) with a Two Tower (2T) model. The HGNN leverages a "co-listening graph" to capture nuanced relationships between audiobooks, while the 2T model ensures scalability by keeping user data separate. To further enhance recommendations, Spotify incorporates user interactions with podcasts and audiobook metadata. Evaluations show significant improvements, with a 23% increase in audiobook streams and a 46% increase in users starting new audiobooks. This modular design integrates seamlessly with existing systems, allowing efficient and flexible recommendations for millions of users. A/B testing confirmed the model's effectiveness, and it's now live on Spotify, promising a more personalized experience across diverse content.
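For readers curious what the "Two Tower" half of the design looks like in practice, here is a minimal, illustrative sketch (ours, not Spotify's code, with made-up dimensions): each tower maps its own features into a shared embedding space, and a recommendation score is just a dot product, which is what lets item embeddings be precomputed and indexed at scale while user data stays separate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: in the two-tower pattern, a user tower and an item tower map
# their inputs into a shared vector space; here each "tower" is a random
# linear projection standing in for a trained neural network.
n_users, n_items, feat_dim, emb_dim = 3, 5, 8, 4
user_features = rng.normal(size=(n_users, feat_dim))
item_features = rng.normal(size=(n_items, feat_dim))
W_user = rng.normal(size=(feat_dim, emb_dim))   # "user tower" weights
W_item = rng.normal(size=(feat_dim, emb_dim))   # "item tower" weights

user_emb = user_features @ W_user               # (n_users, emb_dim)
item_emb = item_features @ W_item               # (n_items, emb_dim)

# The score is a dot product, so item embeddings can be precomputed and
# served from a nearest-neighbor index -- this separation is what scales.
scores = user_emb @ item_emb.T                  # (n_users, n_items)
top2 = np.argsort(-scores, axis=1)[:, :2]       # top-2 items per user
print(top2.shape)                               # (3, 2)
```

In the paper's design, the HGNN's co-listening signals would feed richer item representations into the item tower; the dot-product retrieval step stays the same.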

This study investigates the limitations of Vision-Language Models (VLMs) on tasks requiring complex visual reasoning, specifically using Raven's Progressive Matrices (RPMs). While Large Language Models (LLMs) excel in text-based reasoning, VLMs struggle with visual deduction due to challenges like multi-step logic, reliance purely on visual clues, and limited training data. Evaluations show that VLMs perform well on text-based tasks but fall short on visual deduction. Techniques that work with LLMs don't directly translate to VLMs, and perception becomes a bottleneck. This study establishes benchmarks, provides datasets, and outlines future research to improve the visual reasoning capabilities of VLMs.

AI Draws Comics

Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.

Interested in reaching smart readers like you? To become an AI Breakfast sponsor, apply here.