OpenAI's Robot and New NVIDIA Challenger

Good morning. It’s Friday, March 15th.

Did you know: On this day in 1985, the very first .COM domain, “symbolics.com”, was registered?

In today’s email:

  • Cerebras Chip is 57x larger than Nvidia H100

  • Anthropic's Claude 3 Haiku

  • AI chatbot for real-time cybersecurity, $4/hr

  • Figure 01 Robot by OpenAI

  • Google DeepMind SIMA

  • OpenAI CTO unsure what data trained Sora

  • Apple acquires DarwinAI for on-device AI

  • 5 of 45 Pulitzer finalists used AI this year

  • 5 New AI Tools

  • Latest AI Research Papers

You read. We listen. Let us know what you think by replying to this email.

In partnership with

Artificial Intelligence online short course from MIT

Study artificial intelligence and gain the knowledge to support its integration into your organization. If you're looking to gain a competitive edge in today's business world, then this artificial intelligence online course may be the perfect option for you.

  • Key AI management and leadership insights to support informed, strategic decision making.

  • A practical grounding in AI and its business applications, helping you to transform your organization into a future-forward business.

  • A road map for the strategic implementation of AI technologies in a business context.

Today’s trending AI news stories

New Chips & Models

> This New AI Chip Makes Nvidia’s H100 Look Puny in Comparison: Cerebras Systems, a pioneering semiconductor company based in Sunnyvale, California, has introduced its latest innovation: the Wafer Scale Engine 3 (WSE-3). This third-generation processor boasts an astounding 4 trillion transistors and 125 petaflops of AI computing power, overshadowing Nvidia's H100 by a significant margin. Built on a 5-nanometer process, the WSE-3 is 57 times larger and 50 times more powerful than its Nvidia counterpart. Unlike Nvidia, Cerebras integrates its chips into complete computing systems rather than selling them standalone. The company envisions clusters of its systems reaching 256 exaflops of computing power, catering to growing demand from customers such as OpenAI and buyers in the Middle East. Read more.
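
Those headline ratios are easy to sanity-check against the vendors' published specs. A minimal Python sketch (the H100 figures are our assumption, taken from Nvidia's public spec sheets rather than from the article):

  # Back-of-the-envelope check of the "57x larger" claim using published specs.
  wse3_area_mm2, h100_area_mm2 = 46_225, 814           # die areas
  wse3_transistors, h100_transistors = 4e12, 80e9      # transistor counts

  print(f"die area ratio:   {wse3_area_mm2 / h100_area_mm2:.0f}x")        # ~57x
  print(f"transistor ratio: {wse3_transistors / h100_transistors:.0f}x")  # ~50x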

> Anthropic Launches Claude 3 Haiku, Offering Fast, Secure, and Affordable AI for Enterprises: Anthropic introduces Claude 3 Haiku, the latest addition to its Claude 3 model family, offering unparalleled speed and affordability for enterprise AI applications. With a processing capability of 21K tokens per second, Haiku excels in analyzing large datasets swiftly, crucial for tasks like customer support and chat interactions. Its pricing model, optimized for enterprise workloads, provides cost-effective solutions for tasks such as document analysis and image processing. Anthropic ensures enterprise-grade security, using rigorous testing and additional defense layers to mitigate risks. Claude 3 Haiku is now available alongside Sonnet and Opus. Read more.
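
For a sense of how Haiku is consumed in practice, here is a minimal sketch using Anthropic's Python SDK; the prompt is a placeholder, and the API key is read from the environment:

  # pip install anthropic -- minimal Claude 3 Haiku call via the Messages API
  import anthropic

  client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
  message = client.messages.create(
      model="claude-3-haiku-20240307",   # Haiku's model ID at launch
      max_tokens=512,
      messages=[{"role": "user",
                 "content": "Summarize this support ticket in two sentences: ..."}],
  )
  print(message.content[0].text)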

> Microsoft's Copilot for Security Brings Real-Time AI Threat Response to Businesses: Microsoft is set to launch Copilot for Security, an AI-powered chatbot designed for cybersecurity professionals. Powered by OpenAI's GPT-4 and Microsoft's security-specific model, Copilot for Security provides real-time information on security incidents and threats. It offers collaboration features such as a pinboard section and event summarization, with natural language inputs and code analysis capabilities. Unlike the flat per-user monthly fee for Copilot for Microsoft 365, this service will operate on a pay-as-you-go model, costing businesses $4 per hour of usage. This pricing lets organizations scale their AI-powered cybersecurity efforts, deploying and experimenting quickly without upfront charges. Read more.
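
The pay-as-you-go arithmetic is simple; a toy sketch with invented usage levels:

  # Toy monthly-cost model for the $4/hour pricing described above.
  RATE_PER_HOUR = 4.00
  for hours in (10, 40, 160):  # invented example usage levels
      print(f"{hours:>3} h/month -> ${hours * RATE_PER_HOUR:,.2f}")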

Robotics & Research

> Meet Figure 01, OpenAI’s Robot That Can Think, See, and Do: Figure AI, in collaboration with OpenAI, introduces "Figure 01," a humanoid robot capable of complex conversations and independent actions. Leveraging OpenAI's multimodal model trained on text and images, the robot interprets its environment, plans future actions, reflects on past experiences, and verbally articulates its decisions. Corey Lynch, a robotics and AI engineer at Figure, highlights the robot's capabilities in detail, emphasizing its ability to comprehend and respond to nuanced queries by referencing past interactions. Controlled by visuomotor transformers, the robot translates visual input into actions at high frequencies, boasting 24 degrees of freedom. Read more.
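
Figure hasn't published code, but the description above suggests a two-rate loop: a slow multimodal planner choosing behaviors while a fast visuomotor policy streams joint commands. A purely schematic sketch, with every name hypothetical:

  # Purely schematic: a slow VLM planner over a fast visuomotor policy.
  # None of these objects or rates come from Figure; they illustrate the idea.
  import time

  def control_loop(vlm, policy, robot, mic, replan_every=1.0):
      behavior, last_plan = None, 0.0
      while True:
          image = robot.camera.read()
          if time.time() - last_plan > replan_every:   # slow: language + planning
              behavior = vlm.plan(image=image, speech=mic.transcript())
              last_plan = time.time()
          action = policy.act(image, behavior)         # fast: pixels -> actions
          robot.command(action)                        # 24-DoF joint setpoints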

> Google DeepMind’s SIMA Masters New Games on the Fly: DeepMind, a subsidiary of Google, introduces SIMA, an AI agent adept at mastering multiple tasks in unfamiliar 3D video games. Unlike prior achievements in board games, SIMA learns directly from human gameplay and can navigate diverse gaming worlds without prior knowledge. Through collaboration with gaming studios and training on nine games across various environments, SIMA shows adaptability and skill acquisition. SIMA's performance also improves with exposure to multiple games, demonstrating its potential for generalized intelligence. While requiring human guidance, SIMA's ability to comprehend natural language prompts signals progress towards versatile AI agents. DeepMind aims to extend SIMA's capabilities to complex, multi-stage tasks, envisioning applications beyond gaming in real-world scenarios. Read more.
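
DeepMind describes SIMA as using the same interface a human player would: screen pixels plus a language instruction in, keyboard and mouse actions out. A schematic sketch of that contract (all names hypothetical):

  # Schematic of SIMA's human-like interface: pixels + instruction -> keys/mouse.
  # The Agent class and action format are hypothetical illustrations.
  from dataclasses import dataclass

  @dataclass
  class Action:
      keys: list[str]      # e.g. ["w"] to walk forward
      mouse_dx: float      # camera movement
      mouse_dy: float

  class Agent:
      def act(self, frame_pixels, instruction: str) -> Action:
          # a trained policy would map (frame, instruction) to an action here
          if "forward" in instruction:
              return Action(keys=["w"], mouse_dx=0.0, mouse_dy=0.0)
          return Action(keys=[], mouse_dx=0.0, mouse_dy=0.0)

  print(Agent().act(frame_pixels=None, instruction="walk forward"))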

> OpenAI's Own CTO Doesn't Know What Data Trained Sora: OpenAI's CTO, Mira Murati, revealed in an interview with the Wall Street Journal that even she is unclear on what training data went into Sora, OpenAI's latest video model. While Murati said Sora is trained on public and licensed data, she couldn't specify whether platforms like YouTube or Facebook were included. The lack of clarity raises concerns, especially as OpenAI faces lawsuits alleging data theft and unauthorized use of copyrighted material in AI model training. Despite Murati's confirmation that some licensed data comes from Shutterstock, questions persist about OpenAI's data practices. Murati also discussed Sora's high cost compared to existing systems and suggested its release would follow a path similar to DALL-E 3's, though it may be delayed over concerns about the US elections. The video generator will be made publicly available later this year. Read more.

Apple’s Acquisition & AI Pulitzers

> Apple Strengthens On-Device AI with DarwinAI Acquisition: According to Bloomberg, Apple has acquired Canadian AI startup DarwinAI, signaling a strategic shift towards on-device AI for its products. DarwinAI's expertise in lightweight, efficient AI systems aligns with Apple's philosophy of on-device processing, potentially steering resources away from the rumored Apple Car project. The acquisition addresses investor concerns about Apple's AI strategy and its position in the burgeoning AI race. CEO Tim Cook's previous pronouncements that Apple will make significant AI advancements this year suggest confidence in this new direction. Read more.

> Five of this year’s Pulitzer finalists used AI: This year's Pulitzer Prizes for journalism revealed that five of the 45 finalists used AI in their research, reporting, or submissions, the first year entrants were required to disclose AI usage. Marjorie Miller, the Pulitzer Prize administrator, noted that the requirement reflects the growing popularity of generative AI and machine learning in the industry. Meanwhile, the George Polk Awards are also considering how to adapt to an increasingly AI-integrated landscape, aiming to develop an AI disclosure policy after this year's awards. Read more.

5 new AI-powered tools from around the web

Airtrain.ai LLM Playground offers no-code comparison of various LLMs including open-source and proprietary models like GPT-4, Gemini, Phi-2, and more. Evaluate quality, cost and performance easily.

WrapFast allows developers to rapidly create AI wrapper apps for iOS, minimizing repetitive tasks. It includes an iOS app boilerplate, a backend, and secured OpenAI API integration.

Fine provides AI-powered developer agents, with free options to discuss, collect, embed, and share code. Its agents understand business needs, analyze code, and generate and test apps efficiently.

Synthflow AI enables no-code creation of human-like AI voice agents, ideal for handling complex customer calls effortlessly. Lightning-fast AI with hundreds of voices available.

Charmed.ai provides an end-to-end AI toolkit for 3D game development, including Geometry and Texture Generators, 3D Animator, and Quest Generator, enhancing productivity and creative workflow.

Latest AI Research Papers

arXiv is a free online library where researchers share pre-publication papers.

VLOGGER, developed by researchers at Google, introduces a pioneering framework for audio-driven human video synthesis from a single input image. It advances existing generative diffusion models by integrating a stochastic human-to-3D-motion diffusion model and a novel diffusion-based architecture with spatial and temporal controls. This allows for the generation of high-quality, variable-length videos with controllability over human face and body representations. Unlike previous approaches, VLOGGER does not require individualized training for each person, eliminates the need for face detection and cropping, and produces complete images while handling the diverse scenarios crucial for accurate human synthesis. Leveraging the extensive MENTOR dataset, VLOGGER surpasses state-of-the-art methods in image quality, identity preservation, and temporal coherence, while incorporating upper-body gestures. The method generalizes robustly across diverse scenarios and finds applications in video editing and personalization.
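
As summarized, the pipeline splits into two stages; a purely schematic sketch, with hypothetical placeholders rather than VLOGGER's real API:

  # Schematic of the two-stage pipeline described above; not VLOGGER's code.
  def vlogger_style_pipeline(reference_image, audio, motion_model, video_model):
      # stage 1: stochastic diffusion from audio to 3D face/body motion
      motion = motion_model.sample(audio=audio)
      # stage 2: temporally controlled diffusion renders frames from one image
      return video_model.sample(image=reference_image, controls=motion)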

The paper introduces Follow-Your-Click, a novel approach for open-domain regional image animation through user-provided clicks and short motion prompts. Addressing limitations in existing image-to-video (I2V) methods, the framework allows precise control over object movement, background animation, and multiple object interactions with simplified user input. Key technical contributions include a first-frame masking strategy for enhanced video generation quality, a motion-augmented module utilizing a short-prompt dataset for improved prompt following, and flow-based motion magnitude control for precise speed regulation. Comparative evaluations against seven baselines, including commercial tools and research methods, demonstrate superior performance across eight metrics. Follow-Your-Click represents a significant advancement in controllable I2V, enabling practical and intuitive regional image animation with enhanced user interaction.
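
The user-facing contract is simple enough to sketch schematically (a hypothetical API, not the authors' code):

  # Schematic of the click-to-animate interface described above.
  def animate_region(model, image, click_xy, motion_prompt="sway",
                     flow_magnitude=0.5):
      mask = model.region_from_click(image, click_xy)  # which object to animate
      return model.generate(image, mask=mask,
                            prompt=motion_prompt,      # short motion prompt
                            magnitude=flow_magnitude)  # flow-based speed control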

The paper introduces GiT, a novel framework for versatile visual tasks via a universal language interface. Inspired by successful LLMs, GiT employs a simple multi-layer Transformer architecture, unifying various visual tasks without task-specific modules. It leverages a universal language interface, treating visual tasks as sequence generation problems, enabling auto-regressive decoding across tasks. Unlike previous models, GiT requires no task-specific fine-tuning, achieving strong generalist performance across five benchmarks. With training on 27 datasets, it exhibits robust zero- and few-shot performance. The proposed approach streamlines model design and fosters mutual enhancement among tasks, marking a significant step towards a unified vision model. The paper's contributions include a foundational framework for unified visual modeling and strong generalizability across diverse tasks.
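
The unifying idea is concrete enough to illustrate: a toy example of flattening a detection target into tokens so one auto-regressive decoder can emit it like text (the vocabulary layout is invented, not GiT's actual scheme):

  # Toy illustration of "everything is sequence generation".
  def box_to_tokens(label, box, bins=1000):
      coords = [f"<{int(c * (bins - 1))}>" for c in box]  # quantized coordinates
      return [label] + coords

  print(box_to_tokens("person", (0.12, 0.30, 0.55, 0.98)))
  # ['person', '<119>', '<299>', '<549>', '<979>']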

The paper introduces Quiet-STaR, a method enabling language models (LMs) to infer unstated rationales in text, enhancing their predictive capabilities. Unlike prior works limited to specific tasks or datasets, Quiet-STaR generalizes reasoning from diverse unstructured text data, training LMs to generate reasoning aiding future text inference. The process involves parallel rationale generation, mixing post-rationale and base predictions, and optimizing rationale generation using REINFORCE. Quiet-STaR demonstrates significant improvements in zero-shot reasoning tasks like GSM8K and CommonsenseQA without fine-tuning, indicating its scalability and generalizability. Key contributions include the introduction of meta-tokens, a parallel sampling algorithm, and non-myopic loss formulation. Quiet-STaR offers a pathway towards more robust and adaptable LMs capable of learning reasoning in a more general and scalable manner, with potential applications in various natural language understanding tasks.
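
The core mixing step can be sketched in a few lines; this is a toy rendering of the idea, not the paper's implementation:

  # Toy sketch of Quiet-STaR's prediction mixing: blend the base next-token
  # distribution with the one produced after a generated rationale. The mixing
  # weight stands in for the paper's learned mixing head.
  import torch

  def mixed_logits(base_logits, rationale_logits, mix_weight):
      w = torch.sigmoid(mix_weight)        # learned interpolation weight
      return w * rationale_logits + (1 - w) * base_logits

  vocab = 8                                # toy vocabulary size
  base = torch.randn(vocab)                # prediction from the text alone
  after_thought = torch.randn(vocab)       # prediction after a rationale
  print(mixed_logits(base, after_thought, torch.tensor(0.3)))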

The paper introduces VisionGPT-3D, a unified framework aimed at enhancing 3D vision understanding by consolidating state-of-the-art vision models. Leveraging large language models (LLMs) like GPT-4, VisionGPT-3D integrates various computer vision (CV) models such as SAM, YOLO, and DINO, addressing challenges in multimodal CV tasks. It automates model selection and optimizes results based on diverse inputs like text prompts. The framework facilitates tasks like 3D mesh creation from 2D depth maps analysis, utilizing techniques such as multi-view stereo and structure from motion. However, limitations arise in non-GPU environments due to library availability and performance issues. Future work involves optimizing algorithms for cost-effective and efficient model training, enhancing prediction precision. VisionGPT-3D aims to maximize the transformation capability of visual applications by combining traditional vision processing methods with AI models within a unified system.
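
The model-selection idea can be illustrated with a toy dispatcher (the routing rules below are invented examples):

  # Toy dispatcher in the spirit of the automated model selection described
  # above: route a prompt to a vision model.
  def select_model(prompt: str) -> str:
      rules = {"segment": "SAM", "detect": "YOLO", "features": "DINO"}
      for keyword, model in rules.items():
          if keyword in prompt.lower():
              return model
      return "fallback: ask the LLM to decompose the task"

  print(select_model("Detect all cars in this frame"))   # -> YOLO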

AI Creates Comics

Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.

Interested in reaching smart readers like you? To become an AI Breakfast sponsor, apply here.