OpenAI's Humanoid Robotics Investment

AI Breakfast
February 26, 2024

Good morning. It’s Monday, February 26th.

Did you know: Google is working on a foundation model for generative playable worlds?

In today’s email:

AI in Robotics and Automation
Advancements in AI Models and Platforms
5 New AI Tools
Latest AI Research Papers

You read. We listen. Let us know what you think by replying to this email.

Today’s trending AI news stories

AI in Robotics and Automation

> Figure AI has secured $675 million in funding for Figure 01, an AI-powered robot designed to tackle hazardous tasks. Investors include Jeff Bezos, Nvidia, Microsoft, and OpenAI, in a round that values Figure AI at $1.9 billion. Underscoring the long-term potential of robotics, OpenAI CEO Sam Altman hints at a possible return to the field, a notable shift after OpenAI abandoned its robotics efforts in 2021. The investment highlights growing interest in AI-powered robotics, particularly its potential to fill labor gaps in dangerous industries. Concerns linger, however, about the environmental impact of generative AI, as the true costs of these technologies remain shrouded in corporate secrecy.

> Apple is internally testing an AI tool called "Ask," designed to streamline technical support with ChatGPT-like capabilities. The tool taps into Apple's knowledge base to provide faster responses to customer queries, and a wider rollout is planned based on feedback. Apple also aims to integrate generative AI into iOS 18, potentially enhancing Siri and iMessage. CEO Tim Cook has touted major AI advancements this year, and Apple is negotiating with media companies in deals that could enhance its chatbot services.

> ElevenLabs, a voice technology company, is partnering with Perplexity, a search tool, to launch Discover Daily, a daily podcast featuring curated content about innovation, science, and culture. ElevenLabs' technology makes the podcast easy to listen to on the go, and episodes are available on all major podcast platforms. The partnership draws on Perplexity's real-time data and content selection. Perplexity includes in-line citations for reliable information and easy fact-checking, which sets it apart from other search engines.

> Honor, a Huawei spin-off, introduces pioneering eye-tracking technology enabling car control via gaze, showcased in its Magic 6 Pro smartphone at Mobile World Congress. Utilizing AI and selfie cameras, the device monitors eye movements for commands like engine start and stop. While the technology's integration with automakers remains uncertain, Honor seeks to differentiate itself in the competitive smartphone market.

The Magic 6 Pro will also allow users to open apps simply by looking at them. Additionally, Honor showcased a concept chatbot built on Meta's Llama 2 large language model.

> Jasper has acquired Clipdrop, an AI image processing platform, from Stability AI. Clipdrop's tools, built on Stability AI's open-source models, will integrate into Jasper's offerings. This acquisition follows Stability AI's financial struggles, which some attribute to its earlier purchase of Clipdrop. Notably, Intel invested $50 million in Stability AI in October 2023. Clipdrop's team joins Jasper, with enterprise customers accessing the technology via the Jasper API. Existing Clipdrop users will retain access to the standalone version.

Advancements in AI Models and Platforms

> Mistral AI is reportedly bringing its large language models to Amazon Bedrock. This integration will add Mistral 7B and Mixtral 8x7B to Amazon's foundation model offerings, providing users with powerful tools for tasks like text and code generation. Mistral AI's models are known for their cost-efficiency and performance; Mixtral 8x7B in particular uses a sparse Mixture-of-Experts architecture, in which only a few expert sub-networks run for each token. The models also offer fast inference, low latency, and transparency, aligning with regulatory needs and appealing to a wide range of organizations. The availability of Mistral AI models on Amazon Bedrock will expand options for developing and scaling generative AI applications.
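
For readers curious what "sparse Mixture-of-Experts" means in practice, here is a minimal sketch of top-k expert routing in plain Python. It illustrates the general technique only and is not Mistral AI's implementation: the toy experts, the router scores, and the function names are all assumptions made up for this example.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k=2):
    """Return [(expert_index, weight)] for the k highest-scoring experts."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    # Weights are renormalized over the selected experts only.
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))

def moe_layer(x, router_logits, experts, k=2):
    """Weighted sum of the outputs of only the k selected experts."""
    out = [0.0] * len(x)
    for idx, w in top_k_route(router_logits, k):
        y = experts[idx](x)  # the other experts are never evaluated
        out = [o + w * yi for o, yi in zip(out, y)]
    return out

# Eight hypothetical "experts": each just scales its input by a constant.
experts = [lambda v, s=s: [s * vi for vi in v] for s in range(1, 9)]

# Hypothetical router scores for one token (one score per expert).
router_logits = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 1.0]

routes = top_k_route(router_logits)
output = moe_layer([1.0, -1.0], router_logits, experts)
print(routes)
print(output)
```

Only two of the eight experts do any work for this token, which is where the cost savings come from: total parameter count grows with the number of experts, while per-token compute grows only with k.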

> YOLOv9, the latest version of the real-time object detection model, achieves higher accuracy with reduced computational complexity. Featuring new techniques like Programmable Gradient Information (PGI) and the Generalized Efficient Layer Aggregation Network (GELAN) architecture, YOLOv9 minimizes information loss and accelerates convergence in deep neural networks. Compared to its predecessor YOLOv8, YOLOv9 boasts 49% fewer parameters, 43% less computational complexity, and a 0.6% increase in average precision on the MS COCO dataset. YOLOv9's flexibility and efficiency enable adaptation to various machine vision tasks while maintaining high performance. The source code is available on GitHub, facilitating customization for specific data requirements.

> Groq's CEO believes startups will favor specialized Language Processing Units (LPUs) over Nvidia GPUs for AI tasks by late 2024. The prediction comes as Groq, a Silicon Valley AI chip startup focused on fast inference for large language models (LLMs), gains attention. Groq's LPUs are a potentially more cost-effective option for LLM applications, and they recently went viral for their speed. The company prioritizes user privacy and may partner with OpenAI. As Groq expands, its technology and user-focused approach could significantly disrupt the AI chip market currently dominated by Nvidia.

> Qualcomm has launched its AI Hub, a major step forward in on-device AI development. Introduced at Mobile World Congress 2024, the AI Hub features a library of over 75 generative AI models, including Whisper and Stable Diffusion. Developers can easily download these models onto Qualcomm-powered devices. The models are then automatically optimized for the device's hardware, leveraging Qualcomm's AI Engine for up to four times faster performance. This optimization also improves power efficiency and reduces memory usage. On-device AI enhances privacy by eliminating the need for cloud interaction, which is particularly important for applications handling sensitive data.

> Nvidia has launched its RTX 500 and 1000 Ada Generation Laptop GPUs at MWC 2024 to meet the growing demand for AI capabilities in laptops. Laptops built around these GPUs pair a neural processing unit (NPU) in the CPU with the GPU's Tensor Cores for on-device AI processing, aiming to boost productivity in design and content creation. The RTX 500 features 4GB of dedicated memory, while the RTX 1000 has 6GB, supporting tasks like AI-enhanced video conferencing and 3D rendering. Nvidia promises significant performance gains over CPU-only configurations, including up to 14 times faster generative AI image creation and three times faster photo editing. These lightweight GPUs are designed for thin and light laptops, offering efficiency and productivity gains for various industries.

5 new AI-powered tools from around the web

v0 report is an AI-driven tool for business reporting that gathers data from various sources, incorporating user insights. Simply input a URL or company name to explore topics like consumer behavior, technology adoption, and market trends.

MoAIJobs is a specialist AI job board that connects talent with roles in machine learning, research, engineering, and more.

EyePop.ai is an intuitive computer vision platform for non-technical users. Build custom applications ("Pops") effortlessly with no/low code tools for image, video, or real-time stream analysis.

RenderNet is an AI-powered image generation tool offering precise control. Features FaceLock for consistent character creation, ControlNet for detailed compositions, Multi-model generations, and Canvas for professional-level editing.

CodeMate is a pioneering AI-powered search engine designed specifically for developers, offering accurate code-based results, seamless workflow integration, and a strong commitment to data security.

arXiv is a free online library where researchers share pre-publication papers.

📄 Genie: Generative Interactive Environments

Genie, developed by Google DeepMind and the University of British Columbia, is a generative model of interactive environments. Trained without supervision on Internet videos, it lets users create and explore virtual worlds. With 11B parameters, it supports frame-by-frame interaction and can infer policies from unseen videos, which holds promise for future AI research. Built on spatiotemporal transformers, Genie consists of a video tokenizer, an autoregressive dynamics model, and a latent action model, which together allow controllable video generation. Its scaling analysis shows performance improving gracefully with increased computational resources. Genie's impact extends to training generalist agents and fostering creativity among users, especially children, in designing their imagined worlds.

📄 GPTVQ: The Blessing of Dimensionality for LLM Quantization

In collaboration with Qualcomm AI Research, a team introduces an innovative approach to neural network quantization specifically designed for Large Language Models (LLMs). By increasing the dimensionality of quantization, they significantly improve the balance between model size and accuracy. The method, termed GPTVQ, utilizes vector quantization (VQ) and a novel fast compression technique applied after training. GPTVQ alternates between quantizing columns and updating unquantized weights, using information derived from the Hessian of the output reconstruction error. This approach sets a new benchmark in the trade-off between size and accuracy across various LLMs. Notably, GPTVQ demonstrates efficiency, with practical processing times on large models, and it offers improved latency compared to conventional methods on mobile CPUs.
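
To make the idea of vector quantization concrete, here is a minimal sketch that compresses pairs of weights into indices into a small shared codebook, using plain k-means. It is not GPTVQ itself: it omits the Hessian-based weight updates described above, and the function names, codebook size, and random "weights" are all assumptions for illustration.

```python
import random

def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def nearest(vec, codebook):
    """Index of the codebook entry closest to vec."""
    return min(range(len(codebook)),
               key=lambda j: squared_distance(vec, codebook[j]))

def fit_codebook(vectors, num_codes, iterations=20, seed=0):
    """Plain k-means: alternate between assigning and averaging."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, num_codes)]
    for _ in range(iterations):
        buckets = [[] for _ in range(num_codes)]
        for v in vectors:
            buckets[nearest(v, codebook)].append(v)
        for j, members in enumerate(buckets):
            if members:  # keep the old centroid if no vector chose it
                codebook[j] = [sum(col) / len(members)
                               for col in zip(*members)]
    return codebook

def quantize(weights, codebook, dim=2):
    """Replace each group of `dim` weights with one codebook index."""
    vectors = [weights[i:i + dim] for i in range(0, len(weights), dim)]
    return [nearest(v, codebook) for v in vectors]

def dequantize(indices, codebook):
    return [x for j in indices for x in codebook[j]]

# Hypothetical "layer": 256 random weights, quantized two at a time.
rng = random.Random(1)
weights = [rng.gauss(0.0, 1.0) for _ in range(256)]
pairs = [weights[i:i + 2] for i in range(0, len(weights), 2)]

codebook = fit_codebook(pairs, num_codes=16)
indices = quantize(weights, codebook)
restored = dequantize(indices, codebook)

mse = sum((a - b) ** 2 for a, b in zip(weights, restored)) / len(weights)
print(len(indices), mse)
```

With 16 codebook entries, each index takes 4 bits and covers two weights, so storage is about 2 bits per weight plus the codebook. Quantizing several weights jointly rather than one at a time is the "dimensionality" the paper's title refers to.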

📄 MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

This study addresses the demand for efficient large language models (LLMs) on mobile devices, driven by rising cloud costs and latency concerns. It focuses on designing high-quality LLMs with fewer than one billion parameters, suitable for mobile deployment. In contrast to common assumptions, the research underscores the importance of model architecture for sub-billion scale LLMs. The authors introduce MobileLLM, a baseline network incorporating deep and thin architectures, embedding sharing, and grouped-query attention mechanisms, achieving significant accuracy improvements over previous state-of-the-art models. Additionally, they propose immediate block-wise weight sharing, further enhancing accuracy without increasing model size. MobileLLM demonstrates substantial advancements in various tasks and holds promise for on-device applications such as chat and API calling, particularly relevant for Meta's mobile platforms. The research is affiliated with Meta Reality Labs and AI@Meta (FAIR).
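
A quick back-of-the-envelope sketch shows why embedding sharing and grouped-query attention matter at this scale. The vocabulary size, model width, and head counts below are hypothetical round numbers chosen for illustration, not figures from the paper, and the formulas ignore biases.

```python
def embedding_params(vocab_size, dim, shared):
    """Parameters in the input embedding plus the output projection.

    With sharing, one table serves both roles.
    """
    tables = 1 if shared else 2
    return tables * vocab_size * dim

def attention_params(dim, num_heads, num_kv_heads):
    """Parameters in the Q, K, V, and output projections of one layer.

    In grouped-query attention, several query heads share one
    key/value head, so the K and V projections shrink.
    """
    head_dim = dim // num_heads
    q_proj = dim * dim
    kv_proj = 2 * dim * (num_kv_heads * head_dim)
    out_proj = dim * dim
    return q_proj + kv_proj + out_proj

# Hypothetical small model: 32,000-token vocabulary, width 512, 8 heads.
separate = embedding_params(32_000, 512, shared=False)
shared = embedding_params(32_000, 512, shared=True)
full_attention = attention_params(512, num_heads=8, num_kv_heads=8)
grouped_attention = attention_params(512, num_heads=8, num_kv_heads=2)

print(separate - shared)                   # parameters saved by sharing
print(full_attention - grouped_attention)  # saved per layer by grouping
```

In this toy setup, sharing the embedding table frees roughly 16 million parameters, a large slice of a sub-billion budget, which can then be spent on extra layers in a deep and thin design.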

📄 Orca-Math: Unlocking the potential of SLMs in Grade School Math

The study from Microsoft Research demonstrates that a small language model (SLM) can achieve approximately 87% accuracy on the GSM8K benchmark by training on just 200,000 synthetic math problems. This challenges the belief that achieving over 80% accuracy on GSM8K requires models with 34 billion parameters. The authors introduce Orca-Math, a 7-billion-parameter SLM based on Mistral-7B, which achieves 86.81% accuracy on GSM8K without the need for multiple model calls or external tools. They achieve this by using a high-quality synthetic dataset of 200K math problems and employing an iterative learning technique. This approach outperforms larger models and significantly improves SLM performance, highlighting the importance of innovative learning strategies and dataset generation methods.

📄 Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition

The "Gen4Gen" project introduces a new method for creating personalized images from text descriptions. Their focus is on generating images with multiple personalized concepts while accurately representing the given text. They propose a semi-automated process called Gen4Gen to create a dataset called MyCanvas. This process uses advanced techniques in generative models, image processing, and language models to make realistic images with detailed text descriptions. By improving the quality of the data and the way they prompt the training process, they significantly improve the generation of personalized images with multiple concepts without changing the model itself. They also suggest new metrics, CP-CLIP and TI-CLIP, to measure performance more effectively. MyCanvas acts as a thorough benchmark for creating images from text, setting a standard for future research.

Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.
