Meta's Llama 3 Coming Soon

Good morning. It’s Friday, March 1st.

Did you know: On this day in 1860, Herman Hollerith, inventor of the tabulating machine, was born.

In today’s email:

  • AI Language Models and Generative Tools

  • AI in Content Creation and Editing

  • AI in Hardware and Computing

  • 5 New AI Tools

  • Latest AI Research Papers

You read. We listen. Let us know what you think by replying to this email.

In partnership with BITGRIT

Bitgrit: Where AI meets Web3 in a groundbreaking online competition platform for data scientists

Bitgrit is democratizing the field with a blockchain-powered ecosystem that rewards innovation, collaboration, and expertise. Join a global movement that brings together data scientists, businesses, and data providers in a transparent, unified platform. It's not just about participating; it's about leading the charge in integrating AI into our lives and work.

Why join bitgrit?

Transformative Community: Dive into competitions, connect with peers, and turn your AI solutions into opportunities.

Collaborative Marketplace: Access an expansive network where your skills can solve real-world problems and be crowd-funded by businesses and the community alike.

Empowerment and Innovation: Leverage our platform to showcase your talents, engage with cutting-edge challenges, and monetize your contributions.

Today’s trending AI news stories

AI Language Models and Generative Tools

> Meta plans to release its Llama 3 language model in July, positioning it to compete with OpenAI's GPT-4. The model aims for improved responsiveness, providing context on complex topics instead of blocking inquiries. It will better handle words with context-dependent meanings, reducing misunderstandings. Meta will dedicate teams to ensure the model's safety and appropriate tone. Potentially twice the size of its predecessor, Llama 3 could exceed 140 billion parameters. It's unclear if it will remain solely a language model or become multimodal, handling images. Despite challenges, Meta remains committed to developing generative AI in line with its quasi-open-source strategy.

> A collaboration between Nvidia, Hugging Face, and ServiceNow has resulted in StarCoder2, a groundbreaking series of LLMs designed specifically for generating code. These models come in three sizes (3B, 7B, and 15B parameters) and boast a major advance in training data and techniques, supporting over 600 programming languages, including less common ones like COBOL and mathematics-oriented languages. StarCoder2 promises high-performance code generation across a wide range of tasks while reducing computing costs thanks to optimizations from ServiceNow, Hugging Face, and Nvidia frameworks. Remarkably, the 3B model matches the performance of its much larger predecessor. StarCoder2 is openly accessible for commercial and academic use.

> Microsoft introduced Copilot for Finance, an AI chatbot designed to streamline common financial tasks within Excel and Outlook. Currently in public preview, the tool integrates with SAP and Dynamics 365 to automate processes such as variance analysis. Its partnership with Dentsu highlights Copilot for Finance's potential to transform real-world workflows, demonstrating its ability to accelerate financial operations and provide actionable insights for decision-making.

> A cutting-edge device unveiled at MWC demonstrated how generative AI could become the cornerstone of smartphone innovation. The technology integrates AI algorithms directly into mobile hardware, promising to significantly change device functionality and revolutionize user experiences. The unveiling highlights the growing role of AI in consumer electronics and the potential of advanced on-device algorithms.

AI in Content Creation and Editing

> Adobe debuts Project Music GenAI Control, a prototype AI tool reshaping music creation and editing. Introduced at the Hot Pod Summit 2024, it enables users to generate music from text prompts like "happy dance" or "sad jazz" and seamlessly edit within the interface. Users can fine-tune parameters such as tempo and structure using integrated controls and adjust audio based on reference melodies. Developed in collaboration with academic institutions, Project Music GenAI Control offers granular control similar to what Photoshop provides for images, promising to transform audio editing. While still in its early stages, its potential integration into Adobe's suite of editing tools hints at exciting possibilities for the future of music production.

> Morph Studio has introduced a new AI filmmaking platform that allows users to create cohesive videos using clips generated by Stability AI. The platform simplifies the process with a storyboard interface where users input text prompts for scenes. Its partnership with Stability AI supplies the underlying video generation, keeping clips consistent enough for editing and cross-cutting. Founded by former PhD students from the Hong Kong University of Science and Technology, Morph Studio aims to distinguish itself from competitors like CapCut with a focus on community building. The company has received $2.5 million in funding from Baidu Ventures and plans to advance its technological capabilities while fostering a strong user base.

> Lightricks, the company behind popular photo and video editing apps, has revealed LTX Studio, a text-to-video tool powered by generative AI. LTX Studio rapidly transforms text prompts into polished videos, granting users extensive editing capabilities for customization. Its user-friendly interface offers powerful features for storytelling and visual control. While Lightricks joins a competitive text-to-video market, LTX Studio prioritizes realism and user agency with tools for narrative adjustments, scene creation, and character customization. Currently in preview, LTX Studio is scheduled for full release on March 27, potentially impacting the future of content creation.

AI in Hardware and Computing

> Apple CEO Tim Cook has announced a major focus on generative AI (GenAI) this year, following the scaling back of its electric car project. Cook's remarks at the shareholders' meeting and staff reassignments confirm the shift. Apple has been cautious with GenAI, prioritizing internal use. Expect upgrades to Siri, Spotlight, and Apple Music, potentially enabling complex queries and AI-generated content. Developers may also see AI-powered coding suggestions in Xcode. These features could debut at Apple's Worldwide Developers Conference. Apple engineers are contributing to the field with academic papers and open-source models. Rumors suggest significant upgrades to the Neural Engine in future iPhones, indicating a strategic move towards AI-driven innovation.

> Amazon plans a significant $1 billion investment in startups focused on integrating AI and robotics for warehouse automation. The effort, led by its corporate venture capital arm, targets generative AI for enhanced logistics efficiency. Amazon expects to pick up its investment pace in 2024; early bets include Mantis Robotics, a developer of collaborative robotic arms. This push aligns with broader industry interest in generative AI, which can rapidly produce text, images, and code. Amazon aims to streamline warehouse operations without entirely replacing human workers, focusing instead on shifting job responsibilities.

> Alibaba's EMO AI dramatically improves video generation by creating lifelike "talking head" videos from a single image and audio input. Powered by deep neural networks and diffusion models, EMO learns intricate facial motions from audio, surpassing traditional methods in expressiveness. It maintains character identity, produces fluid animations with varied expressions, and works across portrait styles (realistic, anime, 3D) while ensuring accurate lip sync.

5 new AI-powered tools from around the web

Quartzite AI streamlines prompt creation for diverse language models like GPT-4. Features Markdown editor, version history, collaboration, and pay-per-use GPT pricing.

FTK empowers educators to customize their AI swiftly. Setup takes 5 minutes with Microsoft's enterprise-grade security. Free access ensures a secure, reliable experience.

Pratham is a video analytics platform that brings Vision AI to users without code or expertise.

Txt2SQL automates SQL query generation for MySQL, Postgres, SQLite, MS SQL databases. Designed to save time, boost productivity for DB admins, developers, analysts.

Chibi AI offers a user-friendly, no-code AI content workbench. Create a customizable AI toolkit within a familiar document editor interface. Tailor workflows, bring your own AI models for a transparent, collaborative experience.

Latest AI Research Papers

arXiv is a free online library where researchers share pre-publication papers.

The paper introduces BitNet b1.58, a 1-bit Large Language Model (LLM) with ternary parameters. It matches full-precision LLMs in performance while being more cost-effective in terms of latency, memory, throughput, and energy. BitNet b1.58 defines a new scaling law for training high-performance LLMs efficiently. Across various tasks and model sizes it matches or exceeds FP16 LLMs while offering Pareto improvements in cost. BitNet b1.58 also enables new hardware designs optimized for 1-bit LLMs, presenting opportunities for enhanced computation efficiency. With its reduced memory footprint and energy consumption, BitNet b1.58 is suitable for edge and mobile devices, potentially revolutionizing their capabilities. Future work includes exploring 1-bit Mixture-of-Experts LLMs, supporting long sequences, and designing dedicated hardware for 1-bit LLMs.
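The "ternary parameters" idea is concrete enough to sketch: each weight is mapped to {-1, 0, +1} after scaling by the mean absolute weight (the absmean scheme described in the paper). The NumPy snippet below is an illustrative sketch of that quantization step only, not the paper's training procedure or kernels, and the function name is ours:

```python
import numpy as np

def absmean_ternary_quantize(W, eps=1e-8):
    """Quantize a weight matrix to ternary values {-1, 0, +1}.

    Scale by the mean absolute weight (absmean), then round and
    clip each entry to the nearest value in {-1, 0, +1}.
    """
    gamma = np.mean(np.abs(W)) + eps   # absmean scaling factor
    W_ternary = np.clip(np.round(W / gamma), -1, 1)
    return W_ternary, gamma

# Example: quantize a small random weight matrix
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4)).astype(np.float32)
Wq, gamma = absmean_ternary_quantize(W)
print(np.unique(Wq))  # every entry ends up in {-1, 0, 1}
```

Because every weight takes one of three values, matrix multiplication reduces to additions and subtractions, which is where the latency and energy savings come from.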

OmniACT introduces a groundbreaking dataset and benchmark aimed at evaluating the capabilities of autonomous agents in executing computer tasks across desktop and web applications. With over 9.8K meticulously annotated data points, OmniACT spans various operating systems and web domains, covering tasks from simple commands to complex actions. Unlike previous benchmarks, OmniACT emphasizes multimodal agents capable of bridging large language models with visual understanding of UI elements. While existing models, including GPT-4, show promising performance, they still fall short of human proficiency. This underscores the challenges in developing agents proficient in both language comprehension and screen interaction. OmniACT serves as a catalyst for future research, paving the way for advanced multimodal models capable of providing efficient, broadly capable assistance across diverse computer tasks.

The paper from Microsoft addresses the issue of self-attention, a key component in large language models (LLMs), which can lead to significant inference latency for long sequences. They propose ChunkAttention, a module that optimizes self-attention by efficiently utilizing shared system prompts in prefixes. By breaking down key/value tensors into smaller chunks and structuring them into a prefix tree, ChunkAttention can detect matching prompt prefixes across multiple requests, improving memory utilization. Additionally, they introduce a two-phase partition algorithm to enhance data locality during self-attention computation. Experiments demonstrate that ChunkAttention accelerates the self-attention kernel by 3.2-4.8× compared to state-of-the-art implementations, particularly benefiting from longer shared system prompts.
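The prefix-tree idea can be illustrated with a toy trie over fixed-size token chunks. This is a sketch of the caching insight only (no real key/value tensors and no two-phase partition algorithm); the class, chunk size, and token values are invented for illustration:

```python
def chunked(tokens, chunk_size):
    """Split a token list into fixed-size chunk keys."""
    return [tuple(tokens[i:i + chunk_size])
            for i in range(0, len(tokens), chunk_size)]

class ChunkTrie:
    """Toy prefix tree over token chunks.

    Requests whose prompts share a prefix share trie nodes, so the
    KV cache for that prefix would be stored and computed only once,
    which is the idea behind ChunkAttention's prefix-aware KV cache.
    """
    def __init__(self):
        self.root = {}
        self.num_nodes = 0   # one node per cached KV chunk

    def insert(self, tokens, chunk_size=4):
        node = self.root
        for chunk in chunked(tokens, chunk_size):
            if chunk not in node:
                node[chunk] = {}
                self.num_nodes += 1   # new chunk must be cached
            node = node[chunk]

system = list(range(16))               # shared system prompt, 16 tokens
req_a = system + [100, 101, 102, 103]  # two requests, same prefix
req_b = system + [200, 201, 202, 203]

trie = ChunkTrie()
trie.insert(req_a)
trie.insert(req_b)
# 4 shared chunks + 1 unique chunk per request = 6 cached chunks,
# versus 10 if each request kept its own full KV cache.
print(trie.num_nodes)  # 6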

AgentOhana introduces a novel approach to addressing the challenges of heterogeneous data sources in multi-turn LLM agent trajectories. It aggregates data from diverse environments, standardizes trajectories into a consistent format, and employs a robust training pipeline to maintain equilibrium across sources. Leveraging this unified data, xLAM-v0.1, a large action model tailored for AI agents, demonstrates exceptional performance. The methodology includes AgentRater, a method to rate trajectory quality, and a generic dataloader facilitating seamless integration of diverse datasets. Experiments show effective supervised fine-tuning on 8 Nvidia H100 GPUs, ensuring model robustness and comprehensive exposure to the dataset. AgentOhana promises to advance research and development in autonomous agents powered by LLMs, offering a versatile resource for the community.
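The standardization step can be sketched as a small normalizer that maps source-specific trajectory records into one consistent turn format. The schemas and field names below ("messages", "speaker", "steps", and so on) are invented for illustration; AgentOhana's actual unified format differs:

```python
def normalize_trajectory(raw, source):
    """Convert a trajectory from a source-specific schema into a
    unified list of {"role", "content"} turns."""
    if source == "chat_log":
        # Source A stores a flat message list with a "speaker" field.
        return [{"role": m["speaker"], "content": m["text"]}
                for m in raw["messages"]]
    if source == "action_env":
        # Source B stores environment steps; expand each step into
        # an observation turn followed by an action turn.
        turns = []
        for step in raw["steps"]:
            turns.append({"role": "user", "content": step["observation"]})
            turns.append({"role": "assistant", "content": step["action"]})
        return turns
    raise ValueError(f"unknown source: {source}")

a = {"messages": [{"speaker": "user", "text": "hi"},
                  {"speaker": "assistant", "text": "hello"}]}
b = {"steps": [{"observation": "door is closed", "action": "open door"}]}

unified = (normalize_trajectory(a, "chat_log")
           + normalize_trajectory(b, "action_env"))
print(len(unified))  # 4 turns, all in one consistent format
```

Once every source is reduced to the same schema, a single dataloader can mix them while keeping sampling balanced across sources, which is the equilibrium the training pipeline aims for.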

TinyLLaVA presents a unified framework for designing and analyzing small-scale Large Multimodal Models (LMMs). By investigating various factors such as vision encoders, connection modules, language models, training data, and recipes, TinyLLaVA demonstrates that smaller LMMs can achieve comparable performance to larger ones with better data quality and training recipes. The framework trains a family of small-scale LMMs, with TinyLLaVA-3.1B outperforming existing 7B models. This work addresses the challenge of expensive computational resources required for large models and promotes accessibility to research through smaller-scale models. It also highlights the under-explored design space of LMMs and provides baselines for future research in data scaling, training setups, and model selections.

ChatGPT Creates Comics

Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.

Interested in reaching smart readers like you? To become an AI Breakfast sponsor, apply here.