Train a task-specific LLM

Good morning. It’s Friday, March 29th.

Did you know? On this day in 1995, Kodak released the DC40, a 0.38-megapixel digital camera, for $1,000.

In today’s email:

  • Claude LLM Trainer: simple custom LLM training tool

  • AI21 Labs' Jamba: handles 200+ page input on single GPU

  • Claude 3 beats GPT-4 on Chatbot Arena for first time

  • xAI Grok-1.5: better reasoning, 192-page context

  • Hume EVI: first emotionally intelligent conversational AI

  • OpenAI pays devs based on custom GPT model usage

  • All US federal agencies must hire chief AI officer

  • Google SAFE: accurate low-cost AI fact-checker

  • Amazon invests $2.75B more in Anthropic (Claude)

  • 5 New AI Tools

  • Latest AI Research Papers

You read. We listen. Let us know what you think by replying to this email.

In partnership with BRIGHT DATA

Data Power-Up with Bright Data

Bright Data elevates businesses by collecting web data and turning it into actionable insights. Our global proxy network and web unblocking tools enable businesses to build datasets in real time and at scale, providing a competitive edge in Ecommerce, Travel, Finance, and beyond.

Tap into the value of clean and structured data for market research, ML/AI development, and strategic decision-making. With our scalable solutions, you can efficiently enhance your data strategy, ensuring you're always one step ahead. Start with our free trial and see the difference yourself.

Today’s trending AI news stories

AI Model Improvements

> Introducing Claude LLM Trainer - The world's simplest way to train a task-specific LLM: Matt Shumer introduces ‘claude-llm-trainer’, a tool that lets users build custom Large Language Models (LLMs) with unprecedented speed and ease. Simply describe the model's desired task, and the tool uses Claude 3 to generate a training dataset and fine-tune a model on it. Built on an open-source framework, claude-llm-trainer fine-tunes LLaMA 2 7B by default and can easily be pointed at newer models like Mistral 7B. To explore or contribute, visit the GitHub repo. Read more.
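
To make the generate-then-finetune pattern concrete, here is a minimal sketch of the data-generation step using Anthropic's Python SDK. It is illustrative only, not claude-llm-trainer's actual code; the prompt wording, example count, and JSONL output format are our assumptions.

```python
# Illustrative sketch (not claude-llm-trainer's code): ask Claude 3 to
# synthesize instruction/response pairs for a task, then store them as
# JSONL ready for fine-tuning a small open model such as LLaMA 2 7B.
import json
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
task = "Summarize legal contracts in plain English."

examples = []
for _ in range(100):
    message = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"Write one training example for the task: {task}\n"
                       "Return only JSON with keys 'instruction' and 'response'.",
        }],
    )
    # Assumes Claude returns bare JSON; a real tool would add parsing retries.
    examples.append(json.loads(message.content[0].text))

# One JSON object per line: the format most fine-tuning scripts accept.
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

The resulting train.jsonl is the kind of file most open-model fine-tuning scripts consume directly.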

> AI21 Labs' new AI model can handle over 200 pages of input: AI21 Labs has introduced Jamba, a new AI model that excels at handling extensive contextual information, boasting a context window of up to 140,000 tokens (roughly 105,000 words or 210 pages), far more than most existing models. This allows Jamba to generate more coherent and contextually relevant outputs. Unlike traditional models requiring vast computational resources, Jamba operates efficiently on a single GPU with sufficient memory. Jamba's secret lies in its architecture, which combines transformers, the prevailing AI technique, with state space models (SSMs). This hybrid approach gives Jamba three times the throughput on long contexts compared to similar-sized transformer models. Read more.
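
The interleaving idea behind that hybrid is easy to picture. Below is a toy PyTorch sketch, not AI21's implementation: a recurrent layer stands in for the selective SSM blocks, every fourth layer is attention, and all sizes are invented.

```python
# Conceptual sketch of a transformer/SSM hybrid stack in the style Jamba
# is described as using. A GRU stands in for the linear-time SSM layer;
# a real model would use a Mamba-style selective state-space block.
import torch
import torch.nn as nn

class HybridStack(nn.Module):
    def __init__(self, d_model=512, n_layers=8, attn_every=4, n_heads=8):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(n_layers):
            if (i + 1) % attn_every == 0:
                # Occasional attention layer: global token mixing.
                self.layers.append(
                    nn.MultiheadAttention(d_model, n_heads, batch_first=True))
            else:
                # Recurrent stand-in for the SSM: processes the sequence
                # in linear time rather than attention's quadratic time.
                self.layers.append(nn.GRU(d_model, d_model, batch_first=True))

    def forward(self, x):
        for layer in self.layers:
            if isinstance(layer, nn.MultiheadAttention):
                out, _ = layer(x, x, x)
            else:
                out, _ = layer(x)
            x = x + out  # residual connection around every block
        return x

x = torch.randn(2, 1024, 512)   # (batch, sequence, features)
print(HybridStack()(x).shape)   # torch.Size([2, 1024, 512])
```

The design intuition is that the cheap SSM layers carry most of the long-context work, with a few attention layers restoring the global token mixing SSMs lack.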

> “The king is dead”—Claude 3 surpasses GPT-4 on Chatbot Arena for the first time: Anthropic's Claude 3 Opus has dethroned OpenAI's GPT-4 on the Chatbot Arena leaderboard, a major shift in the LLM landscape: it is the first time since May 2023 that GPT-4 hasn't held the top spot. Chatbot Arena, a platform developed by LMSYS ORG, ranks chatbots using Elo-style ratings computed from crowd-sourced, head-to-head comparisons, addressing the perennial difficulty of benchmarking such models. With competitors like Google's Gemini Advanced entering the fray, pressure mounts on OpenAI to innovate and release new models, suggesting continued disruption within the AI landscape. Read more.

> xAI Announces Grok-1.5 with Improved Reasoning and Longer Context: Grok-1.5 surpasses its predecessor Grok-1 with top scores on benchmarks like MATH and GSM8K, and excels at tasks involving coding, math, and comprehension of long documents. Its secret weapon is the ability to process massive amounts of information (up to 128,000 tokens, roughly 192 pages), allowing it to handle complex prompts and generate insightful responses. The model's efficient, custom-built training framework ensures scalability and reliability, while its upcoming release to early testers highlights xAI's commitment to user-driven refinement, paving the way for even greater capabilities in the future. Read more.

> Meet Hume’s Empathic Voice Interface (EVI), the first conversational AI with emotional intelligence: Hume introduces the Empathic Voice Interface (EVI), a groundbreaking AI that analyzes the user's tone to understand underlying emotions, tailoring its language and speech for more meaningful interactions. Accessible via API, EVI offers developers a way to enrich applications with empathy. It responds with human-like tones, learns from user interactions, and seamlessly integrates with any Large Language Model (LLM). Public release is scheduled for April, with early access available to developers on a waitlist. Read more.

> OpenAI launches program to pay developers based on usage of their custom GPTs: OpenAI launched a US pilot program to let developers earn revenue based on usage of their custom GPTs. This aims to establish a fair compensation model, but specific details on revenue sharing and participants remain confidential. The announcement comes alongside challenges for GPTs, including moderation and spam concerns. With competitors like Anthropic's Claude 3 gaining popularity, the AI model landscape is rapidly evolving. OpenAI seeks to remain competitive, with new features planned for both GPT-4 and DALL-E 3 to drive further innovation. Read more.

AI Policy & Investments

> Every US federal agency must hire a chief AI officer: New guidance from the Office of Management and Budget (OMB) requires all federal agencies to appoint a chief AI officer and establish AI governance boards. Vice President Kamala Harris emphasized the importance of transparency and algorithmic fairness in AI applications, requiring agencies to submit annual reports detailing their AI systems, potential risks, and mitigation plans. While agencies have begun hiring AI officers, challenges remain in implementing safeguards and monitoring AI systems effectively. This move aligns with the Biden Administration's AI executive order, aiming to strengthen safety standards and attract top AI talent within the government. However, the US still lacks comprehensive AI legislation, relying on executive directives to guide federal efforts in the field. Read more.

> Automating Truth: Google's Low-Cost AI Fact-Checker, SAFE, Debuts with Impressive Performance: Google DeepMind has introduced a new AI system called SAFE (Search-Augmented Factuality Evaluator) designed to tackle the growing challenge of online misinformation. Using a large language model and Google Search, SAFE evaluates factual claims with impressive accuracy (its ratings matched human annotators 72% of the time, and in a sample of disagreements it was judged correct 76% of the time) and is 20 times cheaper than human fact-checkers. While questions remain about "superhuman" claims and expert benchmarks, SAFE highlights the need for transparent testing and responsible development as language models evolve. Automated fact-checking tools like SAFE offer promise for a more reliable online information landscape. Read more.
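
As described, SAFE's loop is simple: split the answer into atomic claims, search for evidence on each, then rate each claim against what was found. The sketch below is our paraphrase of that pipeline; `llm` and `google_search` are hypothetical callables, not DeepMind's released code.

```python
# Illustrative SAFE-style pipeline. `llm(prompt) -> str` and
# `google_search(query) -> str` are hypothetical stand-ins the caller supplies.
def safe_rate(response: str, llm, google_search, max_queries: int = 5):
    # Step 1: use the language model to split the response into
    # individual, self-contained factual claims.
    facts = llm(f"Split into atomic facts, one per line:\n{response}").splitlines()

    ratings = {}
    for fact in facts:
        evidence = []
        for _ in range(max_queries):
            # Step 2: let the model craft a search query for this claim,
            # conditioned on the evidence gathered so far. (The real system
            # lets the model decide when to stop searching.)
            query = llm(f"Search query to verify: {fact}\nKnown so far: {evidence}")
            evidence.append(google_search(query))
        # Step 3: judge the claim against the accumulated search results.
        verdict = llm(f"Is '{fact}' supported by {evidence}? Answer yes or no.")
        ratings[fact] = verdict.strip().lower().startswith("yes")
    return ratings
```

The cost advantage comes from replacing per-claim human research with a handful of model and search calls.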

> Amazon Doubles Down on Generative AI with $2.75 Billion Investment in Anthropic: This follows an initial $1.25 billion investment, bringing Amazon's total to $4 billion, its largest outside investment to date. Anthropic's Claude model and chatbot compete directly with OpenAI's offerings, with Claude 3 reportedly surpassing the performance of both OpenAI's GPT-4 and Google's Gemini Ultra. The investment reflects the heightened competition in cloud computing, as Amazon (AWS), Microsoft (Azure), and Google (GCP) all aggressively pursue AI dominance. Anthropic's reliance on AWS as its primary cloud provider underscores this dynamic, but also raises potential antitrust concerns regarding preferential treatment and "circular investments." Regulatory bodies like the FTC are reportedly scrutinizing these trends in the cloud computing and AI sectors. Read more.

🖇️ Etcetera

> Rabbit partners with ElevenLabs to power voice commands on its device (More)

> Intel confirms Microsoft's Copilot AI will soon run locally on PCs (More)

> Lightning AI launches next-gen AI compiler ‘Thunder’ to accelerate model training (More)

> An MIT Exploration of Generative AI (More)

> Financial sector embraces generative AI and expects widespread adoption in two years, study finds (More)

> Amazon’s palm-scanning service now lets you sign up from your phone (More)

5 new AI-powered tools from around the web

Weavely is an AI form builder in Figma. Creates web forms with custom UX/UI, no code. Design and publish directly in Figma with conditional logic.

Dollars MoCap is an AI-powered motion capture solution for realistic character animations. Offers products tailored to different needs like single-camera, facial, depth camera, and VR capture.

Nexa AI’s Content Creator APIs empower creators to elevate their content with custom GPTs, accessing real-time data for YouTube, Medium, TikTok, and X.

DBRX sets a new standard for open LLMs, surpassing established models across various benchmarks. It democratizes advanced LLM capabilities previously limited to closed models.

Zerve AI provides a unified environment for data science and AI projects, offering features for exploration, collaboration, building, and deployment. Can be integrated with existing data stacks.

Latest AI Research Papers

arXiv is a free online library where researchers share pre-publication papers.

This paper introduces EgoLifter, a system for egocentric perception that automatically segments scenes captured by egocentric sensors into detailed decompositions of individual 3D objects. Specifically engineered for natural, non-scanning motions, EgoLifter employs 3D Gaussians as the core representation for both scenes and objects. Leveraging segmentation masks from the Segment Anything Model (SAM), it achieves flexible and adaptive object instance definitions. Addressing the challenge of dynamic objects in egocentric videos, EgoLifter integrates a transient prediction module to filter out such elements during 3D reconstruction, enhancing fidelity. Evaluation on the Aria Digital Twin dataset demonstrates EgoLifter's state-of-the-art performance in open-world 3D segmentation, marking a significant advancement in egocentric perception.

This paper from Google Research proposes a method to improve photorealistic image editing, particularly in object removal and insertion tasks. Addressing the limitations of self-supervised approaches, the authors introduce a "counterfactual" dataset capturing scenes before and after object manipulation. By fine-tuning a diffusion model on this dataset, they achieve not only object removal but also the removal of associated effects like shadows and reflections. To handle object insertion, they suggest a bootstrap supervision technique to synthetically expand the dataset. Their approach outperforms prior methods in photorealistic object editing by effectively modeling the effects of objects on the scene, demonstrating significant advancements in computational image manipulation.

Researchers propose ViTAR (Vision Transformer with Any Resolution) to address the scalability limitations of Vision Transformers (ViTs) across different image resolutions. ViTAR introduces two key innovations: a dynamic resolution adjustment module and fuzzy positional encoding. The Adaptive Token Merger module efficiently integrates tokens for various resolutions, enhancing adaptability. Fuzzy positional encoding ensures consistent positional awareness across resolutions, preventing overfitting. ViTAR achieves impressive adaptability, with 83.3% top-1 accuracy at 1120x1120 resolution and 80.4% at 4032x4032 resolution, while reducing computational costs. It performs well in downstream tasks like instance and semantic segmentation and integrates with self-supervised learning techniques. ViTAR's contributions include enhancing resolution scalability, providing a cost-effective solution for high-resolution image processing, and compatibility with large-scale unlabeled datasets.
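
The fuzzy positional encoding can be pictured as jittering each token's grid coordinate during training so that no single resolution's exact positions are memorized. The snippet below is a toy rendering of that idea under our own assumptions, not the authors' implementation.

```python
# Toy sketch of "fuzzy positional encoding": perturb each token's grid
# coordinate with small random noise at training time, so the positional
# signal stays consistent across resolutions instead of overfitting to one.
import torch

def fuzzy_positions(h: int, w: int, jitter: float = 0.5, training: bool = True):
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32),
                            indexing="ij")
    coords = torch.stack([ys, xs], dim=-1)          # (h, w, 2) exact grid
    if training:
        # Uniform noise in [-jitter, +jitter] around each coordinate;
        # at inference time the exact grid is used unchanged.
        coords = coords + (torch.rand_like(coords) * 2 - 1) * jitter
    return coords

print(fuzzy_positions(4, 4)[0, 0])  # perturbed coordinate of token (0, 0)
```

A positional embedding interpolated at these jittered coordinates sees slightly different positions every step, which is what discourages resolution-specific overfitting.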

Mini-Gemini advances Vision Language Models (VLMs) by enhancing visual tokens, data quality, and generation. Utilizing high-resolution visual tokens and a high-quality dataset, it bridges the performance gap compared to models like GPT-4 and Gemini. The framework supports any-to-any workflow, enabling efficient comprehension, reasoning, and generation of images and text simultaneously. By leveraging dual vision encoders and patch info mining, it efficiently extracts detailed visual cues while maintaining computational feasibility. Mini-Gemini outperforms previous models in zero-shot benchmarks, showcasing its potential for image understanding and VLM-guided generation. Despite these achievements, further exploration is warranted to enhance visual comprehension and complex reasoning abilities, potentially through advanced training data and innovative approaches for bridging VLMs with diffusion models.

BioMedLM, a 2.7 billion parameter GPT-style language model trained exclusively on PubMed abstracts and articles, offers a viable alternative to larger, more generalized models like GPT-4 and Med-PaLM 2. By focusing on biomedical text, BioMedLM demonstrates competitive performance in multiple-choice biomedical question-answering tasks, achieving scores comparable to larger models. Furthermore, it addresses key concerns associated with larger models, such as high computational costs, privacy issues, and dependency on corporate entities, by being smaller, more transparent, and deployable on standard hardware. Its availability on the Hugging Face Hub enhances accessibility and fosters research in biomedical natural language processing. BioMedLM showcases the potential of domain-specific, medium-sized language models to drive advancements in biomedical research, healthcare, and information retrieval while mitigating environmental impact and privacy concerns.

AI Creates Comics

Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.

Interested in reaching smart readers like you? To become an AI Breakfast sponsor, apply here.