AI Breakfast
Small Language Models and an AI Lucid Dream Machine

January 26, 2024

Good morning. It’s Friday, January 26th.

Did you know: You can run open-source language models completely offline on a tricked-out MacBook?

In today’s email:

AI Technology Development and Enhancements
AI in Business and Industry
AI in Governance and Regulation
AI in Security and Countermeasures
AI Ethics, Patents, and Public Perception
6 New AI Tools
Latest AI Research Papers
ChatGPT Creates Comics

You read. We listen. Let us know what you think by replying to this email.

Today’s trending AI news stories

AI Technology Development and Enhancements

> Microsoft is stepping up its efforts to build Small Language Models (SLMs), more efficient conversational AI models that aim to deliver the quality of larger language models like OpenAI’s GPT-4 while requiring significantly less computing power. The push for efficiency is meant to make AI accessible to a broader audience while containing escalating costs.

> PropheticAI has unveiled Morpheus-1, a model it says can induce and sustain lucid dream states. Described as a multi-modal generative ultrasonic transformer, Morpheus-1 differs from large language models in that it is not prompted with language: it interprets cerebral activity directly and generates ultrasonic holograms for neurostimulation to bring the user into a lucid state. The 103-million-parameter model was trained for two days on eight GPUs, an approach that departs from conventional transcranial focused ultrasound research, which typically relies on standard fMRI data and basic machine learning.

> NVIDIA has introduced RTX Video HDR for RTX GPUs, a feature that automatically converts SDR (Standard Dynamic Range) video into HDR (High Dynamic Range). It complements RTX Video Super Resolution, which uses AI to upscale low-resolution, heavily compressed video to 4K by adding simulated detail that matches the scene. RTX Video HDR requires an HDR10-compatible monitor and works in Chromium-based browsers such as Google Chrome and Microsoft Edge. To enable it, download and install the January Studio driver, turn on HDR in Windows, and activate HDR in the NVIDIA Control Panel under RTX Video Enhancement.

> MIT and Google researchers have introduced Health-LLM, a framework for adapting large language models (LLMs) to health prediction tasks using data from wearable sensors. The study evaluated eight advanced LLMs, including GPT-3.5 and GPT-4, across thirteen health prediction tasks. Health-Alpaca, a fine-tuned version of the Alpaca model, outperformed larger models and achieved the best results on five of the thirteen tasks. Adding context (user profile, health knowledge, and temporal context) significantly improved performance.

> OpenAI has released several new models and cut API prices, making its platform cheaper and more practical to use. The new GPT-3.5 Turbo model lowers input prices by 50% and output prices by 25%, making large-scale text processing more affordable. The update also brings a refined GPT-4 Turbo model that completes tasks more thoroughly, new text embedding models, and an updated model for OpenAI’s free moderation API.
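
Because the input and output cuts differ, the overall saving depends on a workload's mix of input and output tokens. The sketch below shows that arithmetic. It assumes hypothetical round-number baseline prices of $1.00 and $2.00 per million input and output tokens; check OpenAI's pricing page for actual rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prices:
    """Prices in US dollars per one million tokens (hypothetical values)."""
    input_per_m: float
    output_per_m: float


def discounted(p: Prices, input_cut: float, output_cut: float) -> Prices:
    """Apply fractional price cuts, e.g. 0.50 for a 50% reduction."""
    return Prices(p.input_per_m * (1 - input_cut), p.output_per_m * (1 - output_cut))


def workload_cost(p: Prices, input_tokens: int, output_tokens: int) -> float:
    """Total cost in dollars for a given number of input and output tokens."""
    return (input_tokens * p.input_per_m + output_tokens * p.output_per_m) / 1_000_000


old = Prices(input_per_m=1.00, output_per_m=2.00)  # hypothetical baseline prices
new = discounted(old, input_cut=0.50, output_cut=0.25)  # cuts reported above

in_tok, out_tok = 10_000_000, 2_000_000  # hypothetical workload
before = workload_cost(old, in_tok, out_tok)
after = workload_cost(new, in_tok, out_tok)
print(f"before ${before:.2f}, after ${after:.2f}, saving {1 - after / before:.1%}")
```

With these placeholder numbers, the workload drops from $14.00 to $8.00, a blended saving of about 42.9%. Output-heavy workloads save proportionally less, because output tokens were cut by only 25%.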

> Researchers from the University of Massachusetts Amherst and the University of Illinois Urbana-Champaign have introduced Baldur, an AI system that automatically generates mathematical proofs. Baldur builds on Google’s Minerva large language model and improves its results by incorporating contextual information, allowing it to write whole proofs autonomously. It can also repair its own flawed proofs, which raises its success rate. Thor, the current leading automated proof generator, still proves slightly more theorems, but Baldur’s ability to generate whole proofs makes it valuable, particularly when paired with Thor.
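
The generate-then-repair idea can be illustrated with a toy loop. This is a minimal sketch, not Baldur's implementation. `generate` stands in for the language model and `check` for a proof checker. Both of these, along with `prove_with_repair` and the toy proofs, are hypothetical.

```python
from typing import Callable, Optional, Tuple

# Hypothetical stand-ins: a model that maps a prompt to a whole proof, and a
# checker that returns (succeeded, error_message). Not Baldur's real interfaces.
Generate = Callable[[str], str]
Check = Callable[[str, str], Tuple[bool, str]]


def prove_with_repair(theorem: str, generate: Generate, check: Check,
                      max_repairs: int = 1) -> Optional[str]:
    """Generate a whole proof; on failure, retry with the failed proof and error as context."""
    prompt = theorem
    for _ in range(max_repairs + 1):
        proof = generate(prompt)
        ok, error = check(theorem, proof)
        if ok:
            return proof
        # Repair step: show the model its failed attempt and the checker's error.
        prompt = f"{theorem}\n(* failed proof *)\n{proof}\n(* error *)\n{error}"
    return None  # no verified proof within the attempt budget


# Toy usage: this "model" only finds the right tactic after it has seen an error.
def toy_generate(prompt: str) -> str:
    return "by auto" if "(* error *)" in prompt else "by simp"


def toy_check(theorem: str, proof: str) -> Tuple[bool, str]:
    ok = proof == "by auto"
    return ok, "" if ok else "Failed to finish proof"


print(prove_with_repair("lemma add_comm", toy_generate, toy_check))     # by auto
print(prove_with_repair("lemma add_comm", toy_generate, toy_check, 0))  # None
```

With no repair budget, the first failed attempt is final. One repair round lets the checker's error message steer the next attempt.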

AI in Business and Industry

> The University of Texas at Austin is launching the Center for Generative AI, which will host one of the most powerful GPU computing clusters in academia. The cluster, called Vista, will have 600 NVIDIA H100 GPUs and will be operated by the Texas Advanced Computing Center (TACC). It will significantly expand research capabilities and give a diverse range of collaborators access to cutting-edge AI infrastructure.

> PayPal is preparing to launch a range of new AI-driven products, including a one-click checkout option, as part of new CEO Alex Chriss’s strategy to revitalize the company. The move comes after PayPal’s stock fell more than 22% from January 2023 amid concerns about its margins. Chriss believes AI is key to the company’s future success, using data insights to offer customers personalized recommendations and improve the overall shopping experience.

AI in Governance and Regulation

> OpenAI CEO Sam Altman has held private talks with US lawmakers about expanding production of the advanced computer chips needed to train AI systems. Altman aims to raise significant funding for a major initiative to build new semiconductor plants, known as “fabs.” These chips are considered essential for AI technologies like ChatGPT, and Altman argues that a steady supply of affordable chips is needed to maintain the nation’s economic and military competitiveness. The initiative aligns with the US government’s push to boost domestic chip production and limit exports of advanced AI chips to China.

> The European Commission has introduced an initiative called "AI Factories" to accelerate AI innovation in Europe. The AI Factories are envisioned as comprehensive hubs that give startups streamlined access to dedicated supercomputing resources for developing versatile AI models. The move follows a recent EU agreement on regulating large AI models, as Europe competes with global AI leaders like the US and China.

AI in Security and Countermeasures

> The U.S. Department of Defense is entering an era in which artificial intelligence will play a pivotal role in both strategy and operations, with a looming contest between AI and counter-AI measures. Jude R. Sunderbruch of the DOD Cyber Crime Center highlights the strategic rivalry in AI development and argues that the U.S. is well positioned thanks to its collaborative ecosystem of government, industry, academia, and startups. The DOD aims to use AI for threat analysis and system security testing, and it sees transformative potential where AI intersects with quantum technologies.

AI Ethics, Patents, and Public Perception

> Google has settled a patent infringement lawsuit over its AI technology, avoiding a potential $1.67 billion in damages. Singular Computing had claimed that Google misused its computer-processing innovations in AI features for services including Google Search and Gmail. The settlement was reached on the same day that closing arguments were set to begin in the trial. Its terms were not disclosed, and Google said it did not violate Singular’s patent rights. The case highlights the ongoing legal challenges surrounding AI technology.

6 new AI-powered tools from around the web

Findr streamlines workplace searches by centralizing scattered data from multiple apps like Slack, Notion, and Gmail into one search interface, enhancing productivity and information accessibility with AI-powered insights.

Brainner, an AI-powered resume screening tool, streamlines hiring for recruiters and startup founders by automating candidate sorting. It promises time savings of up to 40 hours per month.

Steve AI 2.0 is a video creation platform with ChatGPT integration. It combines the company’s patented AI with generative AI to turn text or audio input into videos in a variety of styles.

Startilla, an AI-powered business development assistant, helps launch startups and projects by generating key business documents such as SWOT analyses, Lean Canvases, and marketing plans, streamlining idea validation and pitching.

Gigasheet is a big data tool that simplifies the analysis of massive datasets of up to 1 billion rows, offering cloud integration, automated processing, and advanced features such as data enrichment and spreadsheet AI.

nerfstudio offers a modular framework for easy creation, training, and testing of Neural Radiance Fields (NeRFs), enabling users from novices to experts to generate photorealistic 3D scenes from 2D images.

arXiv is a free online library where researchers share pre-publication papers.

📄 MM-LLMs: Recent Advances in MultiModal Large Language Models

This survey by Tencent AI Lab reviews the rapid advances in MultiModal Large Language Models (MM-LLMs). It outlines general design formulations for MM-LLM architecture and training pipelines, then catalogs 26 existing MM-LLMs and characterizes the specific features and capabilities of each. The authors review the models’ performance on mainstream benchmarks and distill key training recipes for making MM-LLMs more effective. The paper also explores promising directions for future research. The authors maintain a website that tracks MM-LLM developments in real time, intended as a resource for continuous updates and contributions to the field.

📄 UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion

UNIMO-G introduces a multimodal conditional diffusion framework that extends text-to-image diffusion models to handle complex multimodal prompts combining textual and visual inputs. It consists of a Multimodal Large Language Model (MLLM) that encodes the prompts and a conditional denoising diffusion network that generates the images. Training proceeds in two stages: pre-training on text-image pairs, followed by instruction tuning with multimodal prompts. UNIMO-G generates high-quality images even from prompts with multiple entities, a significant advance in image generation.

📄 GALA: Generating Animatable Layered Assets from a Single Scan

GALA transforms a single-layer 3D human mesh into animatable, layered assets that can be posed in diverse ways and used for avatar customization. Standard reconstruction approaches treat a clothed human as single-layer geometry. GALA instead decomposes the scan and uses a pre-trained 2D diffusion model to inpaint the geometry and texture of occluded regions, synthesizing missing components in both posed and canonical spaces. It outperforms existing solutions on decomposition, canonicalization, and composition tasks, a significant advance in creating practical, compositional 3D assets from single scans.

📄 Adaptive Mobile Manipulation for Articulated Objects In the Open World

This CMU project develops a robotic system for operating real-world articulated objects such as doors and cabinets in unstructured environments. Its adaptive learning approach starts with behavior cloning from a small amount of data and then improves through online interaction with new objects. The low-cost ($20,000) mobile manipulation platform adapts and learns efficiently, successfully operating 20 diverse objects across the CMU campus. With less than an hour of online learning per object, the system’s success rate increases from 50% to 95%, a significant step for autonomous robotic manipulation in open, unstructured spaces.

📄 Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative AI

Unitxt, developed by IBM Research, is a library for customizable textual data preparation and evaluation in generative NLP. It addresses the rigidity of traditional text processing pipelines by breaking processing flows into modular components that are easy to customize and share. It also integrates with libraries such as Hugging Face and LM-eval-harness. Its central feature, the Unitxt Catalog, is a repository of these components that encourages a community-driven approach to building, sharing, and refining data processing pipelines, streamlining experimentation and research in NLP.
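
The catalog-of-components idea can be sketched in a few lines of Python. This is a minimal illustration, not Unitxt’s API. `CATALOG`, `register`, `build_pipeline`, `exact_match`, and the field names are all hypothetical.

```python
from typing import Callable, Dict, List

Example = Dict[str, str]
Step = Callable[[Example], Example]

# Hypothetical catalog: a shared registry of named, reusable processing steps.
CATALOG: Dict[str, Step] = {}


def register(name: str) -> Callable[[Step], Step]:
    """Decorator that adds a processing step to the catalog under `name`."""
    def wrap(fn: Step) -> Step:
        CATALOG[name] = fn
        return fn
    return wrap


@register("strip_whitespace")
def strip_whitespace(ex: Example) -> Example:
    return {k: v.strip() for k, v in ex.items()}


@register("qa_template")
def qa_template(ex: Example) -> Example:
    # Render the model input; the field names are illustrative assumptions.
    return {**ex, "source": f"Question: {ex['question']}\nAnswer:"}


def build_pipeline(step_names: List[str]) -> Step:
    """Compose catalog steps, looked up by name, into one preparation function."""
    steps = [CATALOG[name] for name in step_names]

    def run(ex: Example) -> Example:
        for step in steps:
            ex = step(ex)
        return ex
    return run


def exact_match(predictions: List[str], references: List[str]) -> float:
    """A simple evaluation step: fraction of predictions equal to their reference."""
    pairs = list(zip(predictions, references))
    return sum(p.strip() == r.strip() for p, r in pairs) / len(pairs)


prepare = build_pipeline(["strip_whitespace", "qa_template"])
example = prepare({"question": " What is 2+2? ", "answer": " 4 "})
print(example["source"])
print(exact_match(["4", "5"], [example["answer"], "6"]))  # 0.5
```

Because each step is looked up by name, a recipe reduces to a list of strings. This is what makes such pipelines easy to share and recombine.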

ChatGPT Creates Comics

Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.

Interested in reaching smart readers like you? To become an AI Breakfast sponsor, apply here.