Small Language Models and an AI Lucid Dream Machine

Good morning. It’s Friday, January 26th.

Did you know: You can run open-source language models completely offline on a tricked-out MacBook?

  • ChatGPT Creates Comics

Today’s trending AI news stories

AI Technology Development and Enhancements

> Microsoft is stepping up its efforts to build more efficient conversational AI models known as Small Language Models (SLMs). These SLMs aim to deliver quality comparable to larger models such as OpenAI’s GPT-4 while requiring significantly less computing power. The push is meant to make AI accessible to a broader audience while keeping escalating costs under control.

> PropheticAI aims to transform lucid dreaming with Morpheus-1, which it describes as a multi-modal generative ultrasonic transformer designed to induce and sustain lucid dream states. Rather than responding to language-based prompts, Morpheus-1 interprets brain activity directly and generates ultrasonic holograms for neural stimulation, intended to bring about lucid awareness. The model has 103 million parameters and was trained for two days on eight GPUs. PropheticAI positions it as a departure from conventional transcranial focused ultrasound research, which typically relies on standard fMRI data and basic machine learning.

> Nvidia has introduced RTX Video HDR for RTX GPUs, a technology capable of automatically converting SDR (Standard Dynamic Range) content into HDR (High Dynamic Range) content. This feature is complemented by RTX Video Super Resolution, which employs AI to enhance low-resolution, highly compressed video to 4K by adding simulated detail that matches the scene. To utilize the HDR capability, users need an HDR10-compliant monitor. RTX Video HDR is compatible with Chromium-based browsers such as Google Chrome and Microsoft Edge. Enabling this feature requires downloading and installing the January Studio driver, enabling Windows HDR capabilities, and activating HDR in the NVIDIA Control Panel under RTX Video Enhancement.

> MIT and Google researchers have introduced Health-LLM, a framework for adapting large language models (LLMs) to health prediction tasks using data from wearable sensors. The study evaluated eight LLMs, including GPT-3.5 and GPT-4, across a range of health prediction tasks. Health-Alpaca, a fine-tuned version of the Alpaca model, outperformed much larger models, achieving the best results in five of thirteen tasks. Context enhancements, namely user profile, health knowledge, and temporal context, significantly improved performance.
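
To make the idea of context enhancement concrete, here is a minimal Python sketch of how the three context types might be layered onto a wearable-sensor prompt. Everything in it is an assumption for illustration: the `UserProfile` fields, the `build_prompt` helper, the prompt wording, and the resting-heart-rate example are hypothetical and are not taken from the Health-LLM paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UserProfile:
    # Hypothetical profile fields; the paper's actual profile format may differ.
    age: int
    gender: str
    bmi: float


def build_prompt(
    question: str,
    resting_hr: List[int],
    profile: Optional[UserProfile] = None,
    health_knowledge: Optional[str] = None,
    add_temporal_context: bool = False,
) -> str:
    """Assemble an LLM prompt from wearable readings plus optional context."""
    parts: List[str] = []
    # User-profile context: who the readings belong to.
    if profile is not None:
        parts.append(
            f"User profile: age {profile.age}, {profile.gender}, BMI {profile.bmi:.1f}."
        )
    # Health-knowledge context: domain facts that help interpret the data.
    if health_knowledge:
        parts.append(f"Health knowledge: {health_knowledge}")
    # The raw sensor data itself, oldest reading first.
    readings = ", ".join(str(v) for v in resting_hr)
    parts.append(f"Daily resting heart rate (bpm), oldest first: {readings}.")
    # Temporal context: an explicit summary of the trend across days.
    if add_temporal_context and len(resting_hr) >= 2:
        delta = resting_hr[-1] - resting_hr[0]
        trend = "rising" if delta > 0 else "falling" if delta < 0 else "flat"
        parts.append(
            f"Temporal context: over {len(resting_hr)} days the trend is {trend} ({delta:+d} bpm)."
        )
    parts.append(f"Question: {question}")
    return "\n".join(parts)


if __name__ == "__main__":
    print(
        build_prompt(
            "Is this user's stress level likely to be high today?",
            [60, 62, 66],
            profile=UserProfile(age=34, gender="female", bmi=22.5),
            health_knowledge="A rising resting heart rate can be a sign of stress.",
            add_temporal_context=True,
        )
    )
```

Toggling each optional argument on and off is a simple, informal way to see how much each kind of context changes a model's answer.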

> OpenAI has rolled out several new models and cut API prices significantly, making its AI more practical and accessible. GPT-3.5 Turbo is now 50% cheaper for input tokens and 25% cheaper for output tokens, making large-scale text analysis more affordable. The update also brings a refined version of GPT-4 Turbo that specifically targets problems with task completion, new text embedding models, and an updated model for OpenAI's free moderation API.
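
Because the input and output cuts differ, the overall saving depends on a workload's mix of tokens. The short Python sketch below works through the arithmetic; the dollar amounts are hypothetical, and only the 50% and 25% reductions come from the announcement.

```python
from decimal import Decimal

# GPT-3.5 Turbo price reductions as reported above.
INPUT_CUT = Decimal("0.50")   # input tokens: 50% cheaper
OUTPUT_CUT = Decimal("0.25")  # output tokens: 25% cheaper


def new_cost(old_input: Decimal, old_output: Decimal) -> Decimal:
    """Cost of the same workload after the price cut."""
    return old_input * (1 - INPUT_CUT) + old_output * (1 - OUTPUT_CUT)


def savings_fraction(old_input: Decimal, old_output: Decimal) -> Decimal:
    """Fraction of the old bill that is saved."""
    old_total = old_input + old_output
    return (old_total - new_cost(old_input, old_output)) / old_total


if __name__ == "__main__":
    # Hypothetical monthly bills under the old prices, split by token type (USD).
    for label, inp, out in [
        ("input-heavy (e.g. document analysis)", Decimal("80"), Decimal("20")),
        ("balanced", Decimal("50"), Decimal("50")),
    ]:
        print(f"{label}: ${new_cost(inp, out)} new bill, "
              f"{savings_fraction(inp, out):.1%} saved")
```

With these figures, the input-heavy workload saves 45% while the evenly split one saves 37.5%, which is why the cut matters most for extensive text analysis.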

> Researchers from the University of Massachusetts Amherst and the University of Illinois Urbana-Champaign have introduced Baldur, an AI system that generates mathematical proofs. Baldur builds on Google's Minerva large language model and improves its output by supplying additional context, enabling it to generate whole proofs automatically. What sets Baldur apart is its ability to repair its own flawed proofs, which further improves its accuracy. Thor, the current leading automatic proof generator, has a slightly higher proof rate, but Baldur's ability to generate whole proofs makes it a valuable complement, particularly when paired with Thor.
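
Baldur's self-correction can be pictured as a generate-check-repair loop. The Python sketch below illustrates that general pattern and is not Baldur's implementation: `generate` stands in for the language model and `check` for a formal proof checker, and both, along with the toy tactics in the example, are hypothetical stand-ins.

```python
from typing import Callable, Optional, Tuple

# generate(theorem, failed_proof, error_message) -> candidate proof.
# On the first attempt, failed_proof and error_message are None.
Generate = Callable[[str, Optional[str], Optional[str]], str]
# check(theorem, proof) -> (accepted, error_message)
Check = Callable[[str, str], Tuple[bool, str]]


def prove_with_repair(
    theorem: str, generate: Generate, check: Check, max_repairs: int = 1
) -> Optional[str]:
    """Generate a whole proof, then feed checker errors back for repair."""
    proof = generate(theorem, None, None)
    for attempt in range(max_repairs + 1):
        accepted, error = check(theorem, proof)
        if accepted:
            return proof
        if attempt < max_repairs:
            # Repair step: show the model its failed proof and the error.
            proof = generate(theorem, proof, error)
    return None  # no accepted proof within the repair budget


if __name__ == "__main__":
    # Toy stand-ins: the "checker" only accepts "by simp",
    # and the "model" switches tactics once it sees an error.
    def toy_generate(thm, failed, error):
        return "by auto" if failed is None else "by simp"

    def toy_check(thm, proof):
        ok = proof == "by simp"
        return ok, "" if ok else "auto failed"

    print(prove_with_repair("x + 0 = x", toy_generate, toy_check))
```

Setting `max_repairs=0` turns off the repair step, which makes it easy to compare plain generation against generation with repair.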

AI in Business and Industry

> The University of Texas at Austin is launching the Center for Generative AI, which will house one of the largest GPU computing clusters in academia. The cluster, called Vista, will have 600 NVIDIA H100 GPUs and be operated by the Texas Advanced Computing Center (TACC), substantially expanding research capacity and giving a wide range of collaborators access to cutting-edge AI infrastructure.

> PayPal is preparing to launch a range of AI-driven products, including a one-click checkout option, as part of new CEO Alex Chriss's strategy to revitalize the company. The move follows a decline of more than 22% in PayPal's stock price since January 2023, driven by concerns about margins. Chriss believes AI is key to the company's future success, using data insights to offer customers personalized recommendations and improve the overall shopping experience.

AI in Governance and Regulation

> OpenAI CEO Sam Altman has privately engaged with US lawmakers to discuss the expansion of advanced computer chip production, vital for training AI systems. Altman aims to raise significant funds for a major initiative focused on establishing new semiconductor plants, referred to as “fabs.” These chips are deemed essential for AI technologies like ChatGPT. Altman advocates for a consistent supply of affordable chips to maintain the nation’s economic and military competitiveness. The initiative aligns with the US government’s push to boost domestic chip production and limit the export of advanced AI chips to China.

> The European Commission has introduced an initiative called "AI Factories" to accelerate AI innovation in Europe. The AI Factories are envisioned as hubs that give startups streamlined access to dedicated supercomputing resources for developing versatile AI models. The move follows a recent EU agreement on regulating large AI models and comes as Europe competes with global AI leaders such as the US and China.

AI in Security and Countermeasures

> The U.S. Department of Defense is entering an era where artificial intelligence will play a pivotal role in both strategy and operations, with a looming contest between AI and counter-AI measures. Jude R. Sunderbruch from the DOD Cyber Crime Center emphasizes the strategic rivalry in AI advancement, underscoring the U.S.'s strong position due to its collaborative ecosystem encompassing government, industry, academia, and startups. The DOD aims to harness AI for threat analysis and system security testing, acknowledging the transformative potential of the intersection between AI and quantum technologies.

AI Ethics, Patents, and Public Perception

> Google has settled a patent infringement lawsuit over its AI technology, avoiding a potential $1.67 billion in damages. The plaintiff, Singular Computing, claimed that Google had misused its computer-processing innovations in AI features for services including Google Search and Gmail. The settlement came on the day closing arguments were set to begin. Its terms were not disclosed, and Google maintained that it had not violated Singular's patent rights. The case highlights the ongoing legal challenges surrounding AI technology.

arXiv is a free online library where researchers share pre-publication papers.

This study by Tencent AI Lab surveys the rapid advances in MultiModal Large Language Models (MM-LLMs). It examines MM-LLM architecture, covering model design formulations and training methodologies. The paper introduces and characterizes 26 MM-LLMs, each with its own features and capabilities. It reviews their performance on mainstream benchmarks and distills key training recipes for making MM-LLMs more effective. It also outlines promising directions for future research and includes a dedicated website for tracking MM-LLM developments in real time, intended as a resource for continuous updates and contributions to the field.

UNIMO-G is a multimodal conditional diffusion framework that extends text-to-image diffusion models to handle complex multimodal prompts combining text and images. It consists of a Multimodal Large Language Model (MLLM) that encodes the prompt and a conditional denoising diffusion network that generates the image. Training proceeds in two stages: pre-training on text-image pairs, followed by instruction tuning with multimodal prompts. UNIMO-G generates high-quality images even from prompts involving multiple entities, a notable advance in image generation.

GALA transforms a single-layer 3D human mesh into layered, animatable assets that can be reposed and used for avatar customization. Using a pre-trained 2D diffusion model, it decomposes the mesh and inpaints geometry and texture for occluded regions. Standard reconstruction approaches treat humans as single-layer geometry; GALA addresses this by synthesizing missing components in both posed and canonical spaces. It outperforms existing methods on decomposition, canonicalization, and composition tasks, a significant step toward creating practical, compositional 3D assets from single scans.

This CMU project develops a robotic system for manipulating real-world articulated objects, such as doors and cabinets, in unstructured environments. It starts with behavior cloning from a small amount of data and then improves through online interaction with new objects. The low-cost ($20,000) mobile manipulation platform proved highly adaptable and efficient, operating 20 different objects across the CMU campus. With less than an hour of online learning per object, the system's success rate rose from 50% to 95%, a significant advance for autonomous robotic manipulation in open, unstructured spaces.

Unitxt, developed by IBM Research, is a library for customizable textual data preparation and evaluation in generative NLP. It addresses the rigidity of traditional text-processing pipelines with a modular, flexible design. Unitxt integrates with libraries such as Hugging Face and LM-eval-harness and breaks processing flows into modular components that are easy to customize and share. Its central feature, the Unitxt Catalog, is a repository of these components that encourages a community-driven approach to building, sharing, and refining data-processing pipelines, streamlining experimentation and research in NLP.
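
The catalog idea of named, reusable processing steps composed into a pipeline can be sketched in a few lines of Python. This is a hypothetical illustration of the design pattern only; `CATALOG`, `register`, `build_pipeline`, and the operator names are invented here and are not Unitxt's actual API.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Operator = Callable[[Record], Record]

# A shared registry of named, reusable processing steps (hypothetical).
CATALOG: Dict[str, Operator] = {}


def register(name: str) -> Callable[[Operator], Operator]:
    """Decorator that adds an operator to the catalog under a name."""
    def decorator(fn: Operator) -> Operator:
        CATALOG[name] = fn
        return fn
    return decorator


@register("lowercase_text")
def lowercase_text(record: Record) -> Record:
    # Return a new dict so the input record is left untouched.
    return {**record, "text": record["text"].lower()}


@register("render_sentiment_prompt")
def render_sentiment_prompt(record: Record) -> Record:
    # Turn the cleaned text into a prompt for a generative model.
    return {**record, "prompt": f"Classify the sentiment: {record['text']}\nAnswer:"}


def build_pipeline(step_names: List[str]) -> Callable[[Record], Record]:
    """Look up steps by name and chain them in order."""
    steps = [CATALOG[name] for name in step_names]

    def run(record: Record) -> Record:
        for step in steps:
            record = step(record)
        return record

    return run


if __name__ == "__main__":
    pipeline = build_pipeline(["lowercase_text", "render_sentiment_prompt"])
    print(pipeline({"text": "Great MOVIE", "label": "positive"})["prompt"])
```

Because a pipeline here is just a list of names, it can be shared as plain configuration and recombined, which is the kind of reuse a component catalog is meant to encourage.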

