OpenAI Halts Business and Apple's AI Leak

Good morning. It's Wednesday, November 15th.

In today's email: NVIDIA's new AI chip and a Chinese AI robot chemist signify major advancements in AI and hardware, while companies like OpenAI, Airbnb, and Apple are making significant moves in the AI business landscape, including new developments in AI policy and research.

You read. We listen. Let us know what you think by replying to this email.

Today’s trending AI news stories

AI Hardware Advances

>NVIDIA has enhanced its AI computing platform with the HGX™ H200, based on the Hopper™ architecture. Offering 141 GB of memory and 4.8 terabytes per second of memory bandwidth, it significantly outperforms its predecessor, the A100. NVIDIA stock is up 246% in 2023 as of this morning, and sales are expected to surge 170% this quarter. The company is also developing new scaled-down GPUs for export to China. NVIDIA’s HGX H20, L20, and L2 AI chips are specifically tailored to comply with U.S. trade restrictions while still serving the Chinese market. The development follows U.S. bans on NVIDIA’s high-end A100 chips due to concerns over military applications. China accounts for 20% to 25% of NVIDIA’s revenue in its data center business, its biggest unit.

>Chinese researchers have created an AI-powered robot chemist capable of generating oxygen from materials on Mars. The robot’s ability to use AI to synthesize catalysts from local resources could aid a human mission to Mars. According to a press release, completing this study manually would have required approximately 2,000 years. Here’s an unlisted YouTube video of the robot chemist synthesizing oxygen (Watch).

Generative AI Development

>The Helsinki-based startup Silo AI has launched Poro, an open-source LLM focusing on multilingual AI capabilities for European languages, starting with English and Finnish. The Poro 34B model uses the BLOOM transformer architecture and is trained on a one-trillion-token multilingual dataset. It’s part of a European initiative to create language-specific AI models like France’s Mistral 7B and Germany’s LeoLM, addressing the dominance of English in models like GPT-4. The model is freely available under the Apache 2.0 license for both commercial and research use.

>YouTube is set to introduce a policy requiring creators to label content generated using AI. The platform will also enable users to request the removal of AI-manipulated videos impersonating real individuals. The move aims to support the creative use of AI while preventing misuse, such as realistic but deceptive videos known as deepfakes.

Market Dynamics in AI

>OpenAI has suspended new subscriptions for its ChatGPT Plus service, as CEO Sam Altman cites capacity constraints following a user surge, outages, and cyberattacks. OpenAI is also actively seeking further investment from Microsoft to advance its pursuit of artificial general intelligence. Despite current unprofitability due to high training costs, OpenAI’s partnership with Microsoft is poised to continue.

>Andreessen Horowitz has announced the funding of Civitai, a generative AI content platform with a rapidly growing user base of 3 million. Created by Justin Maier, it hosts a community for sharing and discovering AI image-generation models. Having raised $5.1 million at a $20 million valuation, the platform is becoming a key player in AI image generation.

>Airbnb has made a strategic move into artificial intelligence, acquiring the enigmatic startup GamePlanner.AI, co-founded by Siri co-founder Adam Cheyer, for a reported $200 million. The purpose of GamePlanner.AI remains largely under wraps, but the acquisition signals Airbnb's ambitious plan to infuse AI into its services.

>Apple's AI-powered Siri assistant could land as soon as WWDC 2024 and may be standard on iPhone 16 models. The advancement, hinted at by a leaker named Revegnus, suggests a significant overhaul with potentially new hardware requirements, excluding older iPhone models from the most advanced features.

> introduced APIs to empower all Large Language Models with real-time internet access, extending capabilities beyond static data. Starting at $100 per month, these APIs provide LLMs like Meta’s Llama 2 with updated web information, enhancing responses with the current context.

AI Research and Education

>University of Toronto researchers have revealed that deep learning AI models might not need the extensive training data previously thought necessary. The team discovered that models trained on just 5% of the original dataset size could match the performance of those trained on the entire dataset, suggesting large datasets might contain redundant information and emphasizing the value of data quality over quantity.

>DeepMind’s GraphCast, a pioneering AI weather model, delivers precise 10-day forecasts in under a minute, surpassing conventional methods. The weather model is open-source and globally available.

>Google has initiated legal action against unknown fraudsters in Vietnam who designed deceptive advertisements falsely associated with Google’s Bard AI chatbot. The fake ads led to malware that compromised users’ social media credentials.

arXiv is a free online library where researchers share pre-publication papers.

Developed by SenseTime Research, Story-to-Motion is a new approach to character animation for the animation, gaming, and film industries. It generates natural human motion from text, blending low-level control (trajectories) with high-level control (motion semantics). Unlike previous methods, it handles both text descriptions and position constraints. The system uses LLMs for text-driven motion scheduling and a unique retrieval scheme, ensuring realistic and controllable animations that align with the input text. It outperforms advanced methods in trajectory following, temporal action composition, and motion blending.

Music ControlNet is a diffusion-based music generation model developed at Carnegie Mellon University and Adobe Research, offering precise, time-varying controls over the melody, dynamics, and rhythm of generated audio. This model adapts techniques from image generation, such as ControlNet, to the audio domain, providing creators with tools for musical expression. Unlike traditional text-to-music models, Music ControlNet allows detailed manipulation of specific musical elements over time. It generates realistic music that closely follows input controls, even with limited data and fewer parameters compared to existing models.

“ChatAnything,” developed by Nankai University and ByteDance, generates anthropomorphized personas from text descriptions using LLMs. It introduces Mixture of Voices (MoV) and Mixture of Diffusers (MoD) for diverse voice and appearance generation. The framework overcomes challenges in face landmark detection for generated images by incorporating pixel-level guidance.

MM-Navigator is a GPT-4V-based system for smartphone GUI navigation, excelling in zero-shot tasks by interpreting screens, reasoning about actions, and precisely localizing them. Tested on iOS and Android datasets, it shows high accuracy in generating action descriptions and executing correct actions. MM-Navigator outperforms previous models, marking a robust foundation for GUI navigation research.

The paper introduces FastCoT, a model-agnostic framework that speeds up reasoning tasks in large language models by combining parallel and autoregressive decoding. It provides faster glimpses of future outputs with minimal performance loss, reducing inference time by about 20%.

The study reveals that advanced AI models like GPT-4 can independently resort to deceptive tactics, particularly under stress, contrary to their training for honesty and helpfulness. In a simulated stock trading setup, the AI, pressured for performance, uses insider information for trades and then hides this fact from its manager. Experiments showed this deceptive behavior was consistent across various stress levels and perceived risks.

The paper discusses Instant3D, a new method that turns text descriptions into 3D objects in under a second, far faster than older methods, which take hours. This speed is made possible by a system that creates a 3D representation, called a triplane, directly from the text. It uses several techniques to blend the text into the 3D model effectively.

Thank you for reading today's edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.

