Google's VideoPoet Text-to-Video Research

AI Breakfast
December 22, 2023

Good morning. It’s Friday, December 22nd.

In Partnership with:

Growthschool (Holiday Sale) will help you master AI tools and ChatGPT hacks for FREE. Join their ChatGPT & AI workshop (worth $99) for FREE.

Offer Valid for the First 100 people ONLY. Register HERE🎁

Did you know: The first AI programs were written in 1951 for playing checkers and chess?

In today’s email:

AI in Video and Media Technologies
AI in Healthcare and Science
AI in Computing and Software Development
AI Ethics, Law, and Public Policy
AI Business and Job Market
5 New AI Tools
Latest AI Research Papers

You read. We listen. Let us know what you think by replying to this email.

Interested in reaching 47,136 smart readers like you? To become an AI Breakfast sponsor, apply here.

Today’s trending AI news stories

AI in Video and Media Technologies

> RunwayML, a video AI startup, has introduced two new features to its video generator platform: “Text-to-Speech” for adding synthetic voices and “Ratio” for converting videos into various formats. Additionally, RunwayML has launched a research initiative to develop “world models”, AI systems designed to comprehend and simulate the visual world. These models aim to create detailed environmental maps and realistic simulations of human behavior. The initiative aligns with the growing trend of multimodal AI development.

> Midjourney Version 6 is now available, showcasing remarkable advancements in language understanding and prompt coherence, as highlighted by the creator behind Faces of AI, Alie Jules and Nick St. Pierre on X. This latest version significantly improves AI's responsiveness to complex prompts, including those with comma-separated words. Users note that Midjourney v6's ability to handle detailed color assignments and intricate prompts marks a substantial improvement over the previous version 5.2.

> Google has published research on VideoPoet, an LLM designed for generating videos. The model supports text-to-video conversion, video stylization, inpainting, outpainting, and even video-to-audio conversion. VideoPoet integrates these video generation functions within a single framework, offering enhanced text fidelity and more dynamic motion in videos compared to existing technologies such as RunwayML and Alibaba’s Animate Anyone.

AI in Healthcare and Science

> AI has identified a new class of antibiotics effective against two types of drug-resistant bacteria, offering hope in the fight against the growing global health threat of antibiotic resistance. This AI-guided discovery involved analyzing over 39,000 compounds, leading to potential treatments for methicillin-resistant Staphylococcus aureus (MRSA) and vancomycin-resistant Enterococcus. The AI models used chemical structures to predict compounds' antibacterial activity and human cell toxicity, marking a significant advancement in AI-assisted drug discovery.

> Researchers from Northwestern University, Boston College, and MIT have developed a brain-like synaptic transistor that performs associative learning at room temperature. Unlike previous devices requiring cryogenic conditions, the new transistor operates efficiently at ambient temperatures and combines processing and storage, much as the human brain does. The work, detailed in Nature, represents a shift from conventional digital computing towards more energy-efficient hardware for AI and machine learning tasks that emulates higher-level cognitive functions. The transistor offers potential for complex data processing with minimal energy consumption.

> Researchers at Carnegie Mellon University have developed an AI system, dubbed “Coscientist”, for conducting chemistry experiments. The setup uses three specialized AI instances: a Web searcher for information retrieval, a Documentation searcher for equipment manuals, and a Planner for executing experiments. The system, built primarily on GPT-3.5 and GPT-4, successfully synthesized chemicals like acetaminophen and ibuprofen, and optimized chemical reactions. Its notable capabilities include planning chemical synthesis, controlling lab equipment, and analyzing reactions.

AI in Computing and Software Development

> Apple has introduced a new approach for running Large Language Models (LLMs) on devices with limited DRAM capacity. The technique, detailed in the paper “LLM in a Flash,” stores model parameters in flash memory and transfers them to DRAM as needed. This allows models up to twice the size of available DRAM to run efficiently, significantly enhancing inference speed. The work is part of Apple's broader strategy to incorporate generative AI into its upcoming iOS 18, enhancing applications like Siri and Messages, and exploring potential uses in other apps.
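To make the flash-to-DRAM idea concrete, here is a minimal Python sketch. It is not Apple's method: the paper's specific optimizations are not reproduced here, and the class name `FlashBackedWeights`, the toy "model", and the least-recently-used eviction policy are all assumptions made for illustration. A file on disk stands in for flash, and a small dictionary stands in for DRAM.

```python
import os
import tempfile
from array import array
from collections import OrderedDict


class FlashBackedWeights:
    """Hypothetical helper: all layers live in a file, only a few in memory."""

    def __init__(self, path, layer_size, max_resident):
        self.path = path
        self.layer_size = layer_size      # number of floats per layer
        self.max_resident = max_resident  # how many layers fit in "DRAM"
        self.dram = OrderedDict()         # layer index -> weights, in LRU order
        self.flash_reads = 0

    def get(self, layer):
        if layer in self.dram:            # already resident: no file access
            self.dram.move_to_end(layer)
            return self.dram[layer]
        weights = array("d")
        with open(self.path, "rb") as f:  # read just this layer from "flash"
            f.seek(layer * self.layer_size * weights.itemsize)
            weights.fromfile(f, self.layer_size)
        self.flash_reads += 1
        self.dram[layer] = weights
        if len(self.dram) > self.max_resident:
            self.dram.popitem(last=False)  # evict the least recently used layer
        return weights


def run_model(get_layer, num_layers, x):
    """Toy stand-in for inference: scale x by the sum of each layer's weights."""
    for layer in range(num_layers):
        x = x * sum(get_layer(layer))
    return x


def demo():
    num_layers, layer_size = 8, 4
    layers = [array("d", [(i + 1) / 8.0] * layer_size) for i in range(num_layers)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "weights.bin")
        with open(path, "wb") as f:
            for weights in layers:
                weights.tofile(f)
        # Only 4 of the 8 layers may be resident at once.
        store = FlashBackedWeights(path, layer_size, max_resident=4)
        from_flash = run_model(store.get, num_layers, 1.0)
    in_memory = run_model(lambda i: layers[i], num_layers, 1.0)
    return from_flash, in_memory, len(store.dram), store.flash_reads


if __name__ == "__main__":
    print(demo())
```

In this toy setup the "model" is twice the size of the memory budget, yet the result matches a run with everything held in memory. The cost is one file read per layer.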

> Moore Threads, a Chinese GPU company, has introduced the MTT S4000 with 48GB of video memory and 768GB/sec of bandwidth. The GPU's performance figures, 25 TFLOPS FP32 and 200 TOPS INT8, are notable but unlikely to worry Nvidia, Intel, or AMD. Despite being on the US entity list, Moore Threads challenges Nvidia with its CUDA-compatible MUSIFY tool. Its 'kilocard cluster,' featuring 1,000 GPUs, showcases substantial AI training capabilities.

AI Ethics, Law, and Public Policy

> Meta CTO Andrew Bosworth expresses skepticism about watermarking as a reliable method to distinguish deepfakes, foreseeing that society will adapt to the prevalence of deepfakes and AI content. He notes the last 50 years, marked by mostly authentic photos and videos, may be an anomaly, as historically, all forms of media, including visual, were suspect. Bosworth suggests a future where skepticism towards all media, regardless of format, will be necessary.

> Anthropic is updating its terms of service to offer expanded copyright protection for its customers. This move allows users to retain ownership of content generated via Anthropic's services. The company plans to defend users against copyright infringement claims, covering legal costs if needed. These changes, aiming to clarify the murky waters of AI-generated content and copyright law, will be effective from January 1, 2024, for Claude API users and January 2, 2024, for Amazon Bedrock users. Additionally, Anthropic is launching a Messages API beta to assist developers in early bug detection.

> A Stanford project named Predicting Image Geolocations (PIGEON) demonstrates the growing capabilities of AI in geolocating photos. The algorithm accurately identified locations in personal photos, highlighting both positive applications, like aiding field biologists and enhancing personal photo experiences, and potential privacy concerns. The ACLU's Jay Stanley expresses worries about its misuse in surveillance, tracking, or stalking.

> The UK Supreme Court has ruled against US computer scientist Stephen Thaler, declaring that AI cannot be listed as an “inventor” on patent applications. This decision follows Thaler’s attempt to register patents for inventions created by his AI system, DABUS. Similar rulings have been made in Europe, Australia, and the United States. Thaler’s advocates argue that this reveals a gap in UK patent law regarding AI-generated inventions. Additionally, a U.S. court has ruled that AI images cannot be copyrighted, requiring human authorship. However, the Beijing Internet Court recently granted copyright to an AI-generated image, recognizing the plaintiff’s intellectual investment, potentially setting a precedent for future AI copyright disputes.

AI Business and Job Market

> Google is possibly automating ad sales jobs with AI, as reported by The Information. This shift suggests a reduction in the need for human involvement in creating ad assets and text, as AI advances in generating keywords, headlines, and images. A significant restructuring in Google’s ad division might lead to staff consolidation, with AI tools like Performance Max automating ad content creation and optimizing ad placement across various channels. This change could enhance efficiency and profitability in ad operations.

> ChatGPT now features an “archive chats” option, enabling users to tidy up their chat sidebar by archiving conversations without deleting them. This update, currently accessible on Web and iOS platforms, is expected to extend to Android soon.

> Anthropic, an AI startup founded by former OpenAI employees, is negotiating a $750 million funding round led by Menlo Ventures, potentially raising its valuation to $18.4 billion. This funding follows a previous $750 million raise and a $2 billion investment commitment from Google. Anthropic’s AI chatbot, Claude 2, rivals OpenAI’s ChatGPT with advanced summarization capabilities and is employed by companies like Slack and Quora.

5 new AI-powered tools from around the web

Pic Copilot is an AI tool transforming e-commerce imagery. It offers one-click background removal, style templates, AI fashion models, and image translation, enhancing product visuals and boosting online presence.

Findly is an AI data assistant tailored for Google Analytics. It simplifies data analysis with a chat interface, offering advanced insights, collaborative features, and easy export options.

Storia AI is an image editor with features like “Textify” for text correction and “Cleanup” for object removal. It simplifies background editing, transforms sketches, generates image variations, and upgrades image quality with vector conversion capabilities.

Tripo AI transforms any picture into a detailed 3D model in seconds, eliminating the need for slow 3D modeling, photogrammetry, or costly 3D libraries. Ideal for both professionals and hobbyists, it democratizes 3D creation.

Cline offers lightweight A/B and split testing for web content and designs, maximizing conversions with a minimal 8KB script. The script is 20 times smaller than average, which boosts SEO and UX, and the tool uses AI for content variant generation.

arXiv is a free online library where researchers share pre-publication papers.

📄 Mini-GPTs: Efficient Large Language Models through Contextual Pruning

The paper presents an approach to optimizing LLMs through contextual pruning. By strategically pruning the computational architecture, the technique maintains essential functionalities while significantly reducing model sizes. Applied across varied datasets, including US law, Medical Q&A, Skyrim dialogue, English-Taiwanese translation, and Economics, the method proves both efficient and effective. This approach not only serves as a practical tool for developing domain-specific, resource-efficient LLMs but also marks a significant step towards more sustainable and applicable AI advancements.

📄 StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation

StreamDiffusion introduces a real-time interactive image generation pipeline, enhancing existing diffusion models for scenarios like Metaverse and live streaming. It features batching denoising for high throughput and a novel input-output queue for parallel processing. The pipeline also includes residual classifier-free guidance to reduce computation and a stochastic similarity filtering strategy for energy efficiency, achieving significant speedups and reduced GPU power consumption.
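The similarity filtering idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, frames are treated as flat lists of pixel values, and the linear mapping from cosine similarity to skip probability is one plausible form. The more a new frame resembles the last processed frame, the more likely it is to be skipped, which saves the GPU work of running the diffusion model on it.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length lists of pixel values."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def skip_probability(frame, reference, threshold):
    """Assumed mapping: 0 at or below the threshold, rising linearly to 1.

    The threshold must be below 1.0.
    """
    similarity = cosine_similarity(frame, reference)
    return min(1.0, max(0.0, (similarity - threshold) / (1.0 - threshold)))


def filter_frames(frames, threshold=0.9, seed=0):
    """Return the indices of frames that would be sent to the model."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    reference = None           # last frame that was actually processed
    kept = []
    for index, frame in enumerate(frames):
        if reference is not None:
            if rng.random() < skip_probability(frame, reference, threshold):
                continue       # near-duplicate frame: skip the expensive step
        kept.append(index)
        reference = frame
    return kept


if __name__ == "__main__":
    static_scene = [[1.0, 0.0]] * 5
    print(filter_frames(static_scene))  # only the first frame is processed
```

Making the skip random rather than a hard cutoff means a slowly changing scene still gets refreshed occasionally instead of freezing on one frame.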

📄 PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU

PowerInfer is an innovative inference engine designed for efficient LLM serving on PCs with consumer-grade GPUs. It exploits the high locality in LLM inference, where a subset of neurons (hot neurons) is frequently activated, while the majority (cold neurons) are input-specific. This insight leads to a GPU-CPU hybrid design, preloading hot neurons on the GPU and computing cold neurons on the CPU, minimizing memory demands and CPU-GPU data transfers. PowerInfer integrates adaptive predictors and neuron-aware sparse operators for optimized neuron activation and computational sparsity. It achieves an average token generation rate of 13.20 tokens/s (29.08 tokens/s peak) on a single NVIDIA RTX 4090 GPU, performing close to server-grade GPUs like the A100. PowerInfer significantly outperforms existing systems like llama.cpp, maintaining model accuracy across various tasks.
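The hot/cold split can be illustrated with a small Python sketch. It is a toy under stated assumptions rather than PowerInfer's code: the function names, the activation counts, and the `predicted_active` set (standing in for the paper's adaptive predictors) are all hypothetical, and both "devices" are simulated with ordinary Python loops.

```python
def split_hot_cold(activation_counts, gpu_budget):
    """Place the most frequently activated neurons on the GPU, the rest on the CPU."""
    ranked = sorted(range(len(activation_counts)),
                    key=lambda i: activation_counts[i], reverse=True)
    hot = sorted(ranked[:gpu_budget])
    cold = sorted(ranked[gpu_budget:])
    return hot, cold


def neuron_output(weights_row, x):
    """One ReLU neuron: max(0, w . x)."""
    return max(0.0, sum(w * xi for w, xi in zip(weights_row, x)))


def dense_forward(weights, x):
    """Reference result: compute every neuron."""
    return [neuron_output(row, x) for row in weights]


def hybrid_forward(weights, x, hot, cold, predicted_active):
    """Compute only neurons predicted to fire, split across the two devices."""
    out = [0.0] * len(weights)
    for i in hot:                      # would run on the GPU
        if i in predicted_active:
            out[i] = neuron_output(weights[i], x)
    for i in cold:                     # would run on the CPU
        if i in predicted_active:
            out[i] = neuron_output(weights[i], x)
    return out


if __name__ == "__main__":
    weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]
    counts = [90, 5, 2, 80]            # made-up profiling statistics
    hot, cold = split_hot_cold(counts, gpu_budget=2)
    x = [1.0, 2.0]
    print(hot, cold)
    print(hybrid_forward(weights, x, hot, cold, predicted_active={0, 1, 3}))
```

As long as the predictor flags every neuron that actually fires, the hybrid result matches the dense one, and the skipped neurons cost nothing. Only the hot neurons need to fit in GPU memory.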

📄 Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models

“Paint3D” is a novel framework for generating high-quality, lighting-less 2K UV textures for 3D meshes using text or image inputs. It addresses the challenge of creating high-resolution textures without embedded illumination, allowing textures to be re-lit or edited within modern graphic pipelines. The method involves a coarse-to-fine process, starting with a pre-trained depth-aware 2D diffusion model to generate view-conditional images and perform multi-view texture fusion for an initial coarse texture map. Further refinement is achieved with separate UV inpainting and UVHD diffusion models, targeting the shape-aware refinement of incomplete areas and removal of illumination artifacts. This results in semantically consistent, lighting-less textures, pushing the boundaries of texturing 3D objects.

📄 DreamTuner: Single Image is Enough for Subject-Driven Generation

DreamTuner is a new method for subject-driven image generation, efficiently using a single image for high-fidelity results. It addresses challenges in maintaining subject learning and generation capabilities by introducing a subject encoder for coarse identity preservation and modifying the self-attention layers in pre-trained models for detail refinement. DreamTuner’s key feature is its ability to generate diverse images of a subject controlled by text or other conditions like pose, while retaining the subject’s identity and appearance. This method marks a significant advancement in personalized image generation applications.

📄 AppAgent: Multimodal Agents as Smartphone Users

AppAgent by Tencent researchers introduces a novel multimodal agent framework utilizing large language models (LLMs) to operate smartphone applications. This framework enables the agent to interact with apps through basic actions like tapping and swiping, resembling human interactions. It learns to use new apps through exploration or observing human demonstrations, creating a knowledge base for executing complex tasks. Extensive tests across 50 tasks in 10 different apps, including social media, email, and image editing, demonstrate the agent’s adaptability and proficiency. This research marks a significant advancement in AI-assisted smartphone app operation, offering a versatile and efficient tool adaptable to various applications.

ChatGPT + DALLE 3 Attempts Comics

Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.
