Apple's Ferret LLM and AI Reads Bedtime Stories

AI Breakfast
December 25, 2023

Merry Christmas. It’s Monday, December 25th.

Did you know: We’re hiring? If you’re interested in applying for our Ad Sales position, enter your email here to receive an application.

In today’s email:

AI Innovations and Advances
AI in Business and Industry
Ethical and Intellectual Property Concerns in AI
AI in Entertainment and Gaming
5 New AI Tools
Latest AI Research Papers
ChatGPT + DALLE 3 Attempts Comics

You read. We listen. Let us know what you think by replying to this email.

Interested in reaching 47,198 smart readers like you? To become an AI Breakfast sponsor, apply here.

Today’s trending AI news stories

AI Innovations and Advances

> Apple’s Ferret, a new open-source multimodal large language model developed with Cornell University, was released quietly on GitHub in October and has since gained attention for its ability to use image regions as queries. It can precisely identify elements within an image, which helps with complex queries. Although it is released under a non-commercial license, its open-source release marks a shift from Apple’s typically secretive approach and could influence future Apple products. Ferret was trained on Nvidia’s A100 GPUs, and its release reflects Apple’s increasing openness and contribution to AI research.

> Nvidia, the University of Toronto, and MIT have developed “Align Your Gaussians” (AYG), an AI system that creates 3D animations directly from text descriptions. AYG uses 3D Gaussian functions to shape 3D objects and deformation fields to animate them. It combines several AI models: Stable Diffusion for image realism, a text-to-video model for smooth motion, and a multi-view 3D model for geometric consistency. This combination enables AYG to generate animations with lifelike motion and textures from simple textual prompts like “a horse galloping across a meadow”. The system has potential applications in creative tools and in synthetic data generation, which is crucial for areas like autonomous driving. AYG can also produce longer sequences and multiple animated objects within a single scene, expanding its utility across creative and technical domains.
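For readers curious how "Gaussians plus a deformation field" fits together, here is a minimal toy sketch. It is not AYG's code: the `Gaussian3D` class, the hand-written `deformation_field`, and the swaying motion are all assumptions made for illustration. In AYG the deformation field is learned, not written by hand.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian3D:
    # Simplified blob: a center and one isotropic size value (an assumption;
    # real 3D Gaussian representations carry more parameters per blob).
    center: tuple
    scale: float


def deformation_field(point, t):
    """Hypothetical field: sway along x, stronger the higher the point is."""
    x, y, z = point
    return (0.1 * y * math.sin(2 * math.pi * t), 0.0, 0.0)


def animate(gaussians, t):
    """Return a new list of Gaussians with centers displaced at time t."""
    frame = []
    for g in gaussians:
        dx, dy, dz = deformation_field(g.center, t)
        x, y, z = g.center
        frame.append(Gaussian3D((x + dx, y + dy, z + dz), g.scale))
    return frame


# A static "object" made of two blobs, one above the other.
static_object = [Gaussian3D((0.0, 0.0, 0.0), 0.5), Gaussian3D((0.0, 2.0, 0.0), 0.5)]
quarter_cycle = animate(static_object, 0.25)
```

The static shape never changes. Each frame is the same set of blobs with their centers moved by the field, which is why the object keeps a consistent shape while it moves.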

> Four large generative AI models have gained official approval in China, the first approvals of their kind in the country. The move is expected to expedite and regulate the development of AI-generated content (AIGC). The models were developed by Baidu, Tencent, Alibaba, and 360 Group and assessed by China's Electronics Standardization Institute under the Ministry of Industry and Information Technology. The assessment aims to enhance generality, intelligence, and security across various domains, including language, speech, and visual content.

> Researchers from DTU, the University of Copenhagen, ITU, and Northeastern University have developed an AI model called life2vec that can predict events in people's lives, including estimating the time of death. Trained on health data and labor market records for 6 million Danes, the transformer-based model outperformed other neural networks in predicting outcomes such as personality and time of death. The researchers raise ethical concerns about data privacy and bias, emphasizing the need for a democratic conversation on the technology's implications and uses.

AI in Business and Industry

> Infosys, a major IT services provider, has had a $1.5 billion AI deal with an undisclosed global client terminated. The deal was originally planned to run for 15 years. The unexpected decision comes shortly after the resignation of Infosys’ CFO Nilanjan Roy. The termination, announced on December 23, 2023, raises concerns about Infosys and other Indian IT companies facing challenges amid a period of muted business growth. This development, along with Infosys’ narrowed revenue growth guidance and recent deal wins, notably the five-year agreement with LKQ Europe, reflects the volatile environment in the IT sector. Infosys is set to announce its October-December quarter earnings in January 2024.

> Apple is reportedly in talks with major news publishers to license their content for its generative AI initiatives. The New York Times revealed that Apple proposed multi-year agreements worth upwards of $50 million to access extensive archives of news articles. Targeted publishers include Condé Nast, NBC News, and IAC. These developments signify Apple’s deepening engagement in generative AI technology, a shift from its previous focus on enhancing basic functions in new devices. The negotiations, however, have received mixed responses from publishers.

> Meta has announced the availability of the Llama Guard model for Amazon SageMaker JumpStart, offering input and output safeguards for large language models (LLMs). Llama Guard is part of the Purple Llama initiative, promoting responsible AI model development. It aids in identifying potentially risky or inappropriate content, making it suitable for chatbots, content moderation, and more. Users can access Llama Guard through SageMaker JumpStart, which provides foundation models and solution templates for ML development.

Ethical and Intellectual Property Concerns in AI

> A study by Patronus AI reveals that large language models, such as OpenAI’s GPT-4 Turbo, face significant challenges in accurately answering financial questions, particularly those related to SEC filings. In the study, GPT-4 Turbo achieved only 79% accuracy, despite receiving comprehensive prompts. The issues include refusing to answer and generating inaccurate information. While AI models show potential in finance, current performance underscores the need for human involvement. Improved prompting is one possible solution, but its effectiveness remains uncertain. The "lost in the middle" problem in LLMs also raises concerns about their suitability for finance tasks that require large context windows.

> Parents are gushing over personalized bedtime stories featuring Bluey courtesy of AI. But Ludo Studio, the show's creator, is crying foul, claiming copyright infringement. While offering personalized adventures, concerns over copyright and ethics loom large in the digital landscape. The central question remains: Can machine-generated fables capture the essence of human narratives? Experts emphasize caution, advocating for human creativity over algorithmic bedtime stories. The future of bedtime reading seems poised for an AI takeover as the battle over stolen stories and digital storytelling unfolds.

AI in Entertainment and Gaming

> Generative AI is poised to revolutionize the gaming industry, with Microsoft's Xbox partnership with Inworld AI highlighting the potential. The focus is on non-playable characters (NPCs), allowing them to evolve beyond predefined roles, adapt to player behavior, and contribute to dynamic game worlds. This shift promises increased player engagement, immersion, and replayability. AI's impact extends to game development, automating tasks like storyboard design and NPC dialogue creation, potentially increasing user-generated content.

This edition is brought to you by:

Our book, Decoding AI: A Non-Technical Explanation of Artificial Intelligence is on sale for just $2.99 today only!

(with a 100% money-back guarantee)

Decoding AI breaks down the complexities of AI into digestible concepts, walking you through its history, evolution, and real-world applications.

We'll introduce you to the key players in the AI field, as well as explain the underlying algorithms, data, and machine learning concepts that power AI systems. You'll gain a deeper understanding of deep learning, neural networks, and reinforcement learning, and we'll explore various types of AI, from rule-based systems to probabilistic networks and beyond.

The goal was to make this book an approachable discovery of how AI works.

Your support is truly appreciated

5 new AI-powered tools from around the web

Spiritme AI is a scriptwriter that converts PDFs into explainer videos using AI, featuring lifelike avatars and emotions. It offers an AI Filming Assistant and a Dynamic Expressions Engine, has over 65k users, and is open for feedback.

Fanfuel is an AI assistant for YouTubers. It generates scripts, thumbnails, music, and more. Features include AI Thumbnail Generator, Script Maker, AnalyticsChat, Metadata Maker, Idea Suggester, Face Swapper, and Script Narration.

Jaeves AI is an all-in-one AI suite for dynamic content creation. Features include advanced GPT, AI Image Generation, AI Vision & Chat, Text-to-Speech, AI Ads Copy, and multi-language support.

Tripo AI is an AI-powered 3D modeling tool that creates models from text or images in seconds. It offers a free basic plan (10 models/month), a professional plan ($29.90 for 100 models/month), and a premium plan ($199 for 1000 models/month).

GPTEngineer is a rapid prototyping tool for creating interactive web apps using natural language. Features version control with git and grants code ownership, enabling smooth human developer integration at any stage.

Latest AI Research Papers

arXiv is a free online library where researchers share pre-publication papers.

📄 Unlocking Pre-trained Image Backbones for Semantic Image Synthesis

This paper introduces DP-SIMS, a pioneering GAN-based approach for semantic image synthesis that excels in generating high-quality images consistent with user-provided semantic label maps. By incorporating pre-trained backbones in GAN discriminators and developing a novel generator architecture with enhanced context modeling and cross-attention noise injection, DP-SIMS achieves superior performance. Demonstrating state-of-the-art results on ADE-20K, COCO-Stuff, and Cityscapes datasets, it outperforms existing diffusion models in terms of image quality, consistency, and efficiency, offering significantly faster inference.

📄 TinySAM: Pushing the Envelope for Efficient Segment Anything Model

TinySAM, developed by Huawei Noah’s Ark Lab and the University of Science and Technology of China, is an efficient segment anything model (SAM) designed to reduce computational cost. It maintains strong zero-shot performance using full-stage knowledge distillation with an online hard prompt sampling strategy. Post-training quantization and a hierarchical segmentation strategy further reduce computational costs, enabling twice the inference speed with minimal performance loss. TinySAM outperforms counterparts in zero-shot transfer tasks, marking a leap in efficient SAM technology.
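To give a feel for what "post-training quantization" means in general, here is a minimal sketch of symmetric 8-bit quantization applied to a handful of weights. This is a generic textbook scheme, not TinySAM's actual method, and the function names `quantize_int8` and `dequantize` are made up for this example.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero input, where any scale would do.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
```

Each weight is stored as a small integer plus one shared scale, which shrinks the model and speeds up arithmetic. The restored values differ from the originals by at most about half a quantization step, which is the "minimal performance loss" trade-off in miniature.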

📄 PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models

PIA (Personalized Image Animator), developed by Shanghai Artificial Intelligence Laboratory, is a new framework that animates images generated by personalized text-to-image models. It realistically adds motions to images according to text prompts while preserving distinct styles and high-fidelity details. Built upon a base text-to-image model with well-trained temporal alignment layers, PIA seamlessly transforms any personalized model into an image animation model. Its key feature, the condition module, utilizes condition frames and inter-frame affinity for frame synthesis in latent space, enhancing image alignment and motion controllability. PIA is showcased on AnimateBench, a comprehensive benchmark for personalized image animation.

📄 HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models

HD-Painter, developed by Picsart AI Research (PAIR) in collaboration with UT Austin and Georgia Tech, is a novel training-free approach for high-resolution, text-guided image inpainting using diffusion models. It introduces the Prompt-Aware Introverted Attention (PAIntA) layer, which enhances self-attention scores with prompt information for better text alignment. Additionally, the Reweighting Attention Score Guidance (RASG) mechanism integrates a post-hoc sampling strategy, preventing out-of-distribution latent shifts. HD-Painter scales to high-resolution inpainting, handling images up to 2K resolution. It surpasses existing methods both qualitatively and quantitatively, demonstrating a significant improvement in generation accuracy.

📄 DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation

DREAM-Talk approaches talking face generation by combining two key modules: EmoDiff and Lip Refinement. EmoDiff captures dynamic emotional expressions that align with the audio and the chosen emotion style, while Lip Refinement focuses on matching nuanced mouth movements to the audio to ensure accurate lip-syncing. Together, they create emotionally expressive talking faces that maintain realistic lip movements, advancing the realism of digital human representations.

ChatGPT + DALLE 3 Attempts Comics

Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.
