Watch the Latest Sora AI Videos
Plus, GPT-4 may be getting an upgrade...
Good morning. It’s Wednesday, March 27th.
Did you know: On this day in 1964, south-central Alaska was struck by a 9.2-magnitude earthquake, the strongest quake ever recorded in the United States. Here’s how UT Austin researchers are using AI to detect earthquakes before they happen.
In today’s email:
OpenAI's Sora aids artists, enables creative exploration
GPT-4 may remove message limits, DALL-E 3 inpainting
Apple to reveal AI plans at WWDC, June 10-14
Zuckerberg personally poaching Google AI talent for Meta
X Premium subscribers get AI chatbot Grok access
Breakthrough in efficient quantum computing
Samsung launches AGI Computing Labs
5 New AI Tools
Latest AI Research Papers
You read. We listen. Let us know what you think by replying to this email.
In partnership with GROWTH SCHOOL
Work less & drive 10x more impact using AI
HIGHLY RECOMMENDED: Join a power-packed workshop (worth $199) for FREE and learn 20+ AI tools to become 10x more efficient at your work.
👉 Become an AI Genius in 3 hours. Register here (FREE for First 100) 🎁
In this workshop you will learn how to:
✅ Simplify your work and life using AI
✅ Do research & analyze data in seconds using AI tools
✅ Automate repetitive tasks & save 10+ hours every week
✅ Build stunning presentations & create content at lightning speed
Today’s trending AI news stories
Sora Updates and GPT-4 Upgrades
> Sora: First Impressions: OpenAI's Sora is making a splash in the art world. Working with artists like Trillo and Kleverov, OpenAI has shown how Sora generates both realistic and surreal visuals, aiding creative exploration. Filmmakers like Trillo praise its ability to bypass practical limitations and explore ideas freely. For groups like shy kids, Sora expands storytelling possibilities, and artists find it overcomes technical hurdles, allowing rapid prototyping and visualization. Alex Reben, OpenAI's Artist in Residence, is exploring Sora's potential for transforming AI-generated imagery into physical sculptures, indicating its versatility across artistic disciplines.
Sora is estimated to produce 5 minutes of content per NVIDIA H100 GPU per hour. Factorial Funds projects this translates to 120 minutes of video per GPU per day, potentially requiring around 89,000 H100s to support creators on platforms like TikTok and YouTube. However, factoring in realistic usage patterns and peak demand, this figure could balloon to roughly 720,000 GPUs. Additionally, creators often generate multiple video drafts, further doubling the hardware needs. OpenAI plans to make Sora publicly available later this year, with future iterations incorporating sound and editing tools. The company is also targeting Hollywood studios and talent agencies, aiming to integrate Sora into the filmmaking process. Read more.
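The capacity math above is easy to reproduce as a back-of-envelope check (the per-GPU throughput and the 89,000 / 720,000 GPU figures are Factorial Funds' estimates, not measured values):

```python
# Back-of-envelope reproduction of the Factorial Funds capacity estimate.
minutes_per_gpu_hour = 5                     # estimated Sora output per H100 per hour
minutes_per_gpu_day = minutes_per_gpu_hour * 24
print(minutes_per_gpu_day)                   # 120 minutes of video per GPU per day

baseline_gpus = 89_000                       # steady-state estimate for creator-scale demand
daily_output_minutes = baseline_gpus * minutes_per_gpu_day
print(f"{daily_output_minutes:,}")           # 10,680,000 minutes of video per day

peak_gpus = 720_000                          # estimate once peak demand is factored in
print(f"{peak_gpus / baseline_gpus:.1f}x")   # 8.1x headroom over the baseline
```

Note that the drafts effect (creators rendering several versions per finished clip) multiplies demand on top of these numbers, which is why the peak estimate runs roughly 8x the baseline.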
> OpenAI Hints at GPT-4 and DALL-E 3 Upgrades: Lifting Limits and Enhancing Creativity: A recent discovery by Tibor Blaho suggests OpenAI is exploring innovative features for its upcoming models, GPT-4 and DALL-E 3. For GPT-4, one anticipated improvement is the removal of message limits, potentially replaced with a dynamic system that adjusts based on request complexity. Additionally, a "Model Tuner Selector" might be implemented to optimize resource allocation by directing requests to either GPT-4 or GPT-3.5 based on their suitability. OpenAI is also testing an "Upgraded Response" feature, allowing for smoother transitions between GPT-3.5 and GPT-4 outputs. Meanwhile, potential upgrades for DALL-E 3 include an image editor with inpainting capabilities, further expanding its creative potential. Read more.
Apple, Meta, and X
> Apple WWDC 2024 Dates Locked In: AI Strategy Reveal Imminent: Apple has announced its annual Worldwide Developers Conference (WWDC) will take place from June 10 to 14. While some software developers will be invited to the company's campus for the first day, the event will primarily be livestreamed on Apple's website. Following CEO Tim Cook's February statements on significant AI investments, analysts predict Apple will reveal its long-anticipated AI strategy alongside exciting consumer features. Traditionally, WWDC features keynote presentations showcasing software updates for iPhone, iPad, Mac, and Apple TV. This year, however, additional anticipation surrounds the potential reveal of the first major software update for the Vision Pro, Apple's virtual reality headset. Read more.
> It looks like Mark Zuckerberg is personally trying to poach Google AI researchers for Meta: The company is reportedly bypassing traditional interview processes and offering higher salaries to attract talent away from competitors like Google's DeepMind. Meta CEO Mark Zuckerberg has even taken a personal approach, emailing researchers directly. This aggressive strategy aligns with Meta's revised technology roadmap, prioritizing a unified AI model for video and feed recommendations by 2026. Read more.
> Elon Musk says all Premium subscribers on X will gain access to AI chatbot Grok this week: This follows xAI's open-sourcing of the underlying Grok large language model earlier in March. The $8-per-month X Premium tier now grants access to Grok, known for its ability to tackle sensitive topics and offer unconventional responses. Notably, Grok utilizes real-time X data, potentially giving it an edge over competitors like OpenAI's ChatGPT and Anthropic's Claude. Read more.
Quantum Computing & Samsung’s AGI Bid
> The world is one step closer to secure quantum communication on a global scale: University of Waterloo's Institute for Quantum Computing (IQC) researchers leveraged cutting-edge AI for a breakthrough in quantum communication. Published in Communications Physics, their work details an AI-driven optimization process for efficiently producing near-perfect entangled photon pairs from quantum dot sources. This approach, combining Nobel Prize-winning physics and chemistry with AI, achieves a 65-fold increase in efficiency compared to prior methods. The new source, developed alongside the National Research Council of Canada, holds promise for quantum key distribution and secure global communication networks. Read more.
> Samsung Enters AGI Race: Launches New Labs and Seeks International Partnerships: Samsung Electronics has officially entered the race for Artificial General Intelligence (AGI) with the launch of its AGI Computing Labs. Headed by Dr. Dong-hyuk Woo, a former Google AI chip expert, these labs in the US and South Korea will focus on developing specialized AGI semiconductors. Initially, the efforts will target chips for inference tasks, service applications, and large language models (LLMs). Samsung plans to optimize chip architecture for lower power consumption and continuously iterate on new designs. This move paves the way for potential collaborations with tech giants like Meta, whose focus aligns with AGI development, and OpenAI, which has previously expressed interest in partnering with South Korean firms for AI chip design. Read more.
🖇️ Etcetera
> Qualcomm unveils S5 Gen 3 Sound platform with 'almost 50x more AI power' (More)
> Google, Intel, and Qualcomm aim to break Nvidia's AI dominance (More)
> Adobe's Firefly Services makes over 20 new generative and creative APIs available to developers (More)
> Real-time rendering of complex volumetric effects just got easier with Gaussian Frosting (More)
> Zoom unveils AI-powered collaboration platform, Zoom Workplace, to reimagine teamwork (More)
> Introducing Stable Code Instruct 3B — Stability AI (More)
> Anthropic Pushes for Rigorous AI Testing with Third-Party Collaboration (More)
> Beware of Fake VR Apps: Researchers Uncover Hidden Vulnerability (More)
> Chinese generative AI developers rush to upgrade chatbots to handle super long texts (More)
> Researchers discover a surprisingly simple retrieval mechanism in LLMs (More)
5 new AI-powered tools from around the web
Eternity AI, a research project at IIT Patna, pioneers an LLM with real-time internet access, reducing hallucinations and integrating 100K+ behavior parameters to mimic human behavior.
Otto Engineer, an autonomous AI software engineer, operates in-browser using Web Containers, executing code safely. It iterates and tests its own code, supports npm packages, and requires zero setup.
Martin is an AI butler like Jarvis. It learns, integrates with calendars, and even handles your email. Powered by Deepgram, OpenAI, and Claude 3.
TigerEye is an AI-powered planning platform for sales, marketing, and finance leaders. It enables rapid scenario testing and collaborative execution, and offers predictive insights and parametric planning.
Pickaxe Studio is a no-code platform for GPT stores. Sell AI tools and chatbots via paywalled subscriptions. Deploy tools, set usage limits, and monitor activity.
arXiv is a free online library where researchers share pre-publication papers.
The paper proposes AIOS, an innovative operating system integrating Large Language Models (LLMs) with agent-based intelligence. Addressing challenges in agent scheduling, context maintenance, and heterogeneous integration, AIOS optimizes resource allocation, enables concurrent agent execution, and ensures access control. The paper introduces AIOS's architecture, emphasizing LLM-specific kernel design and core modules like Agent Scheduler, Context Manager, and Tool Manager. By encapsulating LLM and OS functionalities, AIOS empowers agents to seamlessly combine LLM reasoning with OS-level actions for diverse tasks. The study outlines AIOS's layered structure, from application to hardware layers, highlighting its potential for advancing LLM-based agent development and deployment.
FlashFace is a novel approach for human image personalization with high-fidelity identity preservation. Unlike existing methods, FlashFace encodes face identity into a series of feature maps, retaining fine details like scars and tattoos. Additionally, it introduces a disentangled integration strategy, balancing text and image guidance for better instruction following, crucial for scenarios where prompts conflict with reference images. FlashFace's architecture includes a reference network for feature map encoding, separate layers for text and image control signals, and a novel data construction pipeline ensuring variation between reference and target images. With these innovations, FlashFace achieves precise language control and high-fidelity results, demonstrated across various applications like human image customization and face swapping.
The Octree-GS method introduces a Level-of-Detail (LOD) aware framework to enhance real-time rendering using 3D Gaussian Splatting (3D-GS). Traditional 3D-GS techniques struggle with large scenes containing complex details, leading to inconsistent rendering speeds and inadequate level-of-detail representation. Octree-GS addresses these issues by structuring the scene with hierarchical anchors, dynamically selecting LODs based on observation footprint and scene richness. This approach ensures consistent rendering performance while maintaining high-fidelity results across varying levels of detail. Octree-GS demonstrates superior visual quality and real-time rendering stability compared to existing methods, as validated through experiments on diverse scenes. While Octree-GS enhances detail capture without sacrificing performance, certain aspects like octree construction and progressive training require further refinement for optimal performance.
The paper introduces OPT2I, a framework addressing challenges in prompt-image consistency of text-to-image (T2I) generative models. Existing methods often require model fine-tuning, focus on nearby prompt samples, and face trade-offs among image quality, diversity, and consistency. OPT2I leverages large language models (LLMs) to iteratively optimize prompts, aiming to maximize consistency scores. It iterates between generating revised prompts and evaluating consistency with a chosen metric, accommodating diverse T2I models and LLMs. Experimental results demonstrate up to a 24.9% improvement in prompt-image consistency without compromising image quality or diversity.
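The optimization loop OPT2I describes can be sketched generically. Everything below is illustrative, not the paper's implementation: `revise` stands in for the LLM that proposes revised prompts, `generate` for the T2I model, and `score` for the consistency metric.

```python
def optimize_prompt(user_prompt, revise, generate, score, rounds=5, k=4):
    """OPT2I-style loop (sketch): keep a history of (prompt, score) pairs,
    have an LLM propose k revised prompts given that history, score each
    revision's generated image against the original prompt, and return
    the best prompt found."""
    history = [(user_prompt, score(user_prompt, generate(user_prompt)))]
    for _ in range(rounds):
        for candidate in revise(user_prompt, history, k):
            history.append((candidate, score(user_prompt, generate(candidate))))
    return max(history, key=lambda pair: pair[1])  # best (prompt, consistency)
```

Because the loop only ever reads scores, any consistency metric (the paper evaluates several, including VQA-style scoring) can be plugged in without changing the T2I model's weights, which is the framework's main selling point over fine-tuning.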
The paper from Meta FAIR, MIT, and Sequoia Capital presents a novel layer-pruning strategy for large language models (LLMs), aiming to reduce computational resources for inference while maintaining performance. Through empirical analysis, the authors demonstrate that removing a significant fraction of the deeper layers in LLMs has minimal impact on downstream performance, suggesting that current pretraining methods may not effectively leverage parameters in those layers. They propose a simple pruning method based on layer similarity and use parameter-efficient finetuning techniques to mitigate performance degradation. Results indicate that layer pruning can complement other post-training efficiency techniques, reducing memory footprint and inference latency.
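The similarity criterion can be sketched in a few lines: measure how little a block of layers changes the residual stream, and prune the block that changes it least. This is a toy illustration on synthetic activations, not the paper's code; the angular-distance measure follows the paper's description.

```python
import math

def angular_distance(x, y):
    """Angular distance (in units of pi) between two activation vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.acos(cos) / math.pi

def best_block_to_prune(hidden, n):
    """Return the start index l* of the n-layer block whose input hidden[l]
    is most similar to its output hidden[l + n]; those n layers change the
    representation least and are the best candidates for removal."""
    return min(range(len(hidden) - n),
               key=lambda l: angular_distance(hidden[l], hidden[l + n]))
```

If the deepest layers barely transform their inputs, as the paper reports, this selector will pick a block near the end of the network, matching the finding that deep-layer parameters are underused.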
AI Creates Comics
Thank you for reading today’s edition.
Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.
Interested in reaching smart readers like you? To become an AI Breakfast sponsor, apply here.