Full Recap of NVIDIA GTC 2024

AI Breakfast
March 20, 2024

Good morning. It’s Wednesday, March 20th.

Did you know: On this day in 2005, Yahoo! acquired Flickr for an estimated $22 million?

In today’s email:

NVIDIA GTC 2024 Highlights
Stability AI's SV3D creates 3D models from single images
OpenAI CEO teases "amazing" new AI model for 2024
Google's MELON simplifies 3D object creation from images
MIT's FeatUp algorithm improves computer vision performance
Developer fits GPT-2 into an Excel spreadsheet for AI education
Denmark's AI supercomputer "Gefion" to boost medical research
Saudi Arabia launches $40B AI fund to drive innovation
5 New AI Tools
Latest AI Research Papers

You read. We listen. Let us know what you think by replying to this email.

In partnership with

Guide employees to use GenAI apps securely with in-browser app banners

GenAI apps like ChatGPT are incredibly powerful tools, but many organizations are rightly nervous about data privacy. Rather than blocking GenAI apps outright and falling behind the competition, organizations can turn to Push, which has released a smart tool designed to secure the use of SaaS applications.

With Push’s app banners feature, you can create in-browser banners that appear when users access GenAI apps. The banner tells users how to use the apps safely and links to their organization's GenAI security policy.

So before you rush to block these apps, consider whether you can meet employees halfway and keep risks under control without sacrificing productivity. Push intervenes in the right place at the right time, reinforcing policy at the point of access and prompting secure behavior.

Thank you for supporting our sponsors!

Today’s trending AI news stories

NVIDIA GTC 2024 Highlights

NVIDIA's Shared VR Technology for Apple Vision Pro: NVIDIA is bringing its shared VR technology to Apple Vision Pro, enabling enterprise developers to stream high-fidelity 3D digital twins to the headset. Powered by Omniverse Cloud APIs and the NVIDIA Graphics Delivery Network, the workflow offloads heavy rendering from the headset's M2 processor through hybrid rendering, which combines local and remote rendering to deliver immersive spatial computing experiences. Read more.

NVIDIA's Earth-2 Climate Prediction Platform: NVIDIA launches Earth-2, a cloud-based digital twin for climate modeling. Powered by AI supercomputers, Earth-2 utilizes generative AI and CUDA-X microservices to provide high-resolution simulations, aiding global efforts to understand and mitigate climate change impacts. APIs are already in use by organizations like Taiwan's Central Weather Administration for improved disaster preparedness. Read more.

NVIDIA's Blackwell AI Chips: NVIDIA unveiled its next-generation Blackwell GPUs, promising a massive leap in AI performance. The B200 GPU delivers up to 20 petaflops of FP4 AI compute, and NVIDIA says systems built on the GB200 Grace Blackwell Superchip offer up to 30x faster LLM inference than the same number of H100 GPUs. Huang called Blackwell a revolution; its six new technologies, supported by NVIDIA's CUDA-X libraries, are poised to impact sectors from data processing to engineering. Read more.

Project GR00T for Humanoid Robots: NVIDIA announced Project GR00T, a foundation model for humanoid robots, alongside the Jetson Thor robot computer and upgrades to its Isaac robotics platform. The goal is robots that understand natural language, emulate human movements, and operate more autonomously. Read more.

NVIDIA's NIM Microservices for AI Deployment: NVIDIA introduced NIM microservices for deploying AI models quickly across environments. NIMs support Retrieval Augmented Generation (RAG), integrate with vector databases, and can be deployed in the cloud, on Linux servers, or as serverless functions. Developers can experiment with them for free at ai.nvidia.com, while commercial use goes through NVIDIA AI Enterprise 5.0. Partners include SAP and Adobe, and NIMs run on NVIDIA GPUs across all of these environments. Read more.
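
The retrieval step behind RAG can be sketched without any NVIDIA software. This is a minimal, library-free illustration, not the NIM API: the three documents and their hand-made 3-dimensional embeddings are hypothetical stand-ins for what an embedding model and a vector database would supply in a real deployment.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, store, k=2):
    """Return the text of the k stored documents most similar to the query."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question, passages):
    """Prepend the retrieved passages so the LLM answers from them."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical documents with hand-made 3-dimensional embeddings.
store = [
    ("Blackwell is NVIDIA's next-generation GPU architecture.", [0.9, 0.1, 0.0]),
    ("Earth-2 is a digital twin for climate modeling.", [0.1, 0.9, 0.1]),
    ("GR00T is a foundation model for humanoid robots.", [0.0, 0.2, 0.9]),
]
# Pretend this vector is the embedding of the question below.
top = retrieve([0.8, 0.2, 0.1], store, k=1)
print(build_prompt("What is Blackwell?", top))
```

At scale, a vector database replaces the linear `sorted` scan with an approximate nearest-neighbor index; the prompt-building step stays conceptually the same.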

NVIDIA Ventures into Quantum-Computing Cloud Services: NVIDIA joined the quantum cloud race with a distinctive approach: rather than offering quantum hardware of its own, NVIDIA Quantum Cloud lets researchers simulate quantum computers on its AI chips, including the H100 Tensor Core GPU. The platform integrates with major cloud providers and aims to offer access to third-party quantum computers. Read more.

NVIDIA and Oracle Partner for Sovereign AI Solutions: CEO Jensen Huang unveiled a strategic partnership with Oracle to deliver sovereign AI solutions worldwide. The collaboration combines Oracle's cloud infrastructure with NVIDIA's accelerated computing, enabling governments and enterprises to build secure AI factories. Oracle's flexible deployment options and NVIDIA's Grace Blackwell platform are designed to deliver performance and efficiency while preserving data sovereignty. Read more.

NVIDIA Empowers Enterprise AI with Microservices: NVIDIA released enterprise-grade generative AI microservices that let businesses deploy custom AI applications while keeping full control of their IP. Built on the NVIDIA CUDA platform, they include NIM microservices for optimized model inference and CUDA-X microservices for data processing, RAG, and HPC. Adobe and SAP are among the industry leaders integrating the microservices. They are available on cloud platforms and NVIDIA-Certified Systems, promising faster deployment and deeper customization. Read more.

▶️ Watch the full keynote speech of NVIDIA CEO Jensen Huang here.

Emerging Models

> Build 3D Worlds from a Single Image: Stability AI has unveiled Stable Video 3D (SV3D), a new model built on its existing video generation technology. From a single image, SV3D generates multiple consistent views of an object that can be turned into a 3D model, potentially streamlining asset creation in sectors like gaming and e-commerce. Compared with previous models, it offers better depth and multi-view consistency, which particularly helps it generate novel views and, in turn, higher-quality 3D meshes. Creators can generate high-quality 3D assets while retaining IP control. SV3D is available for commercial use through a paid Stability AI Membership and free for non-commercial use. Read more.

> OpenAI CEO Teases 'Amazing' New AI Model for 2024: OpenAI CEO Sam Altman has teased an "amazing" new AI model set for release in 2024. Details are scarce, but Altman hinted that OpenAI may ship smaller releases before the much-anticipated GPT-5. His appearance on the Lex Fridman podcast offered further insight into the company's direction, including the potential of its Sora video model and a continued focus on AI reasoning through the Q* project. Despite the excitement, Altman expressed some dissatisfaction with GPT-4 in its current form, suggesting the next iteration will bring a major leap in performance. Read more.

> Google Researchers Develop 'MELON' for Easier 3D Object Creation: Google Research has unveiled MELON, a method that makes it easier to create 3D models of objects from images. It tackles a common obstacle: reconstructing a 3D shape usually requires precise camera pose information for every image. MELON removes the need to supply that information up front, potentially simplifying 3D modeling for many applications. Presented at the 3DV 2024 conference, MELON shows promise in fields like e-commerce and self-driving cars. Read more.

> MIT Researchers Develop Algorithm for Sharper Computer Vision: MIT researchers have developed FeatUp, an algorithm that significantly boosts computer vision performance. Deep vision models compute their features at far lower resolution than the input image, losing fine detail; FeatUp upsamples those features so models can capture high- and low-level details simultaneously. Its gains in object detection and semantic segmentation make it a useful tool for tasks like autonomous driving and medical imaging, and its efficiency and effectiveness make it a notable advance for computer vision research and applications. Read more.
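
FeatUp's paper describes, among its variants, an upsampler built from learned joint bilateral filters. The classic, non-learned joint bilateral upsampling it builds on can be sketched in one dimension: a high-resolution guide (here, pixel intensities) steers how coarse features are spread back to full resolution. This is an illustration of that textbook technique with hand-picked sigma values, not MIT's code or FeatUp's learned kernels.

```python
import math


def joint_bilateral_upsample_1d(low_feats, guide, scale, sigma_spatial=1.0, sigma_range=0.1):
    """Upsample a 1D low-resolution feature signal using a high-resolution guide.

    low_feats: feature values at low resolution (length n).
    guide: high-resolution guidance signal, e.g. pixel intensities (length n * scale).
    Each output pixel averages low-res features, weighting them by spatial distance
    and by how similar the guide is at the two locations, so edges in the guide
    are preserved instead of blurred.
    """
    out = []
    for p in range(len(guide)):
        p_low = p / scale  # position of output pixel p in low-res coordinates
        num = 0.0
        den = 0.0
        for q, feat in enumerate(low_feats):
            spatial = math.exp(-((p_low - q) ** 2) / (2 * sigma_spatial ** 2))
            # Compare the guide at p with the guide where low-res sample q sits.
            diff = guide[p] - guide[q * scale]
            range_w = math.exp(-(diff ** 2) / (2 * sigma_range ** 2))
            w = spatial * range_w
            num += w * feat
            den += w
        out.append(num / den)
    return out


guide = [0, 0, 0, 0, 1, 1, 1, 1]  # sharp edge between pixels 3 and 4
low_feats = [0.0, 0.0, 1.0, 1.0]  # coarse features at half resolution
print([round(v, 3) for v in joint_bilateral_upsample_1d(low_feats, guide, scale=2)])
```

With plain linear interpolation, pixel 3 would receive a blended value of 0.5; the range term is what keeps the features on each side of the edge at 0 and 1.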

> AI in Your Spreadsheet: Developer Crams GPT-2 into Excel: Software developer Ishan Anand has recreated the small version of OpenAI's GPT-2 language model inside a hefty (1.2 GB) Excel spreadsheet. The project lets non-developers explore the inner workings of modern AI, specifically large language models (LLMs) and the Transformer architecture. Everything runs offline using standard Excel functions, with no cloud services or Python required. While less user-friendly than a chatbot, Anand's creation aims to demystify complex concepts and build understanding of AI fundamentals. Supplemented by tutorial videos on his website, the project highlights Excel's potential for AI experimentation and education. Read more.
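
The core operation inside GPT-2's Transformer blocks, and therefore one of the things the spreadsheet spells out in cells, is causal (masked) self-attention: each token mixes information only from itself and earlier tokens. A minimal single-head sketch in plain Python, assuming the query, key, and value vectors are already given; real GPT-2 adds learned projections, multiple heads, and many stacked layers.

```python
import math


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v: lists of T vectors (lists of floats), one per token.
    Token t may only attend to tokens 0..t, as in GPT-2.
    """
    d = len(q[0])
    out = []
    for t in range(len(q)):
        # Scores against the current and earlier tokens only (the causal mask).
        scores = [sum(a * b for a, b in zip(q[t], k[j])) / math.sqrt(d) for j in range(t + 1)]
        weights = softmax(scores)
        # Output for token t is the attention-weighted sum of value vectors.
        out.append([sum(w * v[j][i] for j, w in enumerate(weights)) for i in range(len(v[0]))])
    return out


q = [[1.0, 0.0], [1.0, 0.0]]
k = [[0.0, 1.0], [0.0, 1.0]]  # identical keys give uniform attention weights
v = [[2.0, 0.0], [0.0, 2.0]]
print(causal_self_attention(q, k, v))
```

The first token can only attend to itself, so its output is its own value vector; the second token averages both value vectors because its scores are equal.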

Around the World

> Denmark Unveils AI Supercomputer "Gefion" to Boost Medical Research: The Novo Nordisk Foundation has partnered with Eviden to build Gefion, a powerful AI supercomputer based on NVIDIA technology. Expected to rank among the world's most powerful supercomputers, Gefion will use AI to process massive datasets, accelerating breakthroughs in drug discovery, diagnosis, and treatment. Housed at Denmark's national AI innovation center, it will be accessible to both public and private researchers. Read more.

> Saudi Arabia Announces $40 Billion AI Fund: Saudi Arabia establishes a $40 billion fund dedicated to artificial intelligence (AI) technology, aiming for global leadership. Backed by the Public Investment Fund (PIF), the fund seeks to propel Saudi Arabia to the forefront of AI innovation and investment. Talks with Silicon Valley's Andreessen Horowitz suggest potential collaborations to bolster the initiative. This strategic move emphasizes Saudi Arabia's vision to diversify its economy and become a major tech player. Read more.

5 new AI-powered tools from around the web

Endoftext is an AI-powered prompt editor that streamlines prompt creation with suggested edits and test case generation, like Grammarly for prompts.

Arcads.ai is an AI platform that transforms text into emotionally engaging video ads in minutes. Ideal for brands and agencies seeking cost-effective, bulk ad production with AI actors and multilingual support.

Next Starter AI lets users launch a SaaS product in days with an all-in-one Next.js TypeScript boilerplate. It includes Stripe and Lemon Squeezy payment integration and a marketing/SEO toolkit.

EVM Sandbox is a production-ready Web3 staging environment that supports an enterprise-grade development process and CI/CD pipeline setup, accelerating dApp and smart contract development.

Upscale.media Plugins helps users enhance creative projects with AI-powered Figma, Photoshop, and ChatGPT plugins, delivering quality image upscaling and editing capabilities.

arXiv is a free online library where researchers share pre-publication papers.

📄 TexDreamer: Towards Zero-Shot High-Fidelity 3D Human Texture Generation

TexDreamer is a novel zero-shot, multimodal model for generating high-fidelity 3D human textures, bridging a gap in 3D human texture creation. It adapts large text-to-image models to the UV structure of human texture maps through efficient texture adaptation fine-tuning and a novel feature translator. Given text or an image, the model produces textures that faithfully reproduce the subject's identity and clothing, enabling diverse avatar generation. TexDreamer also introduces ATLAS, the largest high-resolution 3D human texture dataset, filling the gap in high-quality human UV data. Extensive experiments demonstrate superior text consistency and UV quality. Limitations remain, particularly in aligning clothing patterns in real-life cases. While promising for virtual human industries, TexDreamer raises ethical concerns about potential misuse for creating deepfakes.

📄 PERL: Parameter Efficient Reinforcement Learning from Human Feedback

The Parameter Efficient Reinforcement Learning (PERL) technique, developed by Google researchers, tackles the computational cost of Reinforcement Learning from Human Feedback (RLHF), which aligns Large Language Models (LLMs) with human preferences. By using Low-Rank Adaptation (LoRA), PERL matches the performance of conventional RLHF while cutting memory usage by about 50% and training reward models up to 90% faster. Extensive experiments across seven datasets, including two new ones, underscore PERL's efficacy. Future research directions include broader generalization through ensemble models and techniques to mitigate reward hacking, improving both the robustness and the ethical safeguards of RLHF methods like PERL.
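
PERL's savings come from LoRA, which freezes a pretrained weight matrix and trains only a low-rank update. A minimal pure-Python sketch of a LoRA linear layer, assuming the common zero-initialization of B and an alpha / r scaling (the rank, alpha, and init scale here are arbitrary); it shows the layer only, not PERL's reward-model or policy training loop.

```python
import random


def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


class LoRALinear:
    """Linear layer y = x W + (alpha / r) * x A B with W frozen.

    Only the low-rank factors A (d_in x r) and B (r x d_out) would be trained.
    B starts at zero, so the adapted layer initially matches the pretrained one.
    """

    def __init__(self, w, r, alpha, seed=0):
        rng = random.Random(seed)
        d_in, d_out = len(w), len(w[0])
        self.w = w  # frozen pretrained weights
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(r)] for _ in range(d_in)]
        self.b = [[0.0] * d_out for _ in range(r)]
        self.scale = alpha / r

    def forward(self, x):
        base = matmul(x, self.w)
        delta = matmul(matmul(x, self.a), self.b)
        return [[bv + self.scale * dv for bv, dv in zip(brow, drow)]
                for brow, drow in zip(base, delta)]

    def trainable_params(self):
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])


def lora_param_count(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_in x d_out weight matrix."""
    return r * (d_in + d_out)


layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], r=1, alpha=1.0)
print(layer.forward([[1.0, 1.0]]))  # equals x W because B is still zero
print(lora_param_count(4096, 4096, 8), "vs", 4096 * 4096)
```

For a 4096 × 4096 weight matrix at rank 8, the adapter trains 65,536 parameters instead of 16,777,216, roughly 0.4% of the full matrix, which is where the memory savings come from.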

📄 MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data

The MindEye2 model presents a breakthrough in the field of fMRI-based image reconstruction, enabling high-quality reconstructions of visual perception from brain activity using only 1 hour of training data per subject. MindEye2 employs a novel functional alignment procedure that maps brain data to a shared-subject latent space, facilitating cross-subject generalization and reducing the need for extensive training data. Leveraging pretraining across multiple subjects followed by fine-tuning on minimal data from a new subject, MindEye2 achieves state-of-the-art performance in image retrieval and reconstruction metrics. The model's architecture involves a shared-subject functional alignment step, backbone models, diffusion prior, retrieval and low-level submodules, and fine-tuning of Stable Diffusion XL for image reconstruction. This pioneering approach demonstrates the potential for practical applications in clinical assessments and brain-computer interfaces, with code available on GitHub.
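
The shared-subject idea can be pictured as per-subject linear maps feeding one shared network: subjects have different numbers of voxels, but every subject's scan lands in the same latent space. The paper describes this alignment as a linear mapping; the sketch below is a minimal illustration with random weights in place of trained ones, and the class name, dimensions, and single linear "backbone" are hypothetical simplifications of MindEye2's much larger pipeline.

```python
import random


def linear(x, w):
    """Map vector x (length d_in) through matrix w (d_in x d_out)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]


class SharedSubjectModel:
    def __init__(self, latent_dim, seed=0):
        self.latent_dim = latent_dim
        self.rng = random.Random(seed)
        # Subject id -> (n_voxels x latent_dim) alignment matrix.
        self.adapters = {}
        # Stand-in for the shared backbone used by every subject.
        self.backbone = [[self.rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
                         for _ in range(latent_dim)]

    def add_subject(self, subject_id, n_voxels):
        """Register a subject whose scans have n_voxels measurements."""
        self.adapters[subject_id] = [[self.rng.gauss(0.0, 0.1) for _ in range(self.latent_dim)]
                                     for _ in range(n_voxels)]

    def encode(self, subject_id, voxels):
        shared = linear(voxels, self.adapters[subject_id])  # subject-specific alignment
        return linear(shared, self.backbone)                # shared across subjects


model = SharedSubjectModel(latent_dim=4)
model.add_subject("subj01", n_voxels=5)
model.add_subject("new_subject", n_voxels=7)
print(len(model.encode("subj01", [0.1] * 5)), len(model.encode("new_subject", [0.2] * 7)))
```

Because the backbone never sees raw voxels, a model pretrained on several subjects can take on a new subject by adding an alignment map and fine-tuning, which is the intuition behind the 1-hour setting.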

📄 LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation

LN3Diff introduces a new framework for efficient 3D diffusion learning, enabling high-quality monocular 3D reconstruction and text-to-3D synthesis. The method uses a 3D-aware architecture and a variational autoencoder (VAE) to encode input images into a structured, compact 3D latent space, and a transformer-based decoder then turns those latents into high-capacity 3D neural fields. LN3Diff addresses the scalability, efficiency, and generalizability problems of existing methods. It achieves state-of-the-art performance on ShapeNet, surpassing GAN-based and 3D diffusion-based approaches, performs strongly on monocular 3D reconstruction and conditional 3D generation across various datasets, and runs inference three times faster than existing methods. LN3Diff marks a significant advance in 3D generative modeling, with broad applications in 3D vision and graphics.
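
LN3Diff's first stage is a VAE whose encoder outputs a distribution over the compact latent, and the diffusion model is then trained in that latent space. The two generic VAE ingredients, the reparameterization trick and the KL term that keeps latents close to a standard normal, look like this in a minimal sketch. It is a textbook VAE illustration with flat latent vectors and hypothetical encoder outputs, not LN3Diff's 3D-aware encoder or its actual loss weighting.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1).

    Writing the sample this way is what lets gradients flow to mu and sigma
    in an autodiff framework.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL divergence between N(mu, diag(exp(log_var))) and N(0, I)."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


# Hypothetical encoder outputs for a 3-dimensional latent.
mu, log_var = [0.2, -0.1, 0.0], [-1.0, -0.5, 0.0]
z = reparameterize(mu, log_var, random.Random(0))
print(z, kl_to_standard_normal(mu, log_var))
```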

📄 VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding

VideoAgent is a memory-augmented multimodal agent for video understanding that targets a hard problem: capturing long-term temporal relations in videos. By combining large language models (LLMs) and vision-language models with a unified memory mechanism, VideoAgent builds structured memories that store temporal event descriptions and object-centric tracking states. Leveraging the zero-shot tool-use ability of LLMs, it calls interactive tools to solve tasks and performs impressively on long-horizon video understanding benchmarks. The approach closes the performance gap between open-source models and their proprietary counterparts. Its minimalist tool-use pipeline, which achieves comparable or superior results without expensive training, makes it a valuable contribution to the field. Future work may explore robotics, manufacturing, and augmented reality applications.
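
VideoAgent's two memories and its tool calls can be pictured as a small data structure. The sketch below is loosely modelled on the paper's description (a temporal memory of segment captions plus an object memory of tracked instances, queried through tools); the class and method names, the segment lengths, and the keyword matching are hypothetical simplifications, and the captioning, tracking, and LLM components are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    start_s: float  # segment start time in seconds
    end_s: float    # segment end time in seconds
    caption: str    # textual description of what happens in the segment


@dataclass
class VideoMemory:
    # Temporal memory: one caption per short video segment.
    segments: list = field(default_factory=list)
    # Object memory: tracked object id -> (category label, timestamps where it was seen).
    objects: dict = field(default_factory=dict)

    def add_segment(self, start_s, end_s, caption):
        self.segments.append(Segment(start_s, end_s, caption))

    def track(self, obj_id, label, t):
        # A re-identified object keeps its id, so new sightings extend the same entry.
        entry = self.objects.setdefault(obj_id, (label, []))
        entry[1].append(t)

    # Hypothetical "tools" an LLM agent could call.
    def caption_retrieval(self, start_s, end_s):
        """Return captions of segments overlapping [start_s, end_s]."""
        return [s.caption for s in self.segments if s.start_s < end_s and s.end_s > start_s]

    def segment_localization(self, keyword):
        """Return (start, end) of segments whose caption mentions keyword.

        Substring matching stands in for the embedding similarity a real system would use.
        """
        return [(s.start_s, s.end_s) for s in self.segments if keyword.lower() in s.caption.lower()]

    def count_objects(self, label):
        """Answer 'how many X appear?' from the object memory."""
        return sum(1 for lab, _ in self.objects.values() if lab == label)


mem = VideoMemory()
mem.add_segment(0, 2, "A person walks a dog into the park.")
mem.add_segment(2, 4, "A second person throws a ball.")
mem.track(1, "person", 0.5)
mem.track(2, "dog", 0.5)
mem.track(1, "person", 2.5)
mem.track(3, "person", 3.0)
print(mem.count_objects("person"), mem.segment_localization("ball"))
```

Counting from the object memory rather than from captions avoids double-counting the same person across segments, which is the kind of question where object-centric tracking helps.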

AI Creates Comics

Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.

Interested in reaching smart readers like you? To become an AI Breakfast sponsor, apply here.