Midjourney Building VR Hardware?

AI Breakfast
February 07, 2024

Good morning. It’s Wednesday, February 7th.

Did you know: On this day in 1997, Apple Computer completed its acquisition of NeXT?

In today’s email:

Upcoming AI Applications
AI in Content Creation and Generation
AI Applications and Impact
5 New AI Tools
Latest AI Research Papers
ChatGPT Creates Comics

You read. We listen. Let us know what you think by replying to this email.

In partnership with AE Studio

AI brews beer and your big ideas

What’s your biggest business challenge? Don’t worry about wording it perfectly or describing it just right. Brain dump your description into AE Studio’s new tool and AI will help you solve that work puzzle.

Describe your challenge in three quick questions. Then AI churns out solutions customized to you.

AE Studio exists to solve business problems. They build great products and create custom software, AI and BCI solutions. And they once brewed beer by training AI to instruct a brewmeister and then to market the result. The beer sold out – true story.

Beyond beer, AE Studio’s data scientists, designers and developers have done even more impressive things working 1:1 with founders and executives. They’re a great match for leaders wanting to incorporate AI and just generally deliver outstanding products built with the latest tools and tech.

If you’re done guessing how to solve work problems or have a crazy idea in your back pocket to test out, ask AI Ideas by AE Studio for free solutions, right now.

Today’s trending AI news stories

Upcoming AI Applications

> Midjourney recently hired Ahmad Abbas, who previously worked on Apple’s Vision Pro, to lead its hardware division. The move hints at Midjourney’s potential expansion into hardware development. Abbas brings experience from Apple, where he worked on mixed reality headsets, and from Elon Musk’s Neuralink. Midjourney’s founder, David Holz, has extensive hardware expertise from his previous role at Leap Motion. While details about the new project, referred to as “Orb,” are not yet public, speculation suggests it may involve creating AI-generated 3D worlds and real-time video games. Holz has expressed his vision for a future game console equipped with an AI processor capable of generating games in real time.

> Roblox has launched a real-time AI translation tool for its platform, enabling instant translation of messages into 16 languages. The goal is to enhance communication among its vast user base of 70 million daily users from 180 countries. Using linguistic similarities between languages, the AI model ensures accurate and rapid translations. Additionally, Roblox intends to provide the underlying translation model to developers, facilitating additional localization efforts within the platform’s interface.

AI in Content Creation and Generation

> Meta is implementing “Imagined with AI” labels on AI-generated images across Facebook, Instagram, and Threads to enhance transparency. These labels, along with invisible watermarks and embedded metadata, aim to distinguish AI-generated content. Meta is collaborating with industry leaders like Google, OpenAI, and Microsoft to develop labeling standards, and is also exploring advanced classifiers to automatically detect AI-generated content.

> Apple has unveiled “MGIE,” an AI model for instruction-based image editing. MGIE utilizes multimodal large language models (MLLMs) to interpret natural language commands for precise image adjustments. Presented at ICLR 2024, MGIE improves metrics and human evaluation while remaining efficient. It generates clear editing instructions and visual representations using MLLMs. With features like Photoshop-style edits and global optimization, MGIE suits various editing needs. Available on GitHub.

> Stability AI has released SVD 1.1, an upgraded version of its Stable Video Diffusion model, enhancing the consistency of AI-generated videos. Available for public download on Hugging Face, SVD 1.1 offers improved motion and quality. It’s accessible through different subscription tiers, with commercial use requiring a membership. SVD 1.1 aims to address previous issues like motion quality and realism, promising better performance. While primarily for research, it’s planned to be integrated into the developer platform. Competing with offerings from Runway and Pika, Stability AI wants to advance generative AI technology with frequent model releases.

> Meta has launched "Prompt Engineering with Llama 2," an interactive guide aimed at developers, researchers, and enthusiasts working with large language models (LLMs). The guide covers various prompt engineering techniques, including explicit instructions, formatting, and few-shot learning, with the goal of enhancing results. It illustrates how to reduce irrelevant tokens in LLM outputs and is accessible through the llama-recipes repository. This resource is valuable for individuals seeking to optimize their utilization of LLMs, aligning with industry trends emphasizing the importance of prompt engineering for better model performance.
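For readers who have not tried few-shot prompting, here is a minimal sketch of the idea: a handful of worked examples are placed ahead of the real query so the model can infer the expected output format. The function name `build_few_shot_prompt` and the `Input:`/`Output:` layout are assumptions made for illustration; they are not taken from Meta's guide and this is not Llama 2's chat template.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a plain-text few-shot prompt (hypothetical helper).

    instruction: explicit task description placed first.
    examples:    list of (input_text, expected_output) pairs.
    query:       the new input the model should complete.
    """
    lines = [instruction.strip(), ""]
    for text, label in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Output: {label}")
        lines.append("")  # blank line between examples
    # End with an open "Output:" so the model fills in only the answer.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Classify the sentiment of each input as positive or negative.",
    [("I loved this movie.", "positive"), ("The food was cold.", "negative")],
    "The service was excellent.",
)
print(prompt)
```

Ending the prompt on an open `Output:` line is one common way to discourage extra tokens around the answer, though results will vary by model.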

AI Applications and Impact

> Microsoft and news startup Semafor team up to create “Signals,” a global news feed powered by AI tools. Signals aims to adapt to changes in digital media, offering different perspectives while human journalists write the content with AI assistance. Financial details remain undisclosed, but the partnership is significant for Semafor’s business. Microsoft plans to partner with other journalism organizations as well. This collaboration addresses concerns about AI’s impact on the news industry, following a copyright lawsuit from The New York Times against Microsoft and OpenAI.

> Scammers in Hong Kong used deepfake technology to trick a multinational company’s branch into transferring HK$200 million. They created a fake video meeting featuring digitally altered versions of the company’s CFO and others. This is the first such major scam in Hong Kong. Despite some initial suspicion, an employee was deceived over the course of a week. Police investigations showed that the scammers used publicly available footage to imitate the voices and actions of the participants effectively.

> OpenAI's DALL-E 3 image generator will now include watermarks in image metadata, following standards set by the Coalition for Content Provenance and Authenticity (C2PA). The watermarks will be applied to images generated on the ChatGPT website and through DALL-E 3's API, and consist of an invisible metadata component and a visible CR symbol. Users can verify image origins using tools like Content Credentials Verify. Despite minor effects on latency and image size, OpenAI stresses the importance of these measures in boosting trust in digital information amid misinformation concerns.

> Researchers recently won a $700,000 prize for using AI to interpret a 2,000-year-old scroll damaged by Mount Vesuvius' eruption. The Herculaneum papyri, about 800 Greek scrolls carbonized in 79 CE, were challenging to decipher due to their fragile state. Using AI, the team distinguished ink from papyrus and deciphered faint Greek text. Their work has uncovered approximately five percent of the scroll, offering insights into ancient philosophy and literature.

5 new AI-powered tools from around the web

Syncly leverages AI for real-time customer feedback analysis, prioritization, trend visualization, and seamless integration. Ideal for CX and product teams.

Imaginario.ai empowers users with AI-driven video enhancement tools. Instantly search, clip, and transcribe videos with precision.

Color Pop is a high-quality coloring app for Android and iOS, providing a realistic coloring experience. Its advanced AI creates personalized coloring pages from spoken words, attracting creative users.

Runcomfy simplifies setting up ComfyUI workflows for AI-driven image and video creation, catering to artists and creators by automating technical tasks and offering error-free workflows.

Rupert enables no-code data alerts to Slack, streamlining monitoring with AI-assisted message writing and anomaly detection. Ideal for teams seeking polished alerts without technical expertise.

arXiv is a free online library where researchers share pre-publication papers.

📄 Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization

Video-LaVIT revolutionizes multimodal pre-training, empowering Large Language Models (LLMs) to comprehend and generate videos, images, and text in a unified framework. It introduces a groundbreaking video decomposition scheme, efficiently capturing temporal dynamics through keyframes and motion vectors. This novel approach enables effective tokenization and adaptation to LLMs, facilitating comprehensive generative pre-training across modalities. With a sophisticated detokenizer, Video-LaVIT supports flexible multimodal generation, including long videos. Extensive quantitative and qualitative evaluations validate its understanding and generative capabilities across various benchmarks, marking a significant advancement in multimodal AI research and application.

📄 BlackMamba: Mixture of Experts for State-Space Models

BlackMamba, a novel architecture, merges State-Space Models (SSMs) with Mixture-of-Experts (MoE), combining the benefits of both. SSMs offer linear complexity, enabling long-sequence processing, while MoE reduces inference costs. BlackMamba inherits SSM's linear-complexity generation and MoE's cheap, fast inference. It performs competitively against both SSM and transformer baselines while requiring fewer FLOPs for inference and training. Two BlackMamba models, 340M/1.5B and 630M/2.8B, are open-sourced. The paper explores architectural combinations, providing insights into routing statistics and initialization techniques. BlackMamba signifies a step forward in efficient language modeling, offering improved performance and scalability over traditional architectures.

📄 DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

The paper introduces DeepSeekMath 7B, a model trained with 120B math-related tokens from Common Crawl, achieving 51.7% on the MATH benchmark. The success stems from harnessing web data via a precise selection pipeline and introducing Group Relative Policy Optimization (GRPO), a memory-efficient variant of Proximal Policy Optimization (PPO). DeepSeekMath-Base competes with Minerva 540B on English benchmarks and outperforms on Chinese benchmarks. The study highlights the significance of web data in math pre-training, the impact of code training on reasoning, and GRPO's effectiveness in enhancing mathematical reasoning. Future work includes improving data selection and exploring reinforcement learning avenues for further advancements.
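To give a feel for why GRPO is lighter on memory than PPO, here is a minimal sketch of its central step. Instead of training a separate value model to estimate a baseline, it samples a group of answers to the same question and scores each answer relative to the group's mean and standard deviation. The function name, the use of the population standard deviation, and the `eps` term are assumptions for illustration. The sketch covers only the advantage computation, not the full training objective.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each sampled answer against its own group (hypothetical helper).

    rewards: one scalar reward per answer sampled for the same prompt.
    eps:     small constant (an assumption) to avoid dividing by zero
             when every answer in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std is an assumption here
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one math problem: 1.0 = correct, 0.0 = wrong.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Correct answers end up with positive advantages and wrong ones with negative advantages, so the policy is pushed toward the better answers in each group without a learned critic.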

📄 Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Recent advances in large language models have prompted research into superalignment, evaluating and optimizing their capabilities. This paper explores weak-to-strong generalization in vision foundation models, where a weaker model supervises a stronger one to enhance its capabilities. Introducing an adaptive loss function for weak-to-strong supervision, the study conducts comprehensive experiments across scenarios like few-shot learning and transfer learning. Results show superior performance, surpassing strong-to-strong generalization and fine-tuning with whole datasets. Weak-to-strong generalization emerges as a promising approach to substantially elevate vision foundation model performance.

📄 SELF-DISCOVER: Large Language Models Self-Compose Reasoning Structures

SELF-DISCOVER, a breakthrough framework developed by researchers at Google DeepMind, empowers Large Language Models (LLMs) to autonomously craft task-specific reasoning structures. By selecting and composing atomic reasoning modules, LLMs achieve remarkable performance boosts, up to 32% compared to traditional prompting methods like Chain of Thought (CoT). This framework substantially enhances LLMs' ability to tackle challenging reasoning benchmarks such as BigBench-Hard and MATH, outperforming inference-intensive methods while requiring significantly less compute. SELF-DISCOVER's self-discovered reasoning structures are universally applicable across different LLMs, showcasing the potential for structured reasoning to advance problem-solving capabilities and foster Human-AI collaboration.

ChatGPT Creates Comics

Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.
