
Midjourney To Introduce Personal Models, v7 Upgrade Soon

Good morning. It’s Monday, April 1st.

Did you know: On this day in 1976, Steve Jobs, Steve Wozniak, and Ronald Wayne founded the Apple Computer Company?

In today’s email:

  • Midjourney v7: Custom models, Video

  • MS & OpenAI: Stargate AI Supercomputer

  • China: New 14nm AI Chip Avoids Sanctions

  • Brown U: Tiny Brain-like Sensors

  • OpenAI: 15s Voice Cloning, Release Delayed

  • 5 New AI Tools

  • Latest AI Research Papers

You read. We listen. Let us know what you think by replying to this email.

Today’s trending AI news stories

> Midjourney Set To Introduce Personalized Models, v7 Upgrade Coming Soon: Midjourney revs up its AI art engine with personalized models in v7 (due in 3 months). Founder David Holz promises better image quality, faster prompt understanding, and video generation by year-end. The key feature: user-specific models. By analyzing preferences from image ratings, Midjourney aims to tailor image creation and overcome model biases. Version 7 also boasts faster processing and improved aesthetics. Looking ahead, Holz hints at video and potentially 3D model generation, acknowledging the need to improve prompt understanding, an area where competitors like DALL-E and Ideogram currently lead. Read more.

> Report: Microsoft to build ‘Stargate’ supercomputer with millions of chips for OpenAI: Microsoft and OpenAI are partnering on a multi-billion dollar project to build a massive supercomputer, codenamed "Stargate". This machine, featuring millions of processors, will integrate into a larger network of AI clusters for accelerated research. Microsoft's collaboration with OpenAI on infrastructure dates back to 2020, with an Azure-based supercomputer initially equipped with 10,000 graphics cards. This system has since received upgrades and now boasts tens of thousands of cutting-edge A100 chips. Stargate, the pinnacle of a five-phase project, is projected to be operational by 2028. Read more.

> Chinese chipmaker launches 14nm AI processor that's 90% cheaper than GPUs — $140 chip's older node sidesteps US sanctions: Chinese chipmaker Intellifusion sidesteps US sanctions with "DeepEyes" AI boxes. These cost-effective devices leverage older 14nm tech and custom chips (including the NNP400T neural network chip) to deliver powerful AI processing at a fraction of the cost of GPUs (around $140 for the first model with 48 TOPS). This move aims to democratize AI access, especially for businesses burdened by traditional training costs. DeepEyes positions Intellifusion as a competitor in the global AI market, showcasing China's strategic response to sanctions. Read more.

> Salt-Sized Sensors Mimic the Brain: Researchers at Brown University have made a significant leap in brain-machine interface technology with the development of miniature, brain-inspired sensors. These silicon chips, no larger than a grain of salt (measuring 300 by 300 micrometers), mimic the communication style of neurons by transmitting data as brief bursts or "spikes," only when they detect events. This efficient approach, similar to the way the brain processes information, conserves energy and allows for a scalable system. Decoding these signals happens in real-time using neuromorphic computing techniques, a field inspired by artificial intelligence. These tiny sensors have applications beyond brain-computer interfaces, with the potential to monitor various physiological activities. Read more.
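
As a rough intuition for what "transmitting data as spikes only when events occur" means, here is a purely illustrative Python sketch of threshold-crossing (event-driven) encoding; it is not Brown's sensor firmware, and the signal, threshold, and function are made up for the example.

```python
# Illustrative only: an event-driven "spiking" readout emits a brief event
# whenever the signal changes by at least `threshold`, instead of streaming
# every raw sample -- the property that makes such sensors energy-efficient.
def spike_events(samples, threshold=1.0):
    """Yield (index, sign) events whenever the signal moves by >= threshold."""
    last = samples[0]
    for i, value in enumerate(samples[1:], start=1):
        delta = value - last
        if abs(delta) >= threshold:
            yield i, 1 if delta > 0 else -1  # a short burst, not the raw value
            last = value

print(list(spike_events([0.0, 0.2, 1.5, 1.6, 0.1])))  # -> [(2, 1), (4, -1)]
```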

> OpenAI's Voice Engine can clone your voice from a 15-second sample: OpenAI's new AI model, Voice Engine, can create realistic voice replications from a mere 15-second audio sample and text input. Though the technology holds promise for applications like educational tools and communication aids for those with speech limitations, OpenAI is delaying a wider release due to concerns about potential misuse, particularly during elections. Despite ongoing development since late 2022, with Voice Engine already integrated into existing OpenAI tools, the company claims to be prioritizing safety measures. You don't have to wait for OpenAI, though: ElevenLabs already offers a similar suite of voice-cloning tools.

🖇️ Etcetera

> Google AI Introduces AutoBNN: A New Open-Source Machine Learning Framework for Building Sophisticated Time Series Prediction Models (More)

> NICE brings contextual memory to contact center AI (More)

> New York City will introduce controversial AI gun detection technology amid subway crime crisis (More)

> AI startups Scale AI and Cohere reportedly in talks to raise hundreds of millions (More)

5 new AI-powered tools from around the web

LM Studio runs LLMs locally and offline, ensuring privacy and security. It supports diverse models such as LLaMA, Falcon, MPT, Gemma, and others from Hugging Face repositories. Users interact via the in-app chat UI or a local server compatible with the OpenAI API.
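
For readers who want to try that OpenAI-compatible local server, here is a minimal sketch using the official openai Python client pointed at LM Studio. The port (1234 is LM Studio's usual default), the placeholder API key, and the model identifier are assumptions; substitute whatever your Local Server tab shows.

```python
# Minimal sketch: chat with a model served locally by LM Studio via its
# OpenAI-compatible endpoint. Assumes the local server is already running.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # assumed default LM Studio port
    api_key="lm-studio",                  # placeholder; the local server ignores keys
)

response = client.chat.completions.create(
    model="local-model",  # hypothetical identifier; use the model you loaded
    messages=[{"role": "user", "content": "Explain what an embedding is in one sentence."}],
)
print(response.choices[0].message.content)
```

Because the endpoint mirrors the OpenAI API, existing scripts usually only need the base_url changed to run against the local model.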

Breadcrumb.ai is an AI-driven analytics platform that builds personalized, interactive data views without requiring advanced data skills. It's ideal for quick insights, turning data into narrative-form presentations, and scaling the service to multiple clients with ease.

Prototyper utilizes AI to convert text or screenshots into functional UI code, speeding up design iteration and collaboration for teams.

Faune integrates language models like GPT-3, GPT-4, and Mistral into an AI chat app. Dynamic prompts, image processing, and a flexible credit system ensure engaging, private conversations.

Salieri's Multiverse is an AI-powered sandbox for crafting interactive adventures. Using multi-agent LLMs, it transforms ideas into rich narratives and visuals.

Latest AI research papers

arXiv is a free online library where researchers share pre-publication papers.

ReALM, developed by Apple, harnesses the power of Large Language Models (LLMs) to tackle reference resolution across conversational, on-screen, and background contexts, achieving performance akin to GPT-4 while using fewer parameters. The approach encodes entities as natural language and preserves their spatial relationships on screen to enable robust resolution. While demonstrating efficiency, the authors acknowledge the complexity of handling nuanced queries and suggest further exploration of advanced techniques such as grid-based spatial encoding. This solution marks a significant step forward in AI-driven reference resolution, offering practical, on-device capabilities without compromising performance. With its ability to navigate diverse contextual scenarios, ReALM stands as a promising tool for enhancing user experience and interaction with AI systems.
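
To make the "encode entities as natural language while preserving spatial relationships" idea concrete, here is a small illustrative sketch; the ScreenEntity structure and the row-bucketing heuristic are assumptions for the example, not Apple's implementation.

```python
# Illustrative sketch: serialize on-screen entities into plain text whose line
# order reflects their top-to-bottom, left-to-right layout, so an LLM can
# resolve references like "the number at the top" from text alone.
from dataclasses import dataclass

@dataclass
class ScreenEntity:
    text: str
    x: float  # left edge, normalized 0-1
    y: float  # top edge, normalized 0-1

def encode_screen(entities: list[ScreenEntity], row_height: float = 0.05) -> str:
    rows: dict[int, list[ScreenEntity]] = {}
    for e in entities:
        rows.setdefault(int(e.y / row_height), []).append(e)  # bucket by row
    lines = []
    for _, row in sorted(rows.items()):  # top-to-bottom
        lines.append("  ".join(e.text for e in sorted(row, key=lambda e: e.x)))  # left-to-right
    return "\n".join(lines)

print(encode_screen([
    ScreenEntity("Contact Us", 0.10, 0.02),
    ScreenEntity("555-0123", 0.60, 0.02),
    ScreenEntity("Call the number at the top", 0.10, 0.50),
]))
```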

This paper introduces the Unsolvable Problem Detection (UPD) challenge for Vision Language Models (VLMs), assessing their ability to withhold answers when faced with unsolvable Visual Question Answering (VQA) tasks. UPD encompasses three settings: Absent Answer Detection (AAD), Incompatible Answer Set Detection (IASD), and Incompatible Visual Question Detection (IVQD). Extensive experiments on recent VLMs like GPT-4V and LLaVA-Next-34B reveal varying levels of struggle with UPD, highlighting room for improvement. The study explores both training-free and training-based solutions, such as prompt engineering and instruction tuning, to address UPD limitations. Results underscore the complexity of UPD and advocate for innovative approaches in future VLM research.

This paper introduces Gecko, a compact and versatile text embedding model distilled from large language models (LLMs). Gecko achieves robust retrieval performance by distilling knowledge from LLMs into a retriever. The distillation process involves generating diverse, synthetic paired data using an LLM and refining data quality by retrieving candidate passages for each query and relabeling positive and hard negative passages using the same LLM. The effectiveness of Gecko is demonstrated on the Massive Text Embedding Benchmark (MTEB), where it outperforms existing entries with significantly smaller embedding dimensions. By leveraging LLMs, Gecko offers strong performance, competing with much larger models. The paper highlights the importance of LLMs in improving text embedding models and underscores the versatility and efficiency of Gecko in various downstream tasks.

In this collaborative effort between researchers from Google and ETH Zurich, they explore the intricate process of localizing paragraph memorization within language models. Their study delves into whether it's feasible to pinpoint the specific weights and mechanisms responsible for a language model's ability to memorize and regurgitate entire paragraphs from its training data. Through meticulous analysis, they reveal that while memorization is dispersed across various layers and components of the model, there exists a discernible spatial pattern in the gradients of memorized paragraphs, particularly pronounced in lower layers. Moreover, they identify a specific attention head, predominantly active in memorizing paragraphs, which exhibits a proclivity for rare tokens in the input. Perturbation experiments shed light on the impact of individual tokens on the model's generation, providing valuable insights into the localization and potential mitigation of memorization effects in language models.

The paper investigates current evaluation methods for Large Vision-Language Models (LVLMs) and identifies two primary issues. Firstly, many evaluation samples lack essential visual content, leading to assessments that primarily test the textual capabilities of LVLMs. Secondly, unintentional data leakage occurs during LVLM training, affecting the model's performance on visual-necessary questions. To address these issues, the authors introduce MMStar, an elite vision-indispensable multi-modal benchmark comprising 1,500 meticulously selected samples. MMStar covers six core capabilities and 18 detailed axes, aiming to evaluate LVLMs' multi-modal capacities accurately. Additionally, two metrics, multi-modal gain (MG) and multi-modal leakage (ML), are proposed to measure LVLMs' actual performance gain and data leakage during training. Evaluation of 16 LVLMs on MMStar reveals insights into their multi-modal capabilities and data leakage tendencies, highlighting the need for more rigorous evaluation methods in LVLM research.

AI Creates Comics

Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.

Interested in reaching smart readers like you? To become an AI Breakfast sponsor, apply here.