GPT-4.5 Rumor Squashed, Midjourney Coming To The Masses

AI Breakfast
December 15, 2023

Good morning. It’s Friday, December 15th.

In partnership with AI Adventure

Did you know: On this day in 1994, early web browser Netscape Navigator 1.0 was released?

In today’s email:

AI Developments in Media and Robotics
Latest from OpenAI
7 New AI Tools
Latest AI Research Papers
ChatGPT + DALLE 3 Creates Comics

You read. We listen. Let us know what you think by replying to this email.

Interested in reaching 46,863 smart readers like you? To become an AI Breakfast sponsor, apply here.

Today’s trending AI news stories

AI Developments in Media and Robotics

> Midjourney introduces an alpha web version for AI image generation, a shift away from its original Discord-only interface. The new version, available to users who have generated more than 10,000 images, offers an easy-to-use interface with sliders and tags for adjusting parameters like aspect ratio and style, and it makes inserting images into prompts simple. The upcoming v6 model, expected to follow prompts more closely, has yet to be announced, in contrast to the prompt-following capabilities OpenAI's DALL-E 3 already offers.

> Stability AI introduces Stable Zero123, an enhanced model for creating 3D models from text and images. It improves upon its predecessor, Zero123-XL, by using a refined training dataset from Objaverse, which improves results and makes training 40 times more efficient. Stable Zero123 generates views of an object from various angles, which can then be used to create a 3D model. Released for research rather than commercial use, it relies on the open-source threestudio framework and needs 24 GB of VRAM for 3D generation.

> DeepMind's AI FunSearch, as reported in Nature, has made significant advancements in combinatorics by solving Set game-inspired problems, surpassing human mathematicians. This AI employs large language models (LLMs) to auto-generate and iteratively refine computer programs, crafting solutions previously unknown in mathematics and computer science. FunSearch's achievement in improving the lower bound for n=8 in the cap set problem marks a novel use of LLMs in creative problem-solving, highlighting a new era of AI-enhanced human-machine collaboration in mathematical research.
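For readers curious about what the cap set problem asks, here is a minimal sketch. A cap set is a collection of vectors with entries 0, 1, or 2 in which no three distinct vectors sum to zero (mod 3) in every coordinate, which is the same rule as "no valid Set" in the card game. The function names below are hypothetical and this is not FunSearch's method: it is only a plain checker plus a naive greedy baseline of the kind an LLM-written program would try to beat.

```python
from itertools import combinations, product


def third_point(a, b):
    """Return the unique point that completes a line with a and b in Z_3^n."""
    return tuple((-(x + y)) % 3 for x, y in zip(a, b))


def is_cap_set(vectors, n):
    """Return True if no three distinct vectors in Z_3^n lie on a line."""
    points = set(map(tuple, vectors))
    if any(len(p) != n or any(c not in (0, 1, 2) for c in p) for p in points):
        raise ValueError("every vector needs n entries drawn from {0, 1, 2}")
    # For distinct a and b, the completing point is always distinct from both.
    return all(third_point(a, b) not in points for a, b in combinations(points, 2))


def greedy_cap_set(n):
    """Naive baseline: scan points in lexicographic order, keep each legal one."""
    chosen = []
    chosen_set = set()
    for p in product(range(3), repeat=n):
        if all(third_point(a, p) not in chosen_set for a in chosen):
            chosen.append(p)
            chosen_set.add(p)
    return chosen


if __name__ == "__main__":
    print(len(greedy_cap_set(2)))  # 4
```

The greedy scan happens to reach the largest possible cap set for n=2, but it carries no such guarantee in higher dimensions, which is where searching for cleverer construction programs becomes interesting.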

> Tesla's latest humanoid robot prototype, Optimus Gen 2, showcases notable improvements in a new demo video. It is 30% faster, 10 kg lighter, and has tactile sensors on all fingers. The robot demonstrates abilities like walking, crouching, and handling delicate items. Tesla says the video is unaltered and shows the robot operating in real time. Optimus Gen 2 represents Tesla's ambition to develop robots capable of supporting and eventually replacing human labor.

Latest from OpenAI

> OpenAI has resumed accepting new ChatGPT Plus subscribers, CEO Sam Altman announced. Sign-ups had been temporarily paused because of a GPU capacity shortage, made worse by high demand after the launch of GPT-4 Turbo and other new features. ChatGPT Plus offers access to advanced features like GPT-4 Turbo, DALL-E 3, and a code interpreter. The company is also addressing recent concerns about GPT-4's performance. ChatGPT's web traffic is recovering and nearing its April 2023 peak, helped by increased use in school settings and overall product improvements.

> OpenAI’s Superalignment team, led by Chief Scientist Ilya Sutskever, has made strides in developing methods to manage super-intelligent AI. Their recent paper describes an experiment in which a weaker AI model successfully supervises a stronger one, an analogy for how humans might one day oversee AI systems more capable than themselves. This research is a foundational step toward safely harnessing AI’s growing capabilities and keeping them aligned with human objectives.

> OpenAI's CEO Sam Altman has dismissed a recent leak regarding GPT-4.5 pricing and capabilities. The leak, which spread on X and Reddit, claimed GPT-4.5 would bring multi-modal capabilities across various domains, including vision, video, audio, language, and 3D, along with complex reasoning and cross-modal understanding. It also detailed a new pricing model for the update. However, the authenticity of this information is now in question following Altman's brief but direct dismissal of the leak as inaccurate.

> Axel Springer has formed a unique global partnership with OpenAI, integrating real-time news searches into ChatGPT. This collaboration enhances ChatGPT's user experience by providing access to content from Springer's media brands like Politico and Business Insider. ChatGPT will offer summaries of global news, including paid articles, with proper attribution.

In partnership with AI Adventure

Turn your favorite stories into adventure games… free!

AI Adventure is an open-source gaming system for creating choose-your-own adventure stories.

Play and create your own games or jump into a community-made game with characters, goals, and graphics, all aided by the storytelling power of AI.

7 new AI-powered tools from around the web

Audio Note transforms spoken ideas into structured text, offering formats like journal entries and tweets.

Ask Viable converts feedback into PRDs, FAQs, and reports swiftly, offering the first report for free. It uses AI for customized business analysis, ideal for reducing churn and prioritizing features.

PriceGPT offers analysis of pricing pages, providing actionable insights and recommendations from a URL or image upload. A user-friendly tool for competitive edge, it’s ideal for SaaS and AI applications with a focus on simplicity and efficiency.

Scade.pro, a no-code AI platform featuring over 1500 AI tools, makes automating business processes and developing AI-based products as easy as crafting a PowerPoint.

MessengerX.io is an AI tool that lets you innovate and earn with custom, uncensored GPTs for engaging chats. It is developer-friendly, with an SDK and APIs for website and app integration.

Music AI is a premier platform for developing audio-driven AI products, offering an extensive range of diverse, in-house and third-party AI models for advanced audio solutions like track isolation, transcription, and noise suppression.

Excalidraw is an online drawing tool enabling real-time collaboration with a variety of shapes, library items, and canvas actions. It features Zen mode and intuitive canvas navigation for enhanced creative experiences.

Latest AI Research Papers

arXiv is a free online library where researchers share pre-publication papers.

📄 Foundation Models in Robotics: Applications, Challenges, and the Future

This paper examines the application of foundation models in robotics, focusing on pre-trained models like BERT, GPT-3, GPT-4, CLIP, DALL-E, and PaLM-E. Adapted for a range of tasks including autonomous driving and medical robotics, these models leverage extensive and diverse data for enhanced adaptability and performance. The integration of multimodal foundation models, with their zero-shot capabilities, is set to revolutionize robotics by fusing sensor data and enhancing adaptability in unstructured environments. The paper also highlights ongoing challenges in data scarcity, variability, uncertainty quantification, safety evaluation, and real-time performance, pointing out future research directions in the field.

📄 Distributed Inference and Fine-tuning of Large Language Models Over The Internet

This study introduces PETALS, a novel decentralized system for running large language models (LLMs) over the Internet. It addresses the challenges of using 50B+ parameter models by employing fault-tolerant, distributed autoregressive inference algorithms and decentralized load-balancing. PETALS allows efficient LLM operation on geodistributed, consumer-grade networks, significantly outperforming existing parameter offloading methods. Tested with Llama 2 (70B) and BLOOM (176B), the system shows enhanced performance in various network conditions, including a real-world, cross-continental setup.

📄 SWITCHHEAD: Accelerating Transformers with Mixture-of-Experts Attention

The paper introduces SwitchHead, a novel Mixture-of-Experts (MoE) attention mechanism for Transformers. It drastically reduces compute and memory demands, offering wall-clock speedups while maintaining performance. By using MoE layers selectively for the value and output projections, SwitchHead needs to compute far fewer attention matrices. Combined with MoE MLP layers, it forms a fully MoE-based "SwitchAll" model, demonstrating efficiency and scalability. The method addresses the computational cost of training large language models, making such training accessible to a wider research community.

📄 STEMGEN: A Music Generation Model That Listens

STEMGEN, developed by researchers from ByteDance, introduces a groundbreaking music generation model. It uniquely listens and adapts to musical context, leveraging a non-autoregressive, transformer-based design. Enhanced with innovative architectural and sampling advancements, STEMGEN is trained on diverse datasets. It achieves high audio quality and musical coherence, surpassing standard benchmarks. The model is evaluated using traditional quality metrics and advanced music information retrieval methods, making it a cutting-edge tool for dynamic, context-aware music creation.

📄 SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds

SHAP-Editor introduces a feed-forward framework that performs 3D edits in seconds. Bypassing lengthy distillation processes, it uses a non-autoregressive, transformer-based architecture to encode 3D objects in a latent space, allowing rapid, efficient edits from simple instructions. Excelling in both generalization and performance, SHAP-Editor represents a significant leap in practical, scalable 3D asset editing.

ChatGPT + DALLE 3 Creates Comics

Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.
