AI Music Generation with Copilot
Good morning. It’s Wednesday, December 20th.
In Partnership with Mojju Custom GPTs
Did you know: On this day in 1996, Apple Computer announced its acquisition of NeXT Software?
In today’s email:
AI in Business and Industry Innovations
AI Safety and Ethical Use
AI Recognition and Impact on Society
9 New AI Tools
Latest AI Research Papers
ChatGPT Creates Comics
You read. We listen. Let us know what you think by replying to this email.
Interested in reaching 47,060 smart readers like you? To become an AI Breakfast sponsor, apply here.
Today’s trending AI news stories
AI in Business and Industry Innovations
> Microsoft Copilot now includes a music creation feature through an integration with Suno, a generative AI music app. Users can compose songs from simple text prompts, which significantly expands Copilot’s capabilities. The integration highlights the growing intersection of AI and the creative arts and marks an important step in Microsoft’s AI efforts.
> The Midjourney dev team is inviting paying members to help prepare the release of Version 6 by rating images on the Midjourney website. Members select images based on their personal aesthetic preferences and report any NSFW content. The rated images are meant to help the system learn and are not representative of the final V6 release, which focuses on improvements such as prompt understanding and world knowledge.
> Stability AI, known for its Stable Diffusion text-to-image model, is introducing a paid membership for commercial use of its AI models. The company offers three tiers: a free option for personal and research use, a $20/month subscription for smaller businesses, and an enterprise plan, all of which provide early access to new models. Only paid members can use the models commercially. The move marks a shift in Stability AI’s approach as it tries to balance profitability with openness. CEO Emad Mostaque has promised to keep releasing open models, but the community remains confused about the scope of ‘commercial use.’
> Mistral AI, which recently raised $415 million at a $2 billion valuation, aims to release an open-source GPT-4-level model in 2024. Its current model, Mixtral 8x7B, outperforms similar models on benchmarks while offering faster inference. Meanwhile, OpenAI is rumored to be releasing GPT-4.5 soon. Mistral AI’s upcoming Mistral Medium model, which is proficient in multiple languages, posts high benchmark scores. The company also introduced ‘La Plateforme’, which provides API access to its models and offers a path toward sustainability for open-source AI.
> Playground AI has launched Playground V2, a new diffusion-based text-to-image model that users preferred 2.5 times more often than Stable Diffusion XL. Available on Hugging Face, it comes in base and aesthetic versions at resolutions up to 1024px. Users can generate 500 free images per day on Playground’s website, and commercial use is allowed. The model is part of Playground’s broader goal of building advanced visual AI systems, including ones that create 3D environments and analyze video scenes.
> IBM plans to acquire Software AG’s StreamSets and webMethods platforms for $2.3 billion to strengthen its AI and hybrid cloud capabilities. The deal, which is expected to close in 2024, aligns with IBM’s strategic focus on AI and cloud growth and is backed by Silver Lake, which owns 93.3% of Software AG.
> Research indicates that Google’s Gemini Pro underperforms OpenAI’s older GPT-3.5 Turbo on most tasks. Despite its long development, Gemini Pro also lags well behind OpenAI’s more advanced GPT-4 models. Gemini Pro does excel at translation for certain languages, but it moderates content aggressively, blocking responses for several language pairs. The findings suggest that while Google is a major player in AI, its latest generative AI offering still trails OpenAI’s established models.
AI Safety and Ethical Use
> OpenAI has introduced a “Preparedness Framework” for AI safety, under which an advisory group guides safe AI model development. The plan gives the board the power to reverse safety decisions and assigns multiple teams to oversee AI safety, with a focus on mitigating catastrophic risks such as massive economic losses or loss of life. The framework scores models for potential hazards, and only models whose post-mitigation risk is rated “medium” or below can be deployed.
OpenAI is also strengthening its safeguards against AI misuse. A new safety advisory group, separate from the technical teams, will make recommendations to OpenAI’s leadership, and the board can veto decisions involving high-risk AI development. The restructuring aims to address potential catastrophic risks and follows OpenAI’s recent leadership changes and a growing public debate on AI safety.
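Under the framework, only models whose post-mitigation risk is rated “medium” or below can be deployed, and only those rated “high” or below can be developed further. The gate can be sketched in Python as below. This is a simplified illustration, not OpenAI’s implementation: the category names follow the framework’s tracked risk areas (cybersecurity; chemical, biological, radiological and nuclear threats; persuasion; model autonomy), while the function names and the rule that the overall rating equals the worst category rating are assumptions made for the example.

```python
from enum import IntEnum


class Risk(IntEnum):
    """Risk levels, ordered so that comparisons like <= work."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3


# Tracked categories (names shortened for the sketch).
CATEGORIES = ("cybersecurity", "cbrn", "persuasion", "model_autonomy")


def overall_risk(scorecard: dict) -> Risk:
    # Assumption for this sketch: the overall rating is the worst category rating.
    missing = set(CATEGORIES) - set(scorecard)
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    return max(scorecard[c] for c in CATEGORIES)


def can_deploy(post_mitigation: dict) -> bool:
    # Deployment requires a post-mitigation rating of MEDIUM or below.
    return overall_risk(post_mitigation) <= Risk.MEDIUM


def can_develop_further(post_mitigation: dict) -> bool:
    # Further development requires a post-mitigation rating of HIGH or below.
    return overall_risk(post_mitigation) <= Risk.HIGH
```

With this rule, a single category rated “high” blocks deployment even if every other category is “low,” which is why the sketch takes the maximum rather than an average.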
AI Recognition and Impact on Society
> The journal ‘Nature’ has named ChatGPT among its 2023 scientists of the year, alongside its co-creator Ilya Sutskever. ChatGPT has contributed to academia by writing papers, summarizing articles, and helping with research grant applications, sparking debate about AI’s limits and the need for regulation. Sutskever, a pioneer of generative AI, is also honored for his work on steering and controlling AI systems.
> Rite Aid, a U.S. pharmacy chain in bankruptcy, will stop using AI-based facial recognition technology for five years to settle Federal Trade Commission charges that it harmed consumers. From 2012 to 2020, Rite Aid used the technology to identify suspected shoplifters, but it falsely flagged some customers. The settlement awaits bankruptcy court approval. The FTC action followed a 2020 investigation that found the technology had often been deployed unevenly across neighborhoods. The case underscores the risks of deploying AI in customer-facing settings without proper configuration, testing, and oversight.
In partnership with MOJJU
Custom GPTs from Mojju will save hundreds of hours of your time and make you more productive, creative and effective!
Mojju offers unique and powerful custom GPTs for OpenAI. Their portfolio includes a diverse range of GPTs including productivity tools, various assistants & guides, business & finance tools, and a lot more! All GPTs are free to use!
Mojju’s skilled AI team has built a range of proven solutions, including GPTs integrated with Zapier, MailChimp and Stable Diffusion, and provides continuous support and updates!
Unlike other products on the market, which tend to aggregate every available GPT and often create clutter and confusion, Mojju takes a different approach. Its library consists of reliable, tested GPTs developed by its in-house team. The Mojju team’s goal is to help users get the most value out of this emerging trend.
Thank you for supporting our sponsors!
9 new AI-powered tools from around the web
Dewstack helps you craft, manage, and host intelligent documents such as user manuals, FAQs, knowledge bases, product docs, and more.
Pawtrait Studio creates whimsical human-style portraits of your dog, cat, rabbit, guinea pig, or hamster, turning your photos into your pet’s alter ego.
CapGo.AI speeds up market research by extracting data into spreadsheets through a simple, user-friendly interface, making it well suited to marketers and analysts.
Pixplain by Merlin AI is a Chrome extension that provides instant AI analysis of screenshots, making complex web content easier to understand with a single click.
Sketch2App uses GPT-4 Vision to convert hand-drawn sketches into code for React, Next.js, React Native, and Flutter. It speeds up app development by supporting fast iteration and customization via text prompts.
Creatify AI quickly generates engaging video ads from product URLs, offering an AI-driven alternative to traditional agencies. It is designed for social media and advertising and streamlines video creation.
Korus.co, founded by Deadmau5, is an AI-powered music creation platform that lets users play, create, and remix music from iconic labels.
Autonoma automatically generates Confluence-style code documentation, building wikis with API docs and feature descriptions from any codebase. It is aimed at startups and enterprises working with legacy or complex code, making that code easier to understand and maintain.
DeepMake is AI video effects software offering VFX, stock video from text, and layer segmentation. It features video generation, face swapping, and upscaling, and integrates with Adobe After Effects and Nuke.
arXiv is a free online library where researchers share pre-publication papers.
This paper by Apple researchers presents a method for efficiently running LLMs on devices with limited DRAM capacity. It tackles models that exceed available DRAM by storing the parameters in flash memory and loading them into DRAM on demand. The method relies on two key techniques: “windowing,” which reduces data transfer by reusing recently activated neurons, and “row-column bundling,” which increases the size of the data chunks read from flash to exploit flash memory’s strength at sequential reads. Together, they make it possible to run models up to twice the size of available DRAM, with a 4-5x increase in inference speed on CPUs and 20-25x on GPUs compared with naive loading approaches. This combination of sparsity awareness, context-adaptive loading, and hardware-oriented design is a significant advance for LLM inference on memory-constrained devices.
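The two techniques can be illustrated with a minimal Python sketch. It is not Apple’s code: `WindowedNeuronCache` and `bundle_rows_columns` are hypothetical names, the set of active feed-forward neurons for each token is assumed to be known in advance, and weights are plain nested lists. Windowing keeps the neurons used by the last few tokens resident in DRAM, so each new token only triggers flash reads for neurons not already cached; bundling stores column i of the up-projection next to row i of the down-projection, so a single contiguous read fetches everything neuron i needs.

```python
from collections import deque


class WindowedNeuronCache:
    """Sketch of 'windowing': keep neurons active in the last `window` tokens in DRAM."""

    def __init__(self, window: int):
        self.window = window
        self.history = deque()  # active-neuron sets for the most recent tokens
        self.resident = set()  # neuron ids currently held in DRAM
        self.total_loaded = 0  # running count of neurons read from flash

    def step(self, active: set) -> set:
        # Only neurons that are not already resident must be read from flash.
        to_load = active - self.resident
        self.total_loaded += len(to_load)
        self.history.append(set(active))
        if len(self.history) > self.window:
            self.history.popleft()  # slide the window forward by one token
        # DRAM now holds exactly the union of the window's active sets.
        self.resident = set().union(*self.history)
        return to_load


def bundle_rows_columns(w_up, w_down):
    """Sketch of 'row-column bundling' for one feed-forward layer.

    w_up is d_model x d_ff and w_down is d_ff x d_model (nested lists).
    Neuron i needs column i of w_up and row i of w_down; storing them
    back to back turns two scattered reads into one contiguous read.
    """
    d_model, d_ff = len(w_up), len(w_up[0])
    return [
        [w_up[r][i] for r in range(d_model)] + list(w_down[i])
        for i in range(d_ff)
    ]
```

With `window=2`, feeding the active sets {1, 2, 3}, {2, 3, 4}, {3, 4, 5} loads three neurons for the first token and only one for each of the next two, because most neurons are already resident.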
This NVIDIA paper presents GAvatar, which uses Gaussian splatting to generate 3D avatars from text. It addresses the limitations of mesh- and NeRF-based representations with a primitive-based 3D Gaussian representation that improves animation and mesh extraction. The method stabilizes the learning of millions of Gaussians by using neural implicit fields to predict Gaussian attributes, and it employs SDF-based implicit mesh learning. The approach yields high-quality textured meshes, significantly improves avatar appearance and geometry, and renders at 100 fps at 1K resolution.
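GAvatar builds on 3D Gaussian splatting, whose rendering step blends depth-sorted Gaussians front to back. The sketch below illustrates that generic compositing step for a single pixel, following the standard splatting formulation rather than GAvatar’s own pipeline. It assumes the Gaussians are already projected to screen space, uses scalar colors to stay short, and introduces hypothetical names (`Gaussian2D`, `gaussian_weight`, `composite_pixel`); the 0.99 opacity clamp is an assumption borrowed from common implementations.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian2D:
    mean: tuple  # projected screen-space center (x, y)
    cov: tuple  # 2x2 screen-space covariance ((a, b), (c, d))
    opacity: float
    color: float  # scalar color to keep the sketch short
    depth: float  # camera-space depth used for sorting


def gaussian_weight(g: Gaussian2D, px: float, py: float) -> float:
    # exp(-0.5 * d^T Sigma^-1 d) for the offset d from the Gaussian's center.
    dx, dy = px - g.mean[0], py - g.mean[1]
    (a, b), (c, d) = g.cov
    det = a * d - b * c
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det  # inverse covariance
    mahalanobis = dx * (ia * dx + ib * dy) + dy * (ic * dx + id_ * dy)
    return math.exp(-0.5 * mahalanobis)


def composite_pixel(gaussians, px: float, py: float) -> float:
    color, transmittance = 0.0, 1.0
    # Front-to-back blending: C = sum_i c_i * alpha_i * prod_{j<i} (1 - alpha_j).
    for g in sorted(gaussians, key=lambda g: g.depth):
        alpha = min(0.99, g.opacity * gaussian_weight(g, px, py))
        color += transmittance * alpha * g.color
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # pixel is effectively opaque; stop early
            break
    return color
```

Two Gaussians centered on the pixel with opacity 0.5 and colors 1.0 (front) and 0.2 (back) blend to 0.5 + 0.5 × 0.5 × 0.2 = 0.55, regardless of the order in which they are passed in.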
Amphion is an open-source toolkit for audio, music, and speech generation, designed to support reproducible research and help newcomers get started in the field. Key features include visualizations of classic models to aid understanding and a unified framework for diverse generation tasks. Amphion focuses on converting various inputs into general audio and includes several vocoders and evaluation metrics. It supports tasks such as text-to-speech (TTS), singing voice conversion (SVC), and text-to-audio generation. Its design combines shared data processing, common modules, and optimization algorithms with task-specific architectures and training pipelines. Version 0.1 supports TTS models such as FastSpeech 2 and VITS and includes visualizations and diffusion-based SVC models. Amphion stands out for its unified framework, beginner-friendly approach, and educational visualizations.
VistaLLM, developed by researchers from Johns Hopkins University and Meta, is a vision-language model that handles both coarse- and fine-grained tasks over single and multiple images. It integrates segmentation and multi-image inputs into a single framework, using an instruction-guided image tokenizer to extract features and a gradient-aware adaptive sampling technique to represent binary masks efficiently. The model is trained on CoinIt, a comprehensive dataset of 6.8M samples that includes the novel AttCoSeg task for reasoning over multiple images. VistaLLM consistently outperforms existing models across a range of vision-language benchmarks, demonstrating its versatility and efficiency.
The paper proposes MixRT, a NeRF framework that challenges the need for high-quality meshes in photorealistic rendering. MixRT combines a low-quality mesh, a view-dependent displacement map, and a compressed NeRF model, and it uses existing graphics hardware to enable real-time rendering on edge devices. It achieves over 30 FPS at 1280x720 resolution on a MacBook M1 Pro, with better rendering quality and smaller storage size than state-of-the-art methods.
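The summary describes VistaLLM’s mask sampling only at a high level, so the sketch below illustrates the general idea with a simpler, swapped-in heuristic rather than the paper’s algorithm: when a mask boundary must be reduced to a fixed budget of points, keep the points where the contour bends most. Turning angle stands in for the gradient-aware criterion, the contour is assumed to be an ordered, closed list of (x, y) points, and `adaptive_sample` is a hypothetical name.

```python
import math


def turning_angle(prev, cur, nxt) -> float:
    # Angle between the incoming and outgoing edge directions at `cur`, in [0, pi].
    a1 = math.atan2(cur[1] - prev[1], cur[0] - prev[0])
    a2 = math.atan2(nxt[1] - cur[1], nxt[0] - cur[0])
    d = abs(a2 - a1) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def adaptive_sample(contour, k: int):
    """Keep the k contour points with the sharpest turns, in original order."""
    n = len(contour)
    scores = [
        turning_angle(contour[i - 1], contour[i], contour[(i + 1) % n])
        for i in range(n)
    ]
    # Rank points by how sharply the boundary bends there.
    keep = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    # Restore contour order so the points still trace the mask outline.
    return [contour[i] for i in sorted(keep)]
```

For a square outline that also contains its edge midpoints, a budget of four points keeps exactly the four corners, since the midpoints lie on straight edges and add no shape information.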
ChatGPT + DALL-E 3 Writes Comics
Thank you for reading today’s edition.
Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.