Meta's AI Voice "Too Dangerous" To Release
Good morning. It’s Monday, June 19th.
Did you know: A large portion of the training data behind OpenAI’s early GPT models came by way of Reddit? For the WebText corpus used to train GPT-2, OpenAI scraped web pages linked from Reddit posts that had received at least 3 karma. OpenAI allegedly did this without Reddit’s permission or awareness, and now the forum giant is seeking to change that.
In today’s email:
Meta Says New AI Voice Model Too Dangerous to Release
EU AI Act Summary
Latest AI News Stories (Can we use AI to talk to animals?)
5 New AI Tools
Latest AI Research Papers
You read. We listen. Let us know what you think of this edition by replying to this email, or DM us on Twitter.
Today’s edition is brought to you by:
Scribe is your GPT-4-powered process documentation platform that automatically creates SOPs, help centers, new user guides and process overviews for any business process.
Scribe AI auto-generates step-by-step guides complete with screenshots and text by capturing your screen while you click and type.
Using your guides, Scribe AI can create full process documentation (including headings, subheadings, and detailed text) with your guides automatically embedded. No more staring at a blank document thinking, "OK, I have to teach someone how to do this. Where do I start? What are all the steps?"
Today’s trending AI news stories
Meta Says New AI Voice Model Too Dangerous to Release
“There are many exciting use cases for generative speech models, but because of the potential risks of misuse, we are not making the Voicebox model or code publicly available at this time.” -Meta
Meta announced the development of a new AI audio tool, Voicebox, designed for generating and editing spoken dialogue.
Unlike previous voice generator platforms, Voicebox has the unique ability to perform speech generation tasks that it was not explicitly trained on. When a user inputs text and provides a short audio clip as context, Voicebox can clone the sample into new speech that closely resembles the voice featured in the source clip.
However, the company has deemed it not ready for public release due to the potential for abuse. This may be in anticipation of the EU’s AI Act, which would hold social media companies more responsible for content generated by users on their platforms or with their software tools.
Key Features of Voicebox:
It’s a speech generative model based on Meta's non-autoregressive flow matching model
It is trained on a text-guided speech infilling task over a large amount of data, which lets it handle a variety of speech tasks through in-context learning
Voicebox surpasses single-language AI models and synthesizes speech across six languages
It has the capability to remove transient noise, edit content, and transfer audio style both within and across languages
Compared to state-of-the-art autoregressive models, Voicebox generates speech up to 20 times faster
The technology that Voicebox provides will likely integrate seamlessly into Meta’s suite of social media apps, VR hardware, and messaging services when released.
The AI Act in Europe received significant backing in the European Parliament and is poised to be a key step in global AI regulation after revisions from the European Commission and the Council of the EU.
The Key Takeaways of Europe’s AI Act:
Ban on emotion-recognition AI in policing, schools, and workplaces
Ban on real-time biometrics and predictive policing in public spaces
Ban on social scoring by public agencies
Generative AI developers must disclose copyrighted material used for training
Social media companies more responsible for user-generated content
Intel Doubles Down on Chip Manufacturing: Intel announced its plan to invest $4.6 billion in a new chip plant in Poland, as part of its broader strategy to expand chip capacity in Europe and regain its competitive position in the semiconductor industry against rivals like AMD, Nvidia, and Samsung.
China Welcomes Microsoft: China's President Xi welcomed U.S. AI technology in a discussion with Bill Gates about AI and Microsoft's business development in China, amidst strained U.S.-China relations and a tightened grip on the internet sector in China.
The Grammy Goes to AI: The Recording Academy announced new Grammy Awards rules allowing AI contributions to be nominated, but requiring evidence of substantial human involvement in the creation of the song.
AI as a New Class of Security Threat: The rise of generative AI is escalating security threats, with hackers using AI to enhance phishing and fraud, automate social engineering attacks, and generate malicious code more efficiently, all of which erodes social trust.
Military Applications of AI: The U.S. Department of Defense emphasized its commitment to the “ethical use of AI” and to promoting cooperation in military AI applications, adopting principles similar to NATO's and issuing a Political Declaration on Responsible Military Use of AI and Autonomy.
MIT Enhancing Autopilot for Flying: Israeli scientists at MIT have developed an innovative AI autopilot algorithm that enhances the stability of aircraft during potentially fatal near-crash situations, according to a non-peer-reviewed study.
Minting AI Viruses: MIT researchers have warned about the risks of AI enabling the creation of dangerous new viruses, as demonstrated by an AI chatbot guiding non-experts to identify pandemic-causing pathogens and methods to deceive DNA synthesis companies.
Quantum Computing: Intel is gearing up to ship its new 12-qubit quantum processor to academic research labs. Have you tried turning it off and on again at the same time?
Using AI To Talk To Animals: Scientists are using machine learning to decode animal communication, with the Earth Species Project leading this approach to understanding the vocal cues of beluga whales, potentially transforming ethology and animal welfare initiatives.
Black Mirror’s AI Deepfake Episode: The premiere episode of Black Mirror's sixth season, “Joan Is Awful,” presents a darkly comedic yet pertinent exploration of the contemporary anxieties about AI and our digital rights.
🎧 Did you know AI Breakfast has a podcast read by a human? Join AI Breakfast team member Luke (an actual AI researcher!) as he breaks down the week’s AI news, tools, and research: Listen here
5 new AI-powered tools from around the web
QR Craft is a tool that transforms ordinary QR codes into art pieces. It generates PNG QR codes of 768x768 pixel resolution within 30 seconds and provides an API for integration into other applications.
Released is software that helps users create visually appealing release notes from their Jira tickets. It offers AI copywriting, post categorization, customizable widgets, and issue tracking, among other features. Users can embed these notes in applications or websites via a widget.
Fix My Code is a coding assistant that specializes in digital accessibility and ADA compliance. It generates accessible code and provides clear explanations, ensuring that projects meet accessibility standards and improve user satisfaction.
Deepchecks (Github) is an open-source tool for AI and ML validation that offers testing, CI, testing management, and monitoring components. It supports tabular, NLP, and computer vision validation and provides customizable checks, visual reports, and a dynamic UI for collaboration. Some premium features are available under a commercial license.
Prepfully's Peer Interview is a free interview preparation platform offering unlimited peer interview sessions. It provides flexible scheduling and covers various interview topics, matching users based on their target company role and experience.
arXiv is a free online library where scientists share their research papers before they are published. Here are the top AI papers for today.
DreamHuman is an innovative research project that has unveiled a remarkable advancement in the generation of lifelike and animatable 3D human avatars solely based on textual descriptions. Leveraging a combination of text-to-image synthesis models, neural radiance fields, and statistical human body models, DreamHuman introduces a new approach to producing dynamic 3D avatars with exceptional visual quality.
The paper presents a novel diffusion-based architecture called TryOnDiffusion for image-based virtual try-on. The proposed method generates realistic garment details and accommodates significant body pose and shape changes. The approach achieves superior performance in terms of resolution, garment preservation, and pose variation handling. The architecture is evaluated quantitatively and qualitatively, surpassing recent methods in a user study.
VideoComposer introduces innovative techniques for improved control in video synthesis. This approach enables users to compose high-quality videos while maintaining temporal consistency. VideoComposer empowers creators with precise control over their videos. Leveraging a latent diffusion model, motion vectors from compressed videos, and a Spatio-Temporal Condition encoder, this technology offers new possibilities in automated content creation. Extensive experiments demonstrate VideoComposer’s effectiveness and creativity across various generative tasks. Code and models are publicly available.
Jumanji is an open-source suite of fast, flexible, and scalable reinforcement learning (RL) environments. It focuses on combinatorial problems found in industries and challenging decision-making tasks. Jumanji enables rapid iteration and large-scale experimentation for more capable RL agents. It offers high customization, allowing users to tailor initial state distribution and problem complexity. Baseline actor-critic agents are provided for each environment, and it aims to set a new standard for speed, adaptability, and scalability in RL benchmarks.
OCTScenes is a real-world dataset designed for object-centric representation learning. It consists of 5000 tabletop scenes captured from multiple viewpoints, with 15 everyday objects placed on a wooden table. The dataset provides RGB-D images and segmentation ground truth for evaluation. OCTScenes serves as a benchmark for evaluating object-centric learning methods in static, dynamic, and multi-view scenes. Experimental results expose the limitations of current methods on real-world data compared with synthetic datasets. The dataset is intended to spur novel algorithm development and advance object-centric representation learning.
Thank you for reading today’s edition.
Your feedback is valuable.
Respond to this email and tell us how you think we could add more value to this newsletter.