Advanced Voice Mode is Here!
Good morning. It’s Wednesday, September 25th.
Did you know: On this day in 2007, Halo 3 was released in North America?
In today’s email:
Advanced Voice Mode
Altman’s Superintelligence Blog Post
Meta’s “Imagine Yourself”
Turn Docs into Podcasts
Figma’s AI App Generator
4 New AI Tools
Latest AI Research Papers
You read. We listen. Let us know what you think by replying to this email.
In partnership with PROMPTHERO
Ready to take your AI experiments to the next level?
Master scene lighting, subject positioning, poses and posture, generate realistic hands – and create your own digital photo studio. But let's not stop there. Take all that, and make it into captivating videos. Delve into cutting-edge techniques like ControlNet, Multi-ControlNet, Openpose and Deforum.
Today’s trending AI news stories
OpenAI rolls out Advanced Voice Mode with more voices and a new look
OpenAI has expanded ChatGPT's Advanced Voice Mode to more paying users, rolling it out to those on the Plus and Team tiers. The update brings a sleeker design, highlighted by a blue animated sphere, and introduces five new voices: Arbor, Maple, Sol, Spruce, and Vale.
Note: If you are a ChatGPT Plus user and don’t have access yet, try uninstalling the app and re-installing it.
advanced voice mode rollout starts today! (will be completed over the course of the week)
hope you think it was worth the wait 🥺🫶
— Sam Altman (@sama)
6:21 PM • Sep 24, 2024
Missing from the release, however, are the video and screen-sharing features seen in earlier demos. On the plus side, it now handles accents more smoothly and works seamlessly with ChatGPT’s Custom Instructions and Memory, offering a more tailored experience. Read more.
Sam Altman anticipates Superintelligence soon, defends AI in rare personal blog post
In a rare blog post, OpenAI CEO Sam Altman articulated his vision of an impending “Intelligence Age,” asserting that deep learning's capabilities will help solve global problems such as climate change and make feats such as space colonization achievable. He predicts the advent of superintelligence “within a few thousand days,” significantly sooner than most experts anticipate.
Altman asserts that AI’s advancements will rely on increased computational power and data availability, paving the way for personal AI teams and virtual tutors for everyone. While acknowledging potential job displacement and resource disparities, he believes the overall impact of AI will yield profound benefits.
Altman’s post, positioned as a personal viewpoint rather than an official OpenAI statement, coincides with the company’s fundraising efforts, aiming for a valuation of $150 billion. He cautions that, without adequate infrastructure, AI could become a resource mainly accessible to the wealthy.
While some predictions, such as virtual tutors, are plausible, critics doubt broader claims such as AI creating a utopian future. They argue that the enthusiasm surrounding AI may mask its limitations and the socio-economic upheaval it might cause. Read more.
Meta's new AI creates custom images from a single photo without extra training
Meta has introduced "Imagine Yourself," an AI model capable of generating a variety of personalized images from a single reference photo without requiring additional training. This model can create multiple images of an individual in different poses, styles, and settings by processing the reference image along with accompanying text instructions.
Unlike conventional models that necessitate retraining for each individual, "Imagine Yourself" uses synthetic training pairs to enhance learning, supported by an advanced architecture featuring three parallel text processing modules alongside a trainable image processing module.
While the model demonstrates superior performance in executing complex instructions, it still faces challenges in preserving identity compared to some competing models. Read more.
Open-source PDF2Audio tool turns documents into podcasts and audio summaries
MIT researchers, led by Markus J. Buehler, have launched PDF2Audio, an open-source tool that converts complex documents into podcasts, lectures, and audio summaries. This tool serves as a flexible alternative to Google's "Audio Overviews" feature in NotebookLM, supporting various models, including OpenAI's GPT-4 and other open-source options.
We are excited to share #PDF2Audio, an open-source alternative to the #podcast feature of #NotebookLM with flexibility & tailored outputs that you can precisely control in the app: You can make a podcast, lecture, discussions, short/long form summaries & more, including the use… x.com/i/web/status/1…
— Markus J. Buehler (@ProfBuehlerMIT)
11:49 AM • Sep 23, 2024
Users can upload multiple PDFs, choose prompt templates, and customize audio models and voices, generating content in languages like French, German, and Chinese. PDF2Audio also offers advanced editing features, enabling users to annotate transcripts and adjust tone.
PDF2Audio is available on both GitHub and Hugging Face. Read more.
Figma’s AI-powered app generator is back after it was pulled for copying Apple
Figma has relaunched its AI-powered app generator, now called First Draft, after withdrawing it over copyright concerns: early users had noted that its output resembled Apple's weather app. The tool is designed to help designers create layouts for apps and websites.
(Re)introducing First Draft, previously known as Make Design. Currently in limited beta →
— Figma (@figma)
5:08 PM • Sep 24, 2024
First Draft is now available in a limited beta, featuring several enhancements. Users can choose from four specialized design libraries, ranging from wireframing libraries for low-fidelity designs to high-fidelity libraries for detailed visual exploration. The tool uses off-the-shelf AI models, including OpenAI’s GPT-4 and Amazon Titan, to generate designs from user-defined prompts. Figma insists that First Draft does not train on customer data, a point it ties to user privacy and the originality of generated designs. Read more.
5 new AI-powered tools from around the web
arXiv is a free online library where researchers share pre-publication papers.
Thank you for reading today’s edition.
Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.
Interested in reaching smart readers like you? To become an AI Breakfast sponsor, reply to this email!