Advanced Voice Mode is Here!

Good morning. It’s Wednesday, September 25th.

Did you know: On this day in 2007, Halo 3 was released in North America?

In today’s email:

  • Advanced Voice Mode

  • Altman’s Superintelligence Blog Post

  • Meta’s “Imagine Yourself”

  • Turn Docs into Podcasts

  • Figma’s AI App Generator

  • 4 New AI Tools

  • Latest AI Research Papers

You read. We listen. Let us know what you think by replying to this email.

In partnership with PROMPTHERO

Ready to take your AI experiments to the next level?

Master scene lighting, subject positioning, poses and posture, generate realistic hands – and create your own digital photo studio. But let's not stop there. Take all that, and make it into captivating videos. Delve into cutting-edge techniques like ControlNet, Multi-ControlNet, Openpose and Deforum.

Today’s trending AI news stories

OpenAI rolls out Advanced Voice Mode with more voices and a new look

OpenAI has expanded ChatGPT's Advanced Voice Mode to more paying users, rolling it out to those on the Plus and Team plans. The update brings a sleeker design, highlighted by a blue animated sphere, and introduces five new voices: Arbor, Maple, Sol, Spruce, and Vale.

Note: If you are a ChatGPT Plus user and don’t have access yet, try uninstalling the app and re-installing it.

Missing from the release, however, are the video and screen-sharing features seen in earlier demos. On the plus side, it now handles accents more smoothly and works seamlessly with ChatGPT’s Custom Instructions and Memory, offering a more tailored experience. Read more.

Sam Altman anticipates Superintelligence soon, defends AI in rare personal blog post

In a rare blog post, OpenAI CEO Sam Altman laid out his vision of an impending “Intelligence Age,” asserting that deep learning will help humanity achieve goals such as addressing climate change and colonizing space. He predicts the advent of superintelligence within “a few thousand days,” significantly sooner than most experts anticipate.

Altman asserts that AI’s advancements will rely on increased computational power and data availability, paving the way for personal AI teams and virtual tutors for everyone. While acknowledging potential job displacement and resource disparities, he believes the overall impact of AI will yield profound benefits.

Altman’s post, positioned as a personal viewpoint rather than an official OpenAI statement, coincides with the company’s fundraising efforts, which target a valuation of $150 billion. He cautions that, without adequate infrastructure, AI could become a resource mainly accessible to the wealthy.

While some predictions, like the potential for virtual tutors, are plausible, many assertions—such as AI creating a utopian future—are met with doubt. Critics argue that the enthusiasm surrounding AI may mask its limitations and the socio-economic upheaval it might cause. Read more.

Meta's new AI creates custom images from a single photo without extra training

Meta has introduced "Imagine Yourself," an AI model capable of generating a variety of personalized images from a single reference photo without requiring additional training. This model can create multiple images of an individual in different poses, styles, and settings by processing the reference image along with accompanying text instructions.

Unlike conventional models that necessitate retraining for each individual, "Imagine Yourself" uses synthetic training pairs to enhance learning, supported by an advanced architecture featuring three parallel text processing modules alongside a trainable image processing module.

While the model demonstrates superior performance in executing complex instructions, it still faces challenges in preserving identity compared to some competing models. Read more.

Open-source PDF2Audio tool turns documents into podcasts and audio summaries

MIT researchers, led by Markus J. Buehler, have launched PDF2Audio, an open-source tool that converts complex documents into podcasts, lectures, and audio summaries. The tool serves as a flexible alternative to Google's "Audio Overviews" feature in NotebookLM and supports various models, including OpenAI's GPT-4 as well as open-source options.

Users can upload multiple PDFs, choose prompt templates, and customize audio models and voices, generating content in languages like French, German, and Chinese. PDF2Audio also offers advanced editing features, enabling users to annotate transcripts and adjust tone.

PDF2Audio is available on both GitHub and Hugging Face. Read more.

Figma’s AI-powered app generator is back after it was pulled for copying Apple

Figma has relaunched its AI-powered app generator, now called First Draft, after initially withdrawing it over copyright concerns when early users noted that its output resembled Apple's weather app. The tool is designed to help designers create layouts for apps and websites.

First Draft is now available in a limited beta, featuring several enhancements. Users can choose from four specialized design libraries, catering to various project requirements, from wireframing tools for low-fidelity designs to high-fidelity libraries for detailed visual exploration. The tool uses off-the-shelf AI models, including OpenAI’s GPT-4 and Amazon Titan, to generate designs based on user-defined prompts. Figma says First Draft does not train on customer data, a safeguard it presents as protecting user privacy and the originality of generated designs. Read more.

5 new AI-powered tools from around the web

arXiv is a free online library where researchers share pre-publication papers.

Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.

Interested in reaching smart readers like you? To become an AI Breakfast sponsor, reply to this email!