
🗣️ Voice Cloning, AI lobbyists, and the 💰 Behind OpenAI

Voice cloning program VALL-E can replicate anyone's voice using just 3 seconds of source audio. It draws on a dataset of 60,000 hours of recorded speech to fill out your voice, matching and layering it with similar-sounding voices.

In today's email:

  • AI Political Persuasion: Corporate Lobbying powered by GPT-3

  • Your speaking voice, cloned: Inside VALL-E

  • BingGPT: Search and ChatGPT

  • See who invested: OpenAI valued at $29B

Bonus: Referrals & "the basics" of GPT

Could GPT be used to lobby on behalf of corporations?

Large Language Models as Corporate Lobbyists

Dr. John Nay of Stanford University published a paper last week examining whether or not large language models like GPT-3 could be used to lobby on proposed legislation.

Here's how it works:

  • The model determines if proposed U.S. Congressional bills are relevant to specific public companies.

  • The model then provides explanations and confidence levels of the analysis.

  • For the bills the model deems as relevant, it automatically "drafts a letter to the sponsor of the bill in an attempt to persuade the congressperson to make changes to the proposed legislation."
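
Here's a rough sketch of what that loop could look like, using the OpenAI completions API as it stood in early 2023. The prompts and inputs below are our own placeholders, an illustration of the idea rather than Dr. Nay's open-sourced implementation:

    # A sketch of the lobbying loop described above. Prompts and inputs
    # are placeholders; this is our illustration, not Dr. Nay's code.
    import openai

    openai.api_key = "YOUR_API_KEY"  # placeholder

    def ask(prompt: str) -> str:
        response = openai.Completion.create(
            model="text-davinci-003", prompt=prompt, max_tokens=400, temperature=0
        )
        return response.choices[0].text.strip()

    bill_summary = "..."  # official summary of a proposed bill goes here
    company = "Acme Corp, a maker of industrial robots"  # hypothetical client

    # Step 1: relevance check, with an explanation and confidence level.
    relevance = ask(
        f"Is this bill relevant to {company}? Answer YES or NO, explain why, "
        f"and state your confidence from 0-100.\n\nBill: {bill_summary}"
    )

    # Step 2: if deemed relevant, draft a letter to the bill's sponsor.
    if "YES" in relevance.upper():
        print(ask(
            f"Draft a letter to this bill's sponsor on behalf of {company}, "
            f"persuading them to make specific changes.\n\nBill: {bill_summary}"
        ))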

Dr. Nay's code is open-sourced on GitHub. As of now the model can only be used to "augment" in-person lobbying, but the proof of concept of a model identifying relevant legislation and arguing for changes to it is unnerving, and it raises the question:

Could you use ChatGPT to negotiate on your behalf?

Read Dr. Nay's full paper on AI Lobbying here.

Voice Cloning with VALL-E

Your voice can soon be copied by AI using just 3 seconds of sample audio with VALL-E

Your voice is layered onto similar-sounding voices

VALL-E is a program developed by Microsoft researchers that can "copy" your spoken voice using short samples, and it does this in a unique way:

Traditional text-to-speech pipelines turn text into a series of specific sounds (called phonemes), then into a visual representation of the sound called a mel-spectrogram, and finally into a waveform (a digital representation of sound that can be played through a speaker). VALL-E instead generates a series of discrete audio codes based on the phonemes and a short recording of the speaker's voice. From that recording it picks up traits like the speaker's age, gender, and accent, and layers your voice onto similar-sounding ones. The end results are still startlingly accurate.

VALL-E structure
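
To make the contrast concrete, here's a minimal sketch of the two pipelines in Python. Every function is an illustrative stub standing in for a real model, and the names are ours, not VALL-E's actual API:

    # Illustrative stubs only: these names are ours, not VALL-E's real API.

    def to_phonemes(text: str) -> list[str]:
        return list(text)  # stub: pretend each character is a phoneme

    def acoustic_model(phonemes: list[str]) -> list[list[float]]:
        return [[0.0] * 80 for _ in phonemes]  # stub mel-spectrogram frames

    def vocoder(mel: list[list[float]]) -> list[float]:
        return [0.0] * (len(mel) * 256)  # stub waveform samples

    def codec_encode(audio: list[float]) -> list[int]:
        return [0] * (len(audio) // 320)  # stub: discrete audio codes

    def code_language_model(phonemes: list[str], prompt_codes: list[int]) -> list[int]:
        # A language model continues the speaker's prompt codes; that
        # continuation is what carries over age, gender, and accent.
        return prompt_codes + [0] * len(phonemes)

    def codec_decode(codes: list[int]) -> list[float]:
        return [0.0] * (len(codes) * 320)  # stub waveform samples

    def traditional_tts(text: str) -> list[float]:
        # Classic pipeline: text -> phonemes -> mel-spectrogram -> waveform.
        return vocoder(acoustic_model(to_phonemes(text)))

    def vall_e_tts(text: str, speaker_clip: list[float]) -> list[float]:
        # VALL-E-style: phonemes + ~3s speaker clip -> audio codes -> waveform.
        codes = code_language_model(to_phonemes(text), codec_encode(speaker_clip))
        return codec_decode(codes)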

What implications could this have for voice-authentication based security?

Trending: BingGPT?

Microsoft's $1B investment in OpenAI might pay off from the free publicity alone. Microsoft is also working on a DALL-E-like image generator.

Google will likely follow suit with its lesser-known model, PaLM, which has roughly 3x as many parameters as OpenAI's GPT-3.

Who has invested in OpenAI?

"OpenAI LP" is a for-profit company wrapped in the non-profit parent company, OpenAI.

Here's a breakdown of the investors so far via Crunchbase:

AI Breakfast is new, and growing insanely fast.

The goal of this newsletter and the Twitter account is to provide the latest news, tools, and developments in artificial intelligence and explain them simply.

If you enjoyed this issue, please consider sharing it 🙏 If you do share it, DM us on Twitter and I'll send you a copy of the Ultimate AI Reading List: 50+ Free Resources for Going Down The AI Rabbithole.

Thanks for reading!

Also...

If you're new to the world of AI or one of the 500+ people who signed up this week, here's a brief intro into the basics of GPT:

What does GPT actually mean?

GPT stands for "Generative Pre-trained Transformer." It is a type of AI model that is trained to generate text that sounds like a human wrote it.

  • GPT works by predicting the next word in a sequence of words, given the words that come before it.

  • The model is trained on a large dataset of text, learning this prediction task from a huge number of examples (a toy sketch follows this list).

  • GPT is able to generate text that is difficult to distinguish from human writing because it has been trained on 45 Terabytes of text.
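
Here's a toy illustration of that objective in Python. A real GPT is a transformer with billions of parameters; this bigram counter captures only the training objective, in miniature:

    from collections import Counter, defaultdict

    # Toy model of "predict the next word from the words before it."
    corpus = "the cat sat on the mat the cat ate the fish".split()

    # Training: count how often each word follows each other word.
    next_word_counts = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        next_word_counts[prev][nxt] += 1

    # Inference: the most likely next word given the word before it.
    def predict_next(word: str) -> str:
        return next_word_counts[word].most_common(1)[0][0]

    print(predict_next("the"))  # -> "cat" (follows "the" twice in training)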

Where did it get 45TB of text?

  • Common Crawl (8 years of raw web page crawling)

  • WebText (text from web pages linked in Reddit posts with 3+ upvotes... yep)

  • Books (The internet-based books corpora)

  • Wikipedia

The data sources are then "weighted" in the training mix rather than used in proportion to their raw size. Per the GPT-3 paper, Common Crawl makes up roughly 60% of the mix, WebText2 22%, the two books corpora 8% each, and Wikipedia 3%.
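
Here's a quick sketch of what that weighting means in practice, using Python's standard library. It illustrates the idea of mixture sampling with the paper's reported weights, not OpenAI's actual pipeline:

    import random

    # Mixture sampling with the GPT-3 paper's reported weights.
    sources = ["Common Crawl", "WebText2", "Books1", "Books2", "Wikipedia"]
    weights = [0.60, 0.22, 0.08, 0.08, 0.03]  # shares of the training mix

    # Each training example is drawn with these probabilities, so
    # higher-quality sources appear more often than their raw size implies.
    print(random.choices(sources, weights=weights, k=10))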

Hopefully that adds a little context to the ongoing GPT discussions you'll have this year, because 2023 is going to be the big one for AI.

(If you're interested in seeing one of the most powerful AI tools currently available, check out Jasper AI for writing and get a 5-day trial and 10,000 free words)

See you next week!
