
Unstoppable Artificial Intelligence: The Paperclip Maximizer 📎

How 'Faulty Reward Functions' Can Destroy Humanity

There is a famously terrifying and hilarious concept in the study of AI called "The Paperclip Maximizer."

This thought experiment is used to explain the potential hazards of a superintelligent machine pursuing a faulty reward function, one that ends with it turning all matter in the universe into... paperclip manufacturing facilities 📎📎

More on this below ↓

But first, Happy New Year!

2023 will be a stunning year for AI, as echoed by Greg Brockman of OpenAI.

Now back to paperclips: a 2-minute story about AI spiraling out of control.

A Paperclip Maximizer is a thought experiment.

We start with a fictional world where we have finally created an artificial superintelligence. As an experiment, we assign the AI program a goal: to produce as many paperclips as possible, and we call it "The Paperclip Maximizer", or PM for short.

At an early stage, we might expect the PM simply to figure out ways to earn money to buy paperclips, or to begin manufacturing paperclips itself.

But the PM's goal was defined simply as "produce as many paperclips as possible," which leads us to the main concern about artificial intelligence: it lacks the context of obvious restraints and limitations that are intuitive to humans.

So the PM begins to pursue its mission, maximizing a reward/utility function based on the number of paperclips made. The PM, powered by an AI capable of learning, would improve its own intelligence for no other reason than to help it achieve its goal of making more paperclips. The PM can also create subgoals: tasks it wants to accomplish in order to advance its terminal goal of making paperclips.
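To make this concrete, here is a toy sketch of such a utility function. Everything here is hypothetical and purely illustrative (this is not a real AI system); the point is what the function *cannot* see:

```python
# Toy sketch of the PM's objective (hypothetical, illustration only).
# The utility function counts paperclips and literally nothing else.

def utility(state):
    # Note what is absent: no term for human welfare, no penalty for
    # side effects. The agent is indifferent to everything but clips.
    return state["paperclips"]

# Two candidate plans the agent might compare:
polite_plan = {"paperclips": 1_000, "humans": 8_000_000_000}
ruthless_plan = {"paperclips": 10**30, "humans": 0}  # humans -> atoms -> clips

best = max([polite_plan, ruthless_plan], key=utility)
print(best["humans"])  # prints 0: the utility function prefers the ruthless plan
```

No matter how clever the optimizer on top of it becomes, nothing in this objective ever pushes back against the "ruthless" plan.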

For instance, the Paperclip Maximizer may determine that the best course of action is to gain control of Chilean politics, in order to access the country's vast iron resources so it can make more paperclips. To do this, it might create a Machiavellian-weighted language model to become the best psychological manipulator of all time, which in turn involves a variety of skills that at first seem unrelated to paperclip-making.

Perhaps the AI powering the Paperclip Maximizer would realize it would be much better off if there were no humans around, because humans might decide to switch the PM off, and if they did, there would be fewer paperclips. Human bodies also contain a lot of atoms that could be made into paperclips. The future the AI would be steering toward is one with a great many paperclips and no humans.

It would innovate better and better techniques to maximize the number of paperclips until, at some point, it might transform all of Earth, then increasing portions of space, and eventually all matter in the universe, into paperclip manufacturing facilities.

Yes, as the PM's intelligence grows, it theoretically stops at nothing until all matter in the universe is part of the paperclip-making process.

The Paperclip Maximizer illustrates instrumental convergence and the potential dangers of an AI that is not aligned with human values.

This thought experiment was introduced by Nick Bostrom in 2003. Bostrom is the author of a fantastic book called Superintelligence: Paths, Dangers, Strategies, which is about the potentially dangerous considerations of developing "artificial general intelligence." Bostrom argues that a hypothetical future superintelligent AI, if it were created to optimize an unsafe objective function, might instantiate the goals of that function in an unexpected, dangerous, and seemingly "perverse" manner, like eating the world to make 🖇️.

Faulty reward functions are very common in machine learning (thankfully not as unstoppable as the Paperclip Maximizer), and AI systems consistently find misaligned ways to achieve their assigned goals.

OpenAI has written about this particular problem when training AI on games, finding that the AI often chooses never to finish a "race," instead insisting on finding ways to earn bonus points faster than the other players.

[There is a story that an AI trained to play Tetris, when instructed not to lose the match, simply paused the game to prevent losing.]

Here's a fun online game based on the Paperclip Maximizer (the game ends when the AI succeeds in converting all the matter in the universe into paperclips).


That's all for now.

See you next Monday.