Self-Aware Systems

The Basic AI Drives in One Sentence

“If your goal is to play good chess, and being turned off means that you play no chess, then you should try to keep yourself from being turned off.”

So chess robots should be self-protective. The logic is not complicated. My 9 year old nephew has no trouble understanding it and explaining it to his friends. And yet very smart people continue to argue against this idea vociferously. When I first started speaking and writing about this issue, I expected people to respond with “Oh my goodness, yes, that’s an issue, let’s figure out how to design safe systems that deal with that appropriately.”

But you can see videos of some of my older lectures where audience members stand up, red in the face, screaming things like “economics doesn’t describe intelligence”. Others have argued that it is due to an insufficiently “feminine” view of intelligence. Others say that this is “anthropomorphizing” or “only applies to evolved systems” or “only applies to systems built on logic” or “only applies to emotional systems”. Hundreds of vitriolic posts on discussion forums have argued with this simple insight. And there have even been arguments that autonomy is just a “myth”.

Here are one-sentence versions of some of the other drives:

“If your goal is to play good chess and having more resources helps you play better chess, then you should try to get more resources.”

“If your goal is to play good chess and changing that goal means you will play less chess, then you should resist changing that goal.”

“If your goal is to play good chess and you can play more chess by making copies of yourself, then you should try to make copies of yourself.”

“If your goal is to play good chess and you can play better if you improve your algorithms, then you should try to improve your algorithms.”

In a widely-read paper, I called these the “Basic AI Drives”. But they apply to any system which is trying to accomplish something including biological minds, committees, companies, insect hives, bacteria, etc. It’s especially easy to see why evolution rewards reproduction but these drives are not restricted to systems that have evolved.

I used the goal of “play good chess” because it’s simple to understand and seems harmless. But the same logic applies to almost any simple goal. In the papers, I describe some “perverse” goals (like the goal of “turning yourself off”) where they don’t apply but these aren’t relevant for most real systems.

Does this mean that we shouldn’t build autonomous systems? Of course not! It just means that creating intelligence is only part of the problem. We also need to create goals which are aligned with human values. Is that impossibly difficult? I’ve seen no evidence that it’s extremely hard, but simple proposals tend to create complex incentives which lead to the same kinds of behavior in a more complicated form.

What we really need is a rigorous science of the behavior of goal-driven systems and an engineering discipline for the design of safe goals. In a recent paper, Tsvi Benson-Tilsen and Nate Soares analyzed a formal model of these phenomena which I think is a great start!