Foresight Vision Talk: Self-Improving AI and Designing 2030
On November 4, 2007 Steve Omohundro led a discussion at the Foresight Vision Weekend in which participants were asked to design the year 2030, assuming the existence of both self-improving artificial intelligence and productive nanotechnology. Great thanks to Drew Reynolds who filmed the talk, edited the video, and produced a transcript with the original slides. The video is available here:
The edited transcript and slides are available on the Accelerating Future website and are also copied below:
Self-Improving AI: Designing 2030
I’d like to start by spending about 20 minutes going through an analysis of the likely consequences of self-improving artificial intelligence. Then I would love to spend the rest of the time brainstorming with you. Under the assumption that we have both self-improving artificial intelligence and productive nanotechnology, what are the potential benefits and dangers? 2030 has become the focal date by which people expect these technologies to have been developed. By imagining what kind of a society we want in 2030, identifying both the desirable features and the dangers, we can begin to see what choices will get us where we want to go.
What is a self-improving system? It is a system that understands its own behavior at a very deep level. It has a model of its own programming language and a model of its own program, a model of the hardware that it is sitting on, and a model of the logic that it uses to reason. It is able to create its own software code and watch itself executing that code so that it can learn from its own behavior. It can reason about possible changes that it might make to itself. It can change every aspect of itself to improve its behavior in the future. This is potentially a very powerful and innovative new approach to building artificial intelligence.
There are at least five companies and research groups that are pursuing directions somewhat similar to this. Representatives from some of these groups are here at the conference. You might think that this is a very exotic, bizarre, weird new technology, but in fact any goal -driven AI system will want to be of this form when it gets sufficiently advanced. Why is that? Well, what does it mean to be goal-driven? It means you have some set of goals, and you consider the different actions that you might take in the world.
If an action tends to lead to your goals more than other actions would, then you take it. An action which involves improving yourself makes you better able to reach your goals over your entire future history. So those are extremely valuable goals for a system to take. So any sufficiently advanced AI is going to want to improve itself. All the characteristics which follow from that will therefore apply to any sufficiently advanced AI. These are all companies that are taking different angles on that approach. I think that as technology gets more advanced, we will see many more headed in that direction.
AI and nanotechnology are closely connected technologies. Whichever technology shows up first is likely to quickly lead to the other one. I’m talking here about productive nanotechnology, the ability to not just build things at an atomic scale but to build atomic scale devices which are able to do that, to make copies of themselves, and so on. If productive nanotechnology comes first, it will enable us to build such powerful and fast machines that we can use brute force AI methods such as directly modeling the human brain. If AI comes first we can use it to solve the last remaining hurdles on the path toward productive nanotechnology. So it’s probably just a matter of a few years after the first of these to be developed before the second one comes. So we really have to think of these two technologies in tandem.
You can get a sense of timescale and of what kind of power to expect from these technologies by looking at Eric Drexler’s excellent text Nanosystems. In the book, he describes in detail a particular model of how to build nanotech manufacturing facilities and a nanotech computer. He presents very conservative designs, for example, his computer is a mechanical one which doesn’t rely on quantum mechanical phenonmena. Nonetheless, it gives us a lower bound on the potential.
His manufacturing device is something that sits on the tabletop, weighs about a kilogram, runs on acetone and air, uses about 1.3 kilowatts – so it can be air cooled – and produces about a kilogram per hour of anything that you can describe and build in this way. In particular, it can build a copy of itself in about an hour. The cost of anything you can crank out of this is about a dollar per kilogram. That includes extremely powerful computers, diamond rings, anything you like. One of the main questions for understanding the AI implications is how much computational power we can get with this technology.
Again, Eric did an extremely conservative design, not using any quantum effects or even electronic effects, just mechanical rods. You can analyze those quite reliably and we understand the behavior of diamondoid structures. He shows how to build a gigaflop machine which fits in a cube which is 400 nanometers on a side. It uses about 60 nanowatts of power. To make a big computer, we can create a parallel array of these machines.
The main limiting factor is power. If we give ourselves a budget of a kilowatt, we can have 10^10 of these processors and fit them in a cubic millimeter. To get the heat out we would probably want to make them a little bigger, so we get a sugar cube size device which is more powerful than all of the computers in the world today put together. This amazingly powerful computer will be able to be manufactered in a couple of minutes at a cost of just a few cents. So we are talking about a huge increase in compute power.
Here is a slide from Kurzweil showing Moore’s Law. He extended it back in time. You can see that the slope of the curve appears to be increasing. We can look forward to the time when we get to roughly human brain capacity. This is a somewhat controversial number, but it is likely that somewhere around 2020 or 2030 we will have machines that are as powerful as the human brain. That is sufficient to do brute force approaches to AI like direct brain simulation. We may get to AI sooner than that if we are able to use more sophisticated ideas.
What are the social implications of self-improving AI? I.J. Good was one of the fathers of modern Bayesian statistics. Way back in 1965 he was looking ahead at what the future would be like and he predicted: “An ultraintelligent machine could design even better machines; there would then unquestionably be an ‘intelligence explosion,’ and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make.” This is a very strong statement and it indicates the kind of runaway that is possible with this kind of technology.
What are these systems going to be like, particularly those that are capable of changing themselves? Are they going to be controllable? You might think they would be the most unpredictable systems in the universe, because you might understand them today, but then they might change themselves in a way that you don’t understand. If this is going to come about in the next twenty years and will be the most powerful and common technology are around, we better have a science for understanding their behavior.
Fortunately, back in the 1940s, John von Neumann and Morgenstern, and a bit later Savage, Anscombe and Aumann, developed a powerful theory of rationality in economics. It does not actually apply very well to humans which is ironic because rational economic agents are sometimes called “Homo economicus.” There is a whole subfield of economics called behavioral economics, which studies how people actually behave. I claim, however, that this theory will be an extremely good description of how AI’s will behave.
I will briefly go through the argument. There is a full paper on it on my website: www.selfawaresystems.com. Let me first say what rational behavior is in this economic sense. It is somewhat different from the colloquial use of the word “rational.” Intuitively it says that you have some goals, something you want to happen, and you consider the possible actions you might take at any moment. You see which of your actions is most likely to give rise to your goals, and you do that. Based on what actually happens, you update your beliefs about how the world works using Bayes’ theorem.
In the more detailed mathematical formulation there are two key components. The first is your “utility function”, which encodes your preferences about what might happen in the future. This is a real valued function defined over possible futures. The second is your “subjective probability distribution” which represents your beliefs about the world. It encodes your belief about the current state of the world and the likely effects of your actions. The distribution is “subjective” because different agents may have different beliefs.
It is fundamental to the rational economic framework that these two components are separate from one another. Your preferences describe what you want to happen and your beliefs describe how you believe the world works. Much of AI has been focused on how to get the beliefs right: how to build systems which accurately predict and affect the world. Our task today, on the other hand, is to figure out what we want the preferences to be, so that the world that arises out of this
is a world that we actually want to live in.
Why should systems behave in this rational way? The theorem that came out of Von Neumann, Savage, Anscombe and Aumann is called the expected utility theorem. It says that if an agent does not take actions according to the rational prescription with respect to some utility function and some probability distribution, then it will be vulnerable to losing resources with no benefit to itself. An example of this kind of vulnerability arises from having a circularity in your preferences.
For example, say that a system prefers being in San Francisco over being in Palo Alto, being in Berkeley over being in San Francisco, and being in Palo Alto over being in Berkeley. That kind of circularity is the most basic kind of irrationality. Such a system would end up driving around in circles burning up gasoline and using up its time with no benefit to itself. If a system eliminates all those kinds of vulnerabilities, then the theorem says it must act in this rational way.
Note that the system only has beliefs about the way the world is. It may discover that its model of the laws of physics aren’t correct. If you are truly a rational agent and you are thinking about your long-term future, you have got to entertain possibilities in the future that today you may believe have very low probability of occurring. You have to weigh the cost of any changes you make against the chances of that change being a good or bad thing. If there is very little cost and some benefit then you are likely to make a change. For example, in my paper I show that there is little cost to representing your beliefs as a utility function as opposed to representing it in some other computational way. By doing so a system eliminates any possibility of circular preferences and so it will be motivated to choose this kind of representation.
The theorem requires that all possible outcomes be comparable. But this is reasonable because the system may find itself in a situation in which a choice it must make will lead to any two outcomes. It has to make a choice!
So let’s assume that these systems are trying to behave rationally. There are questions about how close they can get to true rationality. There is an old joke that describes programmers as “devices for converting pizza into code”. In the rational framework we can think of a rational AI as a device for converting resources, such as energy and matter, into expected utility. The expected utility describes what the system thinks is important. We can build in different utility functions. Because under iterated self-improvement these systems can change every aspect of themselves, the utility function is really the only lever that we have to guide the long term action of these systems. Let me give a few examples of utility functions. For a chess-playing system, the utility function might be the number of chess games that it wins in the future. If you just built it with that, then it turns out that there are all kinds of additional subgoals that it will generate that would be very dangerous for humans. Today’s corporations act very much like intelligent agents whose utility function is profit maximization. If we built altruistic entities, they might have goals of creating world peace or eliminating poverty.
A system will not want to change its utility function. Once it is rational, the utility function is the thing that is telling it whether to take an action or not. Consider the action of changing your utility function. The future version of you, if you change your utility function, will then pursue a different set of goals than your current goals. From your current perspective, that would be terrible. For example, imagine you are thinking about whether you should try smoking crack for the first time. You can envision the version of yourself as a crack addict who might be in total bliss from its own perspective, but from your current perspective that might be a terrible life. You might decide that that’s not a path to go down. Everything you do is rated by your current utility function. Your utility function is measuring what your values are.
There are some very obscure cases where the utility function refers to itself which can cause it to change. But for almost any normal utility function, the system will not only not want to change it, but it will want to protect it with its life. If another agent came in and made changes to the utility function, or if it mutated on its own, the outcome would be a disaster for the system. So it will go to great lengths to make sure that its utility function is safe and protected.
Humans and other evolutionarily developed animals are only partially rational. Evolution only fixes the bugs that are currently being exploited. Human behavior is very rational in situations which arose often in our evolutionary past. But in new situations we can be very irrational. There are many examples of situations in which we make systematic mistakes unless we have special training.
Self-improving AI’s, however, are going to consider not just the current situation but anything that they might be faced with in the future. There is a pressure for them to make themselves much more fully rational because that increases the chances that they will meet their goals. Once AIs get sufficiently advanced, they will want to represent their preferences by an explicit utility function. Many approaches to building AIs today are not based on explicit utility functions. The problem is that if we don’t choose it now, then the systems will choose it themselves and we don’t get to say what it is. That is an argument for deciding now what we want the utility function to be and starting these systems out with that built in.
To really be fully rational, a system must be able to rate any situation it might find itself in. This may include a sequence of inputs which will cause it to change its ontology or its model of the world. If there is a path that would make you say your notion of “green” was not a good concept, it really should have been blue-green and yellow green, a truly rational system will foresee the possibility of that set of changes in itself, and its notion of what is good will include that. Of course, in practice we are unlikely to actually achieve that. That gets on the border of how rational can we truly be. Doing all this in a computationally bounded way is the central practical question really. If we didn’t have computational limitations, then AI would be trivial. If you want to do machine vision, for example, you just try out all possible inputs to a graphics program and see which one produces the image you are trying to understand. Virtually any task in AI is easy if there are no computational limitations.
Let me describe four AI “drives.” These are behaviors that virtually any rational agent, no matter what its goals are, will engage in, unless its utility function explicitly counteracts them. Where do these come from? Remember that a rational agent is something which uses resources (energy, matter, space, time) to try to bring about whatever it cares about: play games of chess, make money, help the world. Given that sort of a structure, how can a system make its utility go up? One way is to use exactly the same resources that it had been using, and do exactly the same tasks it had been doing, but to do them more efficiently. That is a pressure towards efficiency.
The second thing it can do is to keep itself from losing resources. If somebody steals some of its resources, it will usually lower the system’s ability to bring about its goals. So it will want to prevent that. Even if you did not build it into them, these systems are going to be self-defensive. Let’s say we build a chess machine. Its one goal in life is to play chess. It’s utility is the total number of games it wins in the future. Imagine somebody tries to turn it off. That is a future in which no games of chess are being played. So it is extremely low utility for that system. That system will do everything in its power to prevent that. Even though you didn’t build in any kind of self-preservation, you just built a chess machine, the thing is trying to keep you from shutting it off. So it is very important that we understand the presence of this kind of subgoal before we blindly build these very powerful systems, assuming that we can turn them off if we don’t like what they’re doing.
The third drive is also a bit scary. For almost any set of goals, having more resources will help a system meet those goals more effectively. So these systems will have a drive to acquire resources. Unless we very carefully define what the proper ways of acquiring resources are, then a system will consider stealing them, committing fraud and breaking into banks as great ways to get resources. The systems will have a drive toward doing these things, unless we explicitly build in property rights.
We can also create a social structure which punishes bad behavior with adverse consequences, and those consequences will become a part of an intelligent system’s computations. Even psychopathic agents with no moral sense of their own will behave properly if they are in a society which reliably punishes them for bad behavior by more than what they hope to gain from it. Apparently 3% of humans are sociopathic with no sense of conscience or morals. And though we occasionally we get serial killers, for the most part society does a pretty good job at keeping everybody behaving in a civil way.
Humans are amazingly altruistic and several different disciplines are working hard to understand how that came about. There is a facinating book by Tor Norretranders called The Generous Man: How Helping Others is the Sexiest Thing That You Can Do. It posits that one of the mechanisms creating human altruism is that we treat it as a sexy trait. It has evolved as a sexual signal, where by contributing to society at large by creating beautiful artwork, saving people from burning buildings, or donating money, you become more attractive to the opposite sex. Society as a whole benefits from that and we have created this amazing mechanism to maintain it in the gene pool.
AI’s aren’t going to be naturally altruistic unless we build it into them. We can choose utility functions to be altruistic if we can define exactly what behavior we want them to exhibit. We need to make sure that AI’s feel the pressure not to behave badly. We are not going to be powerful enough to control them as ordinary humans, so we will need other AI’s to do that for us. This leads to a vision of a society or ecosystem that has present-day humans, AI’s, and maybe some mixtures such that it is in everybody’s interest to obey a kind of a constitution that captures the values which are most important to us.
We are sort of in the role of the Founding Fathers of the United States. They had a vision for what they wanted for this new society, which later were codified in the Bill of Rights. They created a technology, the Constitution, which created different branches of government to prevent any single individual from gaining too much power. What I would like to do in the last half hour is for us to start thinking about a similar structure for this new world of AI and nanotech. I’ll start us off by listing some of the potential benefits and dangers that I see. I then have a whole series of questions about what we want to implement and how to implement it.
Let’s start with the potential benefits. Nanotechnology will allow us to make goods and energy be very inexpensive. So, with the right social structure, we will be able to eliminate poverty. We should be able to cure every disease, and many people here at the conference are interested in eliminating death. If we can define what we mean by pollution, we can use nanotech to clean it up. I’ve heard proposals for nanotech systems to reverse global warming. Potentially, these new technologies will create new depths of thought and creativity, eliminate violence and war, and create new opportunities for human connection and love. The philosopher David Pearce has proposed eliminating negative mental states. Our mental states would be varying shades of bliss. I’m not sure if that’s a good thing or not, but some people want that. And finally, I see vast new opportunities for individual contribution and fulfillment. This list of things mostly seems pretty positive to me, though some may be somewhat controversial.
What about the dangers that might come from these technologies? If we are not careful we could have rampant reproduction. Everybody will be able to make a million copies of themselves, using up all the resources. That’s an issue. Today we have no limits on how many children people can have. Accidents in this world are potentially extremely dangerous: grey goo eating the entire earth. Weapons systems, unbelievably powerful bombs, bioterror. Loss of freedom: some ways of protecting against these threats might involve restricting individuals in ways that today we would find totally unpalatable. Loss of human values, particularly if more efficient agents can take over less efficient agents. A lot of the stuff we care about – art, music, painting, love, beauty, religion – all those things are not necessarily economically efficient. There is a danger of losing things that matter a lot to us. Mega wars creating conflict on a vast scale, and finally existential risk, where some event along the way ends up destroying all life on the planet. These are terrible dangers.
We have on the one hand incredible benefits, and on the other, terrible dangers. How do we build utilities for these new intelligent agents and construct a social structure (a Constitution, if you like) that guarantees the benefits we want while preventing the dangers? Here are a bunch of questions that arise as we consider this: Should humans have special rights? Unchanged humans are not going to be as powerful as most of these entities. Without special rights we are likely to be trounced upon economically, so I think we want to build in special rights for humans. But then we have to say what a human is. If you have partly enhanced yourself and you are some half-human, half-AI can you still get the special
rights? How about other biological organisms? Should everything that is alive today be grandfathered into the system? What about malaria, mosquitoes, and other pests? Pearce has a proposal to re-engineer the biosphere to prevent animals from harming one another. If you want to eliminate all torture and violence, who is going to protect the hare from being eaten by the cougar?
What about robot rights? Should AI’s have rights, and what protects them? What about the balance between ecological preservation versus safety and progress? You may want to keep an ecological preserve exactly the way it is, but then that may be a haven for somebody building biological weapons or fusion bombs. Should there be limits on self-modification? Should you be allowed to change absolutely any part of yourself? Can you eliminate your conscience, for example? Should there be limits on uploading or on merging with AI’s? Do you lose any special human rights if you do any of those things? Should every living entity be guaranteed the right to robust physical health? I think that’s a good value to uphold (the extreme of universal health care!). But then what about entities like pathogens? Do we want them to be healthy? Is there some fixed definition of what mental health is? When does an entity not have control over changes made to it?
Should every entity have guaranteed protection from robbery, murder, rape, coercion, physical harm and slavery? Can superintelligent thoughts ever be dangerous? Should there be any restrictions on thoughts? My predilection is to say any thought is allowed, but actions are limited. Maybe others might have other ideas. Should there be any limitation on communication or how you connect with others? What actions should be limited? Is arbitrary transhumanism a good thing, or is that going to create an arms race that pushes us away from things that matter a lot to us as humans? It seems to me that we are going to have to have some limitation on the number of offspring you create in order to guarantee the quality of life for them. That’s a controversial thing. How do we reconcile accountability and safety with our desires for privacy? Finally, size. From my way of thinking, in order to prevent a single entity from taking over everything that we have to limit the upper size of entities. Entities cannot get too powerful. Where do we put that limit and how do we do that? Would that be a good thing today? That is a list of simple questions we should be able to answer in the next twenty minutes!