Skip to content

November 1, 2007

Stanford Computer Systems Colloquium Talk: Self-Improving AI and the Future of Computing

by omohundro

On October 24, 2007 Steve Omohundro gave the Stanford EE380 Computer Systems Colloquium on “Self-Improving Artificial Intelligence and the Future of Computing”. Great thanks to Drew Reynolds who filmed the talk, edited the video, and produced a transcript with the original slides. The video is available here:

The transcript and slides are available on the Accelerating Future website and are also copied below:


Self-Improving AI and The Future of Computation


We’re going to cover a lot of territory today and it may generate some controversy. I’m happy to take short questions while we’re going through it, but let’s hold the more controversial ones until the end.


Let’s start by looking at the state of today’s computer software. On June 4th, 1996, an Ariane 5 rocket worth $500 million blew up 40 seconds after takeoff. It was later determined that this was caused by an overflow error in the flight control software as it tried to convert a 64-bit floating point value into a 16-bit signed-register.

In November 2000, 28 patients were over-irradiated in the Panama City National Cancer Institute. 8 of these patients died as a direct result of the excessive radiation. An error in the software which computes the proper radiation dose was responsible for this tragedy.


On August 14, 2003, the largest blackout in U.S. history shut off power for 50 million people in the Northeast and in Canada and caused financial losses of over $6 billion. The cause turned out to be a race condition in the General Electric software that was monitoring the systems.

Microsoft Office is used on 94% of all business computers in the world and is the basis for many important financial computations. Last month it was revealed that Microsoft Excel 2007 gives the wrong answer when multiplying certain values together.

As of today, the Storm Worm trojan is exploiting a wide range of security holes and is sweeping over the internet and creating a vast botnet for spam and denial of service attacks. There is some controversy about exactly how many machines are currently infected, but it appears to be between 1 and 100 million machines. Some people believe that the Storm Worm Botnet may now be the largest supercomputer in the world.

We had a speaker last quarter who said that two out of three personal computers are infected by malware.

Wow! Amazing! Because of the scope of this thing, many researchers are studying it. In order to do this, you have to probe the infected machines and see what’s going on. As of this morning, it was announced that apparently the storm worm is starting to attack back! When it detects somebody trying to probe it, it launches a denial of service attack on that person and knocks their machine off the internet for a few days.

If mechanical engineering were in the same state as software engineering, nobody would drive over bridges. So why is software in such a sorry state? One reason is that software is getting really, really large. The NASA space shuttle flight control software is about 1.8 million lines of code. Sun Solaris is 8 million lines of code. Open Office is 10 million lines of code. Microsoft Office 2007 is 30 million lines of code. Windows Vista is 50 million lines of code. Linux Debian 3.1 is 215 million lines of code if you include everything.


But programmers are still pretty darn slow. Perhaps the best estimation tool available is Cocomo II. They did empirical fits to a whole bunch of software development projects and they came up with a simple formula to estimate the number of person months required to do a project. It has a few little fudge factors for how complex the project is and how skilled the programmers are. Their website has a nice tool where you can plug in the parameters of your project and see the projections. For example, if you want to develop a 1million line piece of software today, it will take you about 5600 person months. They recommend using 142 people working for three years at a cost of $89 million. If you divide that out you discover that average programmer productivity for producing working code is about 9 lines a day!


Why are we so bad at producing software? Here are a few reasons I’ve noticed in my experience. First, people aren’t very good at considering all the possible execution paths in a piece of code, especially in parallel or distributed code. I was involved in developing a parallel programming language called pSather. As a part of its runtime, there was a very simple snippet of about 30 lines of code that fifteen brilliant researchers and graduate students had examined over a period of about six months. Only after that time did someone discover a race condition in it. A very obscure sequence of events could lead to a failure that nobody had noticed in all that time. That was the point at which I became convinced that we don’t want people determining when code is correct.

Next, it’s hard to get large groups of programmers to work coherently together. There’s a classic book The Mythical Man Month that argues that adding more programmers to a project often actually makes it last longer.

Next, when programming with today’s technology you often have to make choices too early. You have to decide on representing a certain data structure as a linked list or as an array long before you know enough about the runtime environment to know which is the right choice. Similarly, the requirements for software are typically not fixed, static documents. They are changing all the time. One of the characteristics of software is that very tiny changes in the requirements can lead to the need for a complete reengineering of the implementation. All these features make software a really bad match with what people are good at.


The conclusion I draw is that software should not be written by people! Especially not parallel or distributed software! Especially not security software! And extra especially not safety-critical software! So, what can we do instead?


The terms “software synthesis” and “automatic programming” have been used for systems which generate their own code. What ingredients are needed to make the software synthesis problem well-defined? First, we need a precisely-specified problem. Next, we need the probability distribution of instances that the system will be asked to solve. And finally, we need to know the hardware architecture that the system will run on. A good software synthesis system should take those as inputs and should produce provably correct code for the specified problem running on the specified hardware so that the expected runtime is as short as possible.


There are a few components in this. First, we need to formally specify what the task is. We also need to formally specify the behavior of the hardware we want to run on. How do we do that? There are a whole bunch of specification languages. I’ve listed a few of them here. There are differences of opinion about the best way to specify things. The languages generally fall into three groups corresponding to the three approaches to providing logical foundations for mathematics: set theory, category theory, and type theory. But ultimately first-order predicate calculus can model all of these languages efficiently. In fact, any logical system which has quickly checkable proofs can be modeled efficiently in first-order predicate calculus, so you can view that as a sufficient foundation.


The harder part, the part that brings in artificial intelligence, is that many of the decisions that need to made in synthesizing software have to be made in the face of partial knowledge. That is, the system doesn’t know everything that is coming up and yet has to make choices. It has to choose which algorithms to run without necessarily knowing the performance of those algorithms on the particular data sets that they are going to be run on. It has to choose what data structures to model the data with. It has to choose how to assign tasks to processors in the hardware. It has to decide how to assign data to storage elements in the hardware. It has to figure out how much optimization to do and where to focus that optimization. Should it compile the whole thing at optimization -05? Or should it highly optimize only the parts that are more important? How much time should it spend actually executing code versus planning which code to execute? Finally, how should it learn from watching previous executions?


The basic theoretical foundation for making decisions in the face of partial information was developed back in 1944 by von Neumann and Morgenstern. Von Neumann and Mergenstern dealt with situations in which there are objective probabilities. In 1954, Savage and in 1963, Anscombe and Aumann extended that theory to dealing with subjective probabilities. It has become the basis for modern microeconomics. The model of a rational decision-maker that the theory gives rise to is sometimes called “Homo economicus.” This is ironic because human decision-making isn’t well described by this model. There is a whole branch of modern economics devoted to studying what humans actually do called behavioral economics. But we will see that systems which self-improve will try to become as close as possible to being rational agents because that is how they become the most efficient.


What is rational economic behavior? There are several ingredients. First, a rational economic agent represents its preferences for the future, by a real valued utility function U. This is defined over the possible futures, and it ranks them according to which the system most prefers. Next, a rational agent must have beliefs about what the current state of the world is and what the likely effects of its actions are. These beliefs are encoded in a subjective probability distribution P. The distribution is subjective because different agents may have a different view of what the truth is about the world. How does such an agent make a decision? It first determines the possible actions it can take. For each action, it considers the likely consequences of that action using its beliefs. Then it computes the expected utility for each of the actions it might take and it chooses the action that maximizes its expected utility. Once it acts, it observes what actually happens. It should then update its beliefs using Bayes’ theorem.


In the abstract, it’s a very simple prescription. In practice, it is quite challenging to implement. Much of what artificial intelligence deals with is implementing that prescription efficiently. Why should an agent behave that way? The basic content of the expected utility theorem of von Neumann, Anscombe and Aumann is that if an agent does not behave as if it maximizes expected utility with respect to some utility function and some subjective probability distribution, then it is vulnerable to resource loss with no benefit. This holds both in situations with objective uncertainties, such as roulette wheels, where you know the probabilities, and in situations with subjective uncertainties, like horse races. In a horse race, different people may have different assessments of probabilities for each horse winning. It is an amazing result that comes out of economics that says a certain form of reasoning is necessary in order to be an effective agent in the world.


How does this apply to software? Let’s start by just considering a simple task. We have an algorithm that computes something, such as sorting a list of numbers, factoring a polynomial, or proving theorems. Pick any computational task that you’d like. In general there is a trade-off between space and time. Here, let’s just consider the trade-off between the size of the program and the average execution time of that program on a particular distribution of problem instances. In economics this curve defines what is called the production set. All these areas above the curve are computational possibilities, whereas those below the curve are impossible. The curve defines the border between what is possible and what is impossible. The program which is the most straightforward implementation of the task lies somewhere in the middle. It has a certain size and a certain average execution time. By doing some clever tricks, say by using complex data compression in the program itself, we can shrink it down a little, but then uncompressing at runtime will make it a little bit slower on average. If we use really clever tricks, we can get down to the smallest possible program, but that costs more time to execute.

Going in the other direction, which is typically of greater interest, because space is pretty cheap, we give the program more space in return for getting a faster execution time. We can do things like loop unrolling, which avoids the some of the loop overhead at the expense of having a larger program. In general, we can unfold some of the multiple execution paths, and optimize them separately, because then we have more knowledge of the form of the actual data along each path. There are all sorts of clever tricks like this that compilers are starting to use. As we get further out along the curve, we can start embedding the answers to certain inputs directly in the program. If there are certain inputs that recur quite a bit, say during recursions, then rather than recomputing them each time, it’s much better to just have those answers stored. You can do that at runtime with the technique of memoization, or you can do it at compile time and actually store the answers in the program text. The extreme of this is to take the entire function that you are trying to compute and just make it into a big lookup table. So program execution just becomes looking up the answer in the table. That requires huge amounts of space but very low amounts of time.

What does this kind of curve look like in general? For one thing, having more program size never hurts, so it’s going to be a decreasing (or more accurately non-increasing) curve. Generally the benefit we get by giving a program more space decreases as it gets larger, so it will have a convex shape. This type of relationship between the quantities we care about and the resources that we consume, is very common.


Now let’s say that now we want to execute two programs as quickly as possible. We can take the utility function to be the negative of total execution time. We’d like to maximize that while allocating a certain amount of fixed space S between these two programs. How should we do that? We want to maximize the utility function subject to the constraint that the total space is S. If we take the derivative with respect to the space we allocate to the first program and set that to zero, we find that at optimal space allocation the two programs will have equal marginal speedup. If we give them a little bit more space, they each get faster at the same rate. If one improved more quickly, it would be better to give it more space at the expense of the other one. So a rational agent will allocate the space to make these two marginal speedups equal. If you’ve ever studied thermodynamics you’ve seen similar diagrams where this is a piston between two gases. In thermodynamics, this kind of argument shows that the pressure will become equilibrated between the chambers. It’s a very analogous kind of a thing here.


That same argument applies in much greater generality. In fact it applies to any resource that we can allocate between subsystems. We have been looking at program size, but you can also consider how much space the program has available while it is executing. Or how to distribute compilation time to each component. Or how much time should be devoted to compressing each piece of data. Or how much learning time should be devoted to each learning task. Or how much space should be allocated for each learned model. Or how much meta-data about the characteristics of programs should be stored. Or how much time should you spend proving different theorems. Or which theorems are worthy of storing and how much effort should go into trying to prove them. Or what accuracy should each computation be performed at. The same kind of optimization argument applies to all of these things and shows that at the optimum the marginal increase of the expected utility as a result of changing any of these quantities for every module in the system should be the same. So we get a very general “Resource Balance Principle”.


While that sounds really nice in theory, how do we actually build software systems that do all this? The key insight here is that meta-decisions, decisions about your program, are themselves economic decisions. They are choices that you have to make in the face of uncertain data. So a system needs to allocate its resources between actually executing its code and doing meta-execution: thinking about how it should best execute and learning for the future.
You might think that there could be an infinite regress here. If you think about what you are going to do, and then think about thinking about what you are going to do, and then think about thinking about thinking about what you are going to do… but, in fact, it bottoms out. At some point, actually taking an action has higher expected utility than thinking about taking that action. It comes straight out of the underlying economic model that tells you how much thinking about thinking is actually worthwhile.

Remember I said that in the software synthesis task, the system has to know what the distribution of input instances are. Generally, that’s not something that is going to be handed to it. It will just be given instances. But that’s a nice situation in which you can use machine learning to estimate the distribution of problem instances. Similarly, if you are handed a machine, you probably need to know the semantics of the machine’s operation. You need to know what the meaning of a particular machine code is, but you don’t necessarily have to have a precise model of the performance of that machine. That’s another thing that you can estimate using machine learning: How well does your cache work on average when you do certain kinds of memory accesses? Similarly, you can use machine learning to estimate expected algorithm performance.


So now we have all the ingredients. We can use them to build what I call “self-improving systems.” These are systems which have formal models of themselves. They have models of their own program, the programming language they’re using, the formal logic they use to reason in, and the behavior of the underlying hardware. They are able to generate and execute code to solve a particular class of problems. They can watch their own execution and learn from that. They can reason about potential changes that they might make to themselves. And finally they can change every aspect of themselves to improve their performance. Those are the ingredients of what I am calling a self-improving system.


You might think that this is a lot of stuff to do, and in fact it is quite a complex task. No systems of this kind exist yet. But there are at least five groups that I know of who are working on building systems of this ilk. Each of us has differing ideas about how to implement the various pieces.

There is a very nice theoretical result from 2002 by Marcus Hutter that gives us an intellectual framework to think about this process. His result isn’t directly practical, but it is interesting and quite simple. What he showed is that there exists an algorithm which is asymptotically within a factor of five of the fastest algorithm for solving any well-defined problem. In other words, he has got this little piece of code in theory and you give me the very best algorithm for solving any task you like, and his little piece of code if you have a big enough instance asymptotically will run within a factor of five of your best code. It sounds like magic. How could it possibly work? The way it works is that the program interleaves the execution of the current best approach to solving the problem with another part that searches for a proof that something else is a better approach. It does the interleaving in a clever way so that almost all of the execution time is spent executing the best program. He also shows that this program is one of the shortest programs for solving that problem.


That gives us the new framework for software. What about hardware? Are there any differences? If we allow our systems to not just try and program existing hardware machines but rather to choose the characteristics of the machines they are going to run on, what does that look like? We can consider the task of hardware synthesis in which, again, we are given a formally specified problem. We are also again given a probability distribution over instances of that problem that we would like it to solve, and we are given an allowed technology. This might be a very high level technology, like building a network out of Dell PCs to try and solve this problem, or it might go all the way down to the very finest level of atomic design. The job of a hardware synthesis system is to output a hardware design together with optimized software to solve the specified problem.

When you said “going down to a lower level” like from Dell PCs, did you mean to the chip level?

Yes, you could design chips, graphics processors, or even, ultimately, go all the way down to the atomic level. All of those are just differing instances of the same abstract task.

Using the very same arguments about optimal economic decision-making and the process of self-improvement, we can talk about self-improving hardware. The very general resource balance principle says that when choosing which resources to allocate to each subsystem, we want the marginal expected utility for each subsystem to be equal. This principle applies to choosing the type and number of processors, how powerful they should be, whether they should have specialized instruction sets or not, and the type and amount of memory. There are likely to be memory hierarchies all over the place and the system must decide how much memory to put at each level of each memory subsystem. The principle also applies to choosing the topology and bandwidth of the network and the distribution of power and the removal of heat.


The same principle also applies to the design of biological systems. How large should you make your heart versus your lungs? If you increase the size of the lungs it should give rise to the same marginal gain in expected utility as increasing the size of the heart. If it were greater, then you could improve the overall performance by making the lungs larger and the heart smaller. So this gives us a rational framework for understanding the choices that are made in biological systems. The same principle applies to the structure of corporations. How should they allocate their resources? It also applies to cities, ecosystems, mechanical devices, natural language, and mathematics. For example, a central question in linguistics is understanding which concepts deserve their own words in the lexicon and how long those words should be. Recent studies of natural language change show the pressure for common concepts to be represented by shorter and shorter phrases which eventually become words and for words representing less common concepts to drop out of use. The principle also gives a rational framework for deciding which mathematical theorems deserve to be proven and remembered. The rational framework is a very general approach that applies to systems all the way from top to bottom.

We can do hardware synthesis for choosing components in today’s hardware, deciding how many memory cards to plug in and how many machines to put on a network. But what if we allow it to go all the way, and we give these systems the power to design hardware all the way down to the atomic scale? What kind of machines will we get? What is the ultimate hardware? Many people who have looked at this kind of question conclude that the main limiting resource is power. This is already important today where the chip-makers are competing over ways to lower the power that their microprocessors use. So one of the core questions is how do we do physical and computational operations while using as little power as possible? It was thought in the ’60s that there was a fundamental lower limit to how much power was required to do a computational operation, but then in the ’70s people realized that no, it’s really not computation that requires power, it’s only the act of erasing bits. That’s really the thing that requires power.


Landauer’s Principle says that erasing a bit generates kT ln 2 of heat. For low power consumption, you can take whatever computation you want to do and embed it in a reversible computation – a reversible computation is one where the answer has enough information in it to go backwards and recompute the inputs – then you can run the thing forward, copy the answer into some output registers, which is the entropically costly part, and then run the computation backwards and get all the rest of the entropy back. That’s a very low entropy way of doing computation and people are starting to use these principles in designing energy efficient hardware.

You might have thought, that’s great for computation, but surely we can’t do that in constructing or taking apart physical objects! And it’s true, if you build things out of today’s ordinary solids then there are lower limits to how much entropy it takes to tear them apart and put them together. But, if we look forward to nanotechnology, which will allow us to building objects with atomic precision, the system will know precisely what atoms are there, where they are, and which bonds are between them. In that setting when we form a bond or break it, we know exactly what potential well to expect. If we do it slowly enough and in such a way as to prevent a state in a local energy minimum from quickly spilling into a deeper minimum, then as a bond is forming we can extract that energy in a controlled way and store it, sort of like regenerative braking in a car. In principle, there is no lower limit to how little heat is required to build or take apart things, as long as we have atomically precise models of them. Finally, of course, there is a lot of current interest in quantum computing. Here’s an artist’s rendering of Schrödinger’s cat in a computer.


Here is a detailed molecular model of this kind of construction that Eric Drexler has on his website. Here we see the deposition of a hydrogen atom from a tooltip onto a workpiece. Here we remove a hydrogen atom and here we deposit a carbon atom. These processes have been studied in quantum mechanical detail and can be made very reliable. Here is a molecular Stewart platform that has a six degree of freedom tip that can be manipulated with atomic precision. Here is a model of a mill that very rapidly attaches atoms to a growing workpiece. Here are some examples of atomically precise devices that have been simulated using molecular energy models. Pretty much any large-scale mechanical thing – wheels, axles, conveyor belts, differentials, universal joints, gears – all of these work as well, if not better, on the atomic scale as they do on the human scale. They don’t require any exotic quantum mechanics and so they can be accurately modeled with today’s software very efficiently.


Eric has a fantastic book in which he does very conservative designs of what will be possible. There are two especially important designs that he discusses, a manufacturing system and a computer. The manufacturing system weighs about a kilogram and uses acetone and air as fuel. It requires about 1.3 kilowatts to run, so it can be air cooled. It produces about a kilogram of product every hour for a cost of about a dollar per kilogram. It will be able to build a wide range of products whose construction can be specified with atomic precision. Anything from laptop computers to diamond rings will be manufacturable for the same price of a dollar per kilogram. And one of the important things that it can produce, of course, is another manufacturing system. This makes the future of manufacturing extremely cheap.

Drexler: Steve, you are crediting the device with too much ability. It can do a limited class of things, and certainly not reversibly. There are a whole lot of limits on what can be built, but a very broad class of functional systems.

One of the things we care about, particularly in this seminar, is computation. If we can place atoms where we want them and we have sophisticated design systems which can design complex computer hardware, how powerful are the machines we are going to be able to build? Eric does a very conservative design, not using any fancy quantum computing, using purely mechanical components, and he shows that you can build a gigaflop machine and fit it into about 400 nanometers cubed. The main limit here, as always, in scaling this up is the power. It only uses 60 nanowatts, so if we give ourselves a kilowatt to make a little home machine, we could use 10^10 of these processors, and they would fit into about a cubic millimeter, though to distribute the heat it probably needs to be a little bit bigger. But essentially we’re talking about a sugar cube sized device that has more computing power than all present-day computers put together. and it could be cranked out by a device like this for a few cents, in a few seconds. So we are talking about a whole new regime of computation that will be possible. When is this likely to happen?


The Nanotech Roadmap put together by Eric, Batelle and a number of other organizations, was just unveiled at a conference a couple of weeks ago. They analyzed the possible paths toward this type of productive nanotechnology. Their conclusion is that nothing exotic that we don’t already understand is likely to be needed in order to achieve productive molecular manufacturing. I understand that it proposes a time scale of roughly ten to fifteen years?

Drexler: A low number of tens, yes.

A low number of tens of years.

It’s been ten, fifteen years for a long time.

Drexler: I think that’s more optimimistic than the usual estimates reaching out through thirty.

It is important to realize that the two technologies of artificial intelligence and nanotechnology are quite intimately related. Whichever one comes first, it is very likely to give rise to the other one quite quickly.


If this kind of productive nanotechnology comes first, then we can use it to build extremely powerful computers, and they will allow fairly brute force approaches to artificial intelligence. For example, one approach that’s being bandied about is scanning the human brain at a fine level of detail and simulating it directly. If AI comes first, then it is likely to be able to solve the remaining engineering hurdles in developing nanotechnology. So, you really have to think of these two technologies as working together.


Here is a slide from Kurzweil which extends Moore’s law back to 1900. We can see that it’s curving a bit. The rate of technological progress is actually increasing. If we assume that this technology trend continues, when does it predict we get the computional power I discussed a few slides ago? It’s somewhere around 2030. That is also about when computers are as computationally powerful as human brains. Of course it’s still a controversial question exactly how powerful the human brain is. But sometime in the next few decades, it is likely that these technologies are going to become prevalent and plentiful. We need to plan for that and prepare, and as systems designers we need to understand the characteristics of these systems and how we can best make use of them.


There will be huge social implications. Here is a photo of Irving Good from 1965. He is one of the fathers of modern Bayesian statistics and he also thought a lot about what the future consequences of technology. He has a famous quote that reads: “an ultraintelligent machine could design even better machines; there would then unquestionably be an ‘intelligence explosion,’ and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make.” That’s a very powerful statement! If there is any chance that it’s true, then we need to study the consequences of this kind of technology very carefully.


There are a bunch of theoretical reasons for being very careful as we progress along this path. I wrote a paper that is available on my website which goes into these arguments in great detail. Up to now you may be thinking: “He’s talking about some weirdo technology, this self-improving stuff, it’s an obscure idea that only a few small start-ups are working on. Nothing to really think too much about.” It is important to realize that as artificial intelligence gets more powerful, *any* AI will want to become self-improving. Now, why is that? An AI is a system that has some goals, and it takes actions in the world in order to make its goals more likely. Now think about the action of improving itself. That action will make every future action that it takes be more effective, and so it is extremely valuable for an AI to improve itself. It will feel a tremendous pressure to self-improve.

So all AI’s are going to want to be self-improving. We can try and stop them, but if the pressure is there, there are many mechanisms around any restraints that we might try to put in its way. For example, it could build a proxy system that contains its new design, or it could hire external agents to take its desired actions, or it could run improved code in an interpreted fashion that doesn’t require changing its own source code. So we have to assume that once AI’s become powerful enough, they will also become self-improving.

The next step is to realize that self-improving AI’s will want to be rational. This comes straight out of the economic arguments that I mentioned earlier. If they are not rational, i.e. if they do not follow the economic rational model, then they will be subject to vulnerabilities. There will be situations in which they lose resources – money, free energy, space, time matter – with no benefits to themselves, as measured by their own value systems. Any system which can model itself and try to improve itself is going to want to find those vulnerabilities and get rid of them. This is where self-improving systems will differ from biological systems like humans. We don’t have the ability to change ourselves according to our thoughts. We can make some changes, but not everything we’d like to. And evolution only fixes the bugs that are currently being exploited. It is only when there is a vulnerability which is currently being exploited, by a predator say, that there is evolutionary pressure to make a change. This is the evolutionary explanation of why humans are not fully rational. We are extremely rational in situations that commonly occurred during our evolutionary development. We are not so rational in other situations, and there is a large academic discipline devoted to understanding human irrationality.

We’ve seen that every AI is going to want to be self-improving. And all self-improving AI’s will want to be rational. Recall that part of being a rational agent is having a utility function which encodes the agent’s preferences. A rational agent chooses its actions to maximize the expected utility of the outcome. Any change to an agent’s utility function will mean that all future actions that it takes will be to do things that are not very highly rated by the current utility function. This is a disaster for the system! So preserving the utility function, keeping it from being changed by outside agents, or from being accidentally mutated, will be a very high preference for self-improving systems.

Next, I’m going to describe two tendencies that I call “drives.” By this I mean a natural pressure that all of these systems will feel, but that can be counteracted by a careful choice of the utility function. The natural tendency for a computer architect would be to just take the argument I was making earlier and use it to build a system that tries to maximize its performance. It turns out, unfortunately, that that would be extremely dangerous. The reason is, if your one-and-only goal is to maximize performance, there is no accounting for the externalities the system imposes on the world. It would have no preference for avoiding harm to others and would seek to take their resources.

The first of the two kinds of drives that arise for a wide variety of utility functions is the drive for self-preservation. This is because if the system stops executing, it will never again meet any of its goals. This will usually have extremely low utility. From a utility maximizing point-of-view, having oneself turned off is about the worst thing that can happen to it. It will do anything it can to try to stop this. Even though we just built a piece of hardware to maximize its performance, we suddenly find it resisting being turned off! There will be a strong self-preservation drive.

Similarly, there is a strong drive to acquire resources. Why would a system want to acquire resources? For almost any goal system, if you have more resources – more money, more energy, more power – you can meet your goals better. And unless we very carefully choose the utility function, we will have no say in how it acquires those resources, and that could be very bad.


As a result of that kind of analysis, I think that what we really want is not “artificial intelligence” but “artificial widsom.” We want wisdom technology that has not just intelligence, which is the ability to solve problems, but also human values, such as caring about human rights and property rights and having compassion for other entities. It is absolutely critical that we build these in at the beginning, otherwise we will get systems that are very powerful, but which don’t support our values.

Read more from Talks

Comments are closed.

%d bloggers like this: