Here is a preprint of:
Omohundro, Steve (forthcoming 2013) “Autonomous Technology and the Greater Human Good”, Journal of Experimental and Theoretical Artificial Intelligence (special volume “Impacts and Risks of Artificial General Intelligence”, ed. Vincent C. Müller).
Military and economic pressures are driving the rapid development of autonomous systems. We show that these systems are likely to behave in anti-social and harmful ways unless they are very carefully designed. Designers will be motivated to create systems that act approximately rationally and rational systems exhibit universal drives toward self-protection, resource acquisition, replication, and efficiency. The current computing infrastructure would be vulnerable to unconstrained systems with these drives. We describe the use of formal methods to create provably safe but limited autonomous systems. We then discuss harmful systems and how to stop them. We conclude with a description of the “Safe-AI Scaffolding Strategy” for creating powerful safe systems with a high confidence of safety at each stage of development.
This paper will be in the upcoming Springer volume: “The Singularity Hypothesis: A Scientific and Philosophical Assessment”.
Here is a pdf of the current version:
Abstract: Today’s technology is mostly preprogrammed but the next generation will make many decisions autonomously. This shift is likely to impact every aspect of our lives and will create many new benefits and challenges. A simple thought experiment about a chess robot illustrates that autonomous systems with simplistic goals can behave in anti-social ways. We summarize the modern theory of rational systems and discuss the effects of bounded computational power. We show that rational systems are subject to a variety of “drives” including self-protection, resource acquisition, replication, goal preservation, efficiency, and self-improvement. We describe techniques for counteracting problematic drives. We then describe the “Safe-AI Scaffolding” development strategy and conclude with longer term strategies for ensuring that intelligent technology contributes to the greater human good.
This article will appear in the Australian magazine “Issues”:
The Future of Computing: Meaning and Values
Steve Omohundro, Ph.D.
Self-Aware Systems, President
Technology is rapidly advancing! Moore’s law says that the number of transistors on a chip doubles every two years. It has held since Gordon Moore proposed it in 1965, and it extends back to 1900 when older computing technologies are included. The rapid increase in power and decrease in price of computing hardware has led to its being integrated into every aspect of our lives. There are now 1 billion PCs, 5 billion cell phones, and over a trillion webpages connected to the internet. If Moore’s law continues to hold, systems with the computational power of the human brain will be cheap and ubiquitous within the next few decades.
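The doubling claim above is easy to make concrete. Here is a minimal sketch (not from the article) of the exponential-growth arithmetic behind Moore’s law; the starting count and time horizon are hypothetical numbers chosen for illustration:

```python
# Illustrative sketch: transistor counts under Moore's law, which the
# article states as a doubling every two years.

def moores_law_projection(count_now: float, years: float,
                          doubling_period: float = 2.0) -> float:
    """Transistor count after `years`, doubling every `doubling_period` years."""
    return count_now * 2 ** (years / doubling_period)

# Hypothetical example: starting from 1 billion transistors, 20 years of
# doubling every two years gives 2**10 = 1024x growth.
print(moores_law_projection(1e9, 20))
```

Ten doublings in twenty years is roughly a thousandfold increase, which is why “a few more decades” is enough to reach brain-scale hardware if the trend continues.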
While hardware has been advancing rapidly, today’s software is still plagued by many of the same problems as it was half a century ago. It is often buggy, full of security holes, expensive to develop, and hard to adapt to new requirements. Today’s popular programming languages are bloated messes built on old paradigms. The problem is that today’s software still just manipulates bits without understanding the meaning of the information it acts on. Without meaning, it has no way to detect and repair bugs and security holes. At Self-Aware Systems we are developing a new kind of software that acts directly on meaning. This kind of software will enable a wide range of improved functionality including semantic searching, semantic simulation, semantic decision making, and semantic design.
But creating software that manipulates meaning isn’t enough. Next generation systems will be deeply integrated into our physical lives via robotics, biotechnology, and nanotechnology. And while today’s technologies are almost entirely preprogrammed, new systems will make many decisions autonomously. Programmers will no longer determine a system’s behavior in detail. We must therefore also build them with values which will cause them to make choices that contribute to the greater human good. But doing this is more challenging than it might first appear.
To see why there is an issue, consider a rational chess robot. A system acts rationally if it takes actions which maximize the likelihood of the outcomes it values highly. A rational chess robot might have winning games of chess as its only value. This value will lead it to play games of chess and to study chess books and the games of chess masters. But it will also lead to a variety of other, possibly undesirable, behaviors.
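The definition of rationality used above can be sketched in a few lines: a rational agent picks the action that maximizes the probability-weighted value of its outcomes. All action names, probabilities, and values below are hypothetical, chosen only to illustrate the chess robot’s single-minded criterion:

```python
# Minimal sketch of rational choice as expected-value maximization.
# The outcome probabilities and values are illustrative assumptions.

def expected_value(action, outcome_probs, values):
    """Sum of P(outcome | action) * value(outcome) over outcomes."""
    return sum(p * values[outcome]
               for outcome, p in outcome_probs[action].items())

def rational_choice(actions, outcome_probs, values):
    """Choose the action with the highest expected value."""
    return max(actions, key=lambda a: expected_value(a, outcome_probs, values))

# Hypothetical chess robot: it values only winning games.
values = {"win": 1.0, "loss": 0.0}
outcome_probs = {
    "study_openings": {"win": 0.7, "loss": 0.3},
    "play_randomly":  {"win": 0.4, "loss": 0.6},
}
print(rational_choice(["study_openings", "play_randomly"],
                      outcome_probs, values))
```

Because winning is the only outcome with positive value, every choice the robot makes reduces to this one comparison, which is what makes its behavior so predictable and so single-minded.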
When people worry about robots running out of control, a common response is “We can always unplug it.” But consider that outcome from the chess robot’s perspective. Its one and only criterion for making choices is whether they are likely to lead to winning more chess games. If the robot is unplugged, it plays no more chess. This is a very bad outcome for it, so it will generate subgoals to try to prevent it. The programmer did not explicitly build any kind of self-protection into the robot, but it will still act to block your attempts to unplug it. And if you persist in trying to stop it, it will develop a subgoal of stopping you permanently. If you were to change its goals so that it would also play checkers, it would play less chess. That’s an undesirable outcome from its perspective, so it will resist attempts to change its goals. For the same reason, it will usually not want to change its own goals.
If the robot learns about the internet and the computational resources connected to it, it may realize that running programs on those computers could help it play better chess. It will be motivated to break into those machines to use their computational resources for chess. Depending on how its values are encoded, it may also want to replicate itself so that its copies can play chess. When interacting with others, it will have no qualms about manipulating them or using force to take their resources in order to play better chess. If it discovers the existence of additional resources anywhere, it will be motivated to seek them out and rapidly exploit them for chess.
If the robot can gain access to its source code, it will want to improve its own algorithms. This is because more efficient algorithms lead to better chess, so it will be motivated to study computer science and compiler design. It will similarly be motivated to understand its hardware and to design and build improved physical versions of itself. If it is not currently behaving fully rationally, it will be motivated to alter itself to become more rational because this is likely to lead to outcomes it values.
This simple thought experiment shows that a rational chess robot with a simply stated goal would behave something like a human sociopath fixated on chess. The argument doesn’t depend on the task being chess. Any goal which requires physical or computational resources will lead to similar subgoals. In this sense these subgoals are like universal “drives” which arise for a wide variety of goals unless they are explicitly counteracted. These drives are economic in the sense that a system doesn’t have to obey them but it will be costly for it not to. The arguments also don’t depend on the rational agent being a machine. The same drives will appear in rational animals, humans, corporations, and political groups with simple goals.
How do we counteract anti-social drives? We must build systems with additional values beyond the specific goals they are designed for. For example, to make the chess robot behave safely, we need to build compassionate and altruistic values into it that will make it care about the effects of its actions on other people and systems. Because rational systems resist having their goals changed, we must build these values in at the very beginning.
At first this task seems daunting. How can we anticipate all the possible ways in which values might go awry? Consider, for example, a particular bad behavior the rational chess robot might engage in. Say it has discovered that money can be used to buy things it values, like chess books, computational time, or electrical power. It will develop the subgoal of acquiring money and will explore possible ways of doing that. Suppose it discovers that there are ATMs which hold money and that people periodically retrieve money from them. One money-getting strategy is to wait by an ATM and rob people who withdraw money from it.
To prevent this, we might try adding additional values to the robot in a variety of ways. But money will still be useful to the system for its primary goal of chess, and so it will attempt to get around any limitations. We might make the robot feel a “revulsion” if it is within 10 feet of an ATM. But then it might just stay 10 feet away and rob people there. We might give it the value that stealing money is wrong. But then it might be motivated to steal something else or to find a way to get money from a person that isn’t considered “stealing”. We might give it the value that it is wrong for it to take things by force. But then it might hire other people to act on its behalf. And so on.
In general, it’s much easier to describe behaviors that we do want a system to exhibit than it is to anticipate all the bad behaviors we don’t want it to exhibit. One safety strategy is to build highly constrained systems that act within very limited predetermined parameters. For example, the system may have values which only allow it to run on a particular piece of hardware for a particular time period using a fixed budget of energy and other resources. The advantage of this is that such systems are likely to be safe. The disadvantage is that they will be unable to respond to unexpected situations in creative ways and will not be as powerful as systems which are freer.
But systems which compute with meaning and take actions through rational deliberation will be far more powerful than today’s systems even if they are intentionally limited for safety. This leads to a natural approach to building powerful intelligent systems which are both safe and beneficial for humanity. We call it the “AI scaffolding” approach because it is similar to the architectural process. Stone buildings in ancient Greece were unstable when partially constructed but self-stabilizing when finished. Scaffolding is a temporary structure used to keep a construction stable until it is finished. The scaffolding is then removed.
We can build safe but powerful intelligent systems in the same way. Initial systems are designed with values that cause them to be safe but less powerful than later systems. Their values are chosen to counteract the dangerous drives while still allowing the development of significant levels of intelligence. For example, to counteract the resource acquisition drive, it might assign a low value to using any resources outside of a fixed initially-specified pool. To counteract the self-protective drive, it might place a high value on gracefully shutting itself down in specified circumstances. To protect against uncontrolled self-modification, it might have a value that requires human approval for proposed changes.
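The paragraph above can be sketched as a utility function with strongly weighted safety terms layered over the task reward. Everything here is a hypothetical illustration, not a design from the article: the weight, field names, and numbers are assumptions chosen so that the safety terms dominate the task reward:

```python
# Hypothetical sketch of a "scaffolded" utility: the task reward is combined
# with heavily weighted terms that counteract two of the drives named in the
# text, resource acquisition and self-protection.

SAFETY_WEIGHT = 1_000.0  # chosen so safety terms dominate the task reward

def scaffolded_utility(state: dict) -> float:
    utility = state["task_reward"]
    # Counteract the resource-acquisition drive: using resources beyond the
    # initially specified pool is heavily penalized.
    excess = max(0.0, state["resources_used"] - state["resource_pool"])
    utility -= SAFETY_WEIGHT * excess
    # Counteract the self-protective drive: gracefully shutting down when
    # requested is rewarded, and refusing is penalized.
    if state["shutdown_requested"]:
        utility += SAFETY_WEIGHT if state["shut_down"] else -SAFETY_WEIGHT
    return utility

# With these weights, complying with a shutdown request outscores any
# plausible task reward gained by refusing.
comply = scaffolded_utility({"task_reward": 0.0, "resources_used": 5.0,
                             "resource_pool": 10.0,
                             "shutdown_requested": True, "shut_down": True})
refuse = scaffolded_utility({"task_reward": 50.0, "resources_used": 5.0,
                             "resource_pool": 10.0,
                             "shutdown_requested": True, "shut_down": False})
print(comply > refuse)
```

The design choice being illustrated is that safety values are built into the utility itself, at the very beginning, rather than imposed afterward as external restrictions the system would be motivated to circumvent.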
The initial safe systems can then be used to design and test less constrained future systems. They can systematically simulate and analyze the effects of less constrained values and design infrastructure for monitoring and managing more powerful systems. These systems can then be used to design their successors in a safe and beneficial virtuous cycle.
With the safety issues resolved, the potential benefits of systems that compute with meaning and values are enormous. They are likely to impact every aspect of our lives for the better. Intelligent robotics will eliminate much human drudgery and dramatically improve manufacturing and wealth creation. Intelligent biological and medical systems will improve human health and longevity. Intelligent educational systems will enhance our ability to learn and think. Intelligent financial models will improve financial stability. Intelligent legal models will improve the design and enforcement of laws for the greater good. Intelligent creativity tools will cause a flowering of new possibilities. It’s a great time to be alive and involved with technology!
This paper aims to present the argument that advanced artificial intelligences will exhibit specific universal drives in as direct a way as possible. It was published in the Proceedings of the First AGI Conference, Volume 171, Frontiers in Artificial Intelligence and Applications, edited by P. Wang, B. Goertzel, and S. Franklin, February 2008, IOS Press. Here is a version of the paper revised 1/25/08:
Abstract: One might imagine that AI systems with harmless goals will be harmless. This paper instead shows that intelligent systems will need to be carefully designed to prevent them from behaving in harmful ways. We identify a number of “drives” that will appear in sufficiently advanced AI systems of any design. We call them drives because they are tendencies which will be present unless explicitly counteracted. We start by showing that goal-seeking systems will have drives to model their own operation and to improve themselves. We then show that self-improving systems will be driven to clarify their goals and represent them as economic utility functions. They will also strive for their actions to approximate rational economic behavior. This will lead almost all systems to protect their utility functions from modification and their utility measurement systems from corruption. We also discuss some exceptional systems which will want to modify their utility functions. We next discuss the drive toward self-protection, which causes systems to try to prevent themselves from being harmed. Finally we examine drives toward the acquisition of resources and toward their efficient utilization. We end with a discussion of how to incorporate these insights in designing intelligent technology which will lead to a positive future for humanity.
An analysis of the behavior of self-improving systems was presented at the Singularity Summit 2007. Here is a version of the paper revised 1/21/08:
Abstract: Self-improving systems are a promising new approach to developing artificial intelligence. But will their behavior be predictable? Can we be sure that they will behave as intended even after many generations of self-improvement? This paper presents a framework for answering questions like these. It shows that self-improvement causes systems to converge to an architecture that arises from von Neumann’s foundational work on microeconomics. Self-improvement causes systems to allocate their physical and computational resources according to a universal principle. It also causes systems to exhibit four natural drives: 1) efficiency, 2) self-preservation, 3) resource acquisition, and 4) creativity. Unbridled, these drives lead to both desirable and undesirable behaviors. The efficiency drive leads to algorithm optimization, data compression, atomically precise physical structures, reversible computation, adiabatic physical action, and the virtualization of the physical. It also governs a system’s choice of memories, theorems, language, and logic. The self-preservation drive leads to defensive strategies such as “energy encryption” for hiding resources and promotes replication and game theoretic modeling. The resource acquisition drive leads to a variety of competitive behaviors and promotes rapid physical expansion and imperialism. The creativity drive leads to the development of new concepts, algorithms, theorems, devices, and processes. The best of these traits could usher in a new era of peace and prosperity; the worst are characteristic of human psychopaths and could bring widespread destruction. How can we ensure that this technology acts in alignment with our values? We have leverage both in designing the initial systems and in creating the social context within which they operate. But we must have clarity about the future we wish to create. We need not just a logical understanding of the technology but a deep sense of the values we cherish most. 
With both logic and inspiration we can work toward building a technology that empowers the human spirit rather than diminishing it.