AI and Robotics at an Inflection Point
18 September 2014
5:00-6:30pm (5:00-6:00 presentation and Q&A, followed by networking until 6:30)
George E. Pake Auditorium, PARC
Google, IBM, Microsoft, Apple, Facebook, Baidu, Foxconn, and others have recently made multi-billion dollar investments in artificial intelligence and robotics. Some of these investments are aimed at increasing productivity and enhancing coordination and cooperation. Others are aimed at creating strategic gains in competitive interactions. This is creating “arms races” in high-frequency trading, cyber warfare, drone warfare, stealth technology, surveillance systems, and missile warfare. Recently, Stephen Hawking, Elon Musk, and others have issued strong cautionary statements about the safety of intelligent technologies. We describe the potentially antisocial “rational drives” of self-preservation, resource acquisition, replication, and self-improvement that uncontrolled autonomous systems naturally exhibit. We describe the “Safe-AI Scaffolding Strategy” for developing these systems with a high confidence of safety based on the insight that even superintelligences are constrained by mathematical proof and cryptographic complexity. It appears that we are at an inflection point in the development of intelligent technologies and that the choices we make today will have a dramatic impact on the future of humanity.
To register click here.
Steve Omohundro has been a scientist, professor, author, software architect, and entrepreneur doing research that explores the interface between mind and matter. He has degrees in Physics and Mathematics from Stanford and a Ph.D. in Physics from U.C. Berkeley. He was a computer science professor at the University of Illinois at Champaign-Urbana and cofounded the Center for Complex Systems Research. He published the book “Geometric Perturbation Theory in Physics”, designed the programming languages StarLisp and Sather, wrote the 3D graphics system for Mathematica, and built systems which learn to read lips, control robots, and induce grammars. He is president of Possibility Research devoted to creating innovative technologies and Self-Aware Systems, a think tank working to ensure that intelligent technologies have a positive impact. His work on positive intelligent technologies was featured in James Barrat’s book “Our Final Invention” and has been generating international interest.
Seth Lloyd analyzed the computational capacity of physical systems in his 2000 Nature paper “Ultimate physical limits to computation” and in his 2006 book “Programming the Universe”. Using the very general Margolus-Levitin theorem, he showed that a 1 kilogram, 1 liter “ultimate laptop” can perform at most 10^51 operations per second and store 10^31 bits.
The entire visible universe since the big bang is capable of having performed 10^122 operations and of storing 10^92 bits. While these are large numbers, they are still quite finite. 10^122 is roughly 2^406, so the entire universe used as a massive quantum computer is still not capable of searching through all combinations of 500 bits.
This limitation is good news for our ability to design infrastructure today that will still constrain future superintelligences. Cryptographic systems that require brute force searching for a 500 bit key will remain secure even in the face of the most powerful superintelligence. In Base64, the following key:
would stymie the entire universe doing a brute force search.
The Impact of AI and Robotics
Google, IBM, Microsoft, Apple, Facebook, Baidu, Foxconn, and others have recently made multi-billion dollar investments in artificial intelligence and robotics. More than $450 billion is expected to be invested into robotics by 2025. All of this investment makes sense because AI and Robotics are likely to create $50 to $100 trillion dollars of value between now and 2025! This is of the same order as the current GDP of the entire world. Much of this value will be in ideas. Currently, intangible assets represent 79% of the market value of US companies and intellectual property represents 44%. But automation of physical labor will also be significant. Foxconn, the world’s largest contract manufacturer, aims to replace 1 million of its 1.3 million employees by robots in the next few years. An Oxford study concluded that 47% of jobs will be automated in “a decade or two”. Automation is also creating arms races in high-frequency trading, cyber warfare, drone warfare, stealth technology, surveillance systems, and missile warfare. Recently, Stephen Hawking, Elon Musk, and others have issued strong cautionary statements about the safety of intelligent technologies. We describe the potentially antisocial “rational drives” of self-preservation, resource acquisition, replication, and self-improvement that uncontrolled autonomous systems naturally exhibit. We describe the “Safe-AI Scaffolding Strategy” for developing these systems with a high confidence of safety based on the insight that even superintelligences are constrained by mathematical proof and cryptographic complexity. It appears that we are at an inflection point in the development of intelligent technologies and that the choices we make today will have a dramatic impact on the future of humanity.
Stephen Hawking’s and other’s recent cautions about the safety of artificial intelligence have generated enormous interest in this issue. My JETAI paper on “Autonomous Technology and the Greater Human Good” has now been downloaded more than 10,000 times, the most ever for a JETAI paper.
As the discussion expands to a broader audience, several radio shows have hosted discussions of the issue:
My paper “Autonomous Technology and the Greater Human Good” was recently published in the Journal of Experimental and Theoretical Artificial Intelligence. I’m grateful to the publisher, Taylor and Francis, for making the paper freely accessible at:
and for sending out a press release about the paper:
This has led to the paper becoming the most downloaded JETAI paper ever!
The interest has led a quite a number of articles exploring the content of the paper. While most focus on the potential dangers of uncontrolled AIs, some also discuss the approaches to safe development:
On March 25, 2014, Steve Omohundro gave the invited talk “Positive Artificial Intelligence” at the AAAI Spring Symposium Series 2014 symposium on “Implementing Selves with Safe Motivational Systems and Self-Improvement” at Stanford University.
Here are the slides:
and the abstract:
AI appears poised for a major social impact. In 2012, Foxconn announced they will be buying 1 million robots for assembling iPhones and other electronics. In 2013 Facebook opened an AI lab and announced the DeepFace facial recognition system, Yahoo purchased LookFlow, Ebay opened an AI lab, Paul Allen started the Allen Institute for AI, and Google purchased 8 robotics companies. In 2014, IBM announced they would invest $1 billion in Watson, Google purchased DeepMind for a reported $500 million, and Vicarious received $40 million of investment. Neuroscience research and detailed brain simulations are also receiving large investments. Popular movies and TV shows like “Her”, “Person of Interest”, and Johnny Depp’s “Transcendence” are exploring complex aspects of the social impact of AI. Competitive and time-sensitive domains require autonomous systems that can make decisions faster than humans can. Arms races are forming in drone/anti-drone warfare, missile/anti-missile weapons, bitcoin automated business, cyber warfare, and high-frequency trading on financial markets. Both the US Air Force and Defense Department have released roadmaps that ramp up deployment of autonomous robotic vehicles and weapons.
AI has the potential to provide tremendous social good. Improving healthcare through better diagnosis and robotic surgery, better education through student-customized instruction, economic stability through detailed economic models, greater peace and safety through better enforcement systems. But these systems could also be very harmful if they aren’t designed very carefully. We show that a chess robot with a simplistic goal would behave in anti-social ways. We describe the rational economic framework introduced by von Neumann and show why self-improving AI systems will aim to approximate it. We show that approximately rational systems go through stages of mental richness similar to biological systems as they are allocated more computational resources. We describe the universal drives of rational systems toward self-protection, goal preservation, reproduction, resource acquisition, efficiency, and self-improvement.
Today’s software has flaws that have resulted in numerous deaths and enormous financial losses. The internet infrastructure is very insecure and is being increasingly exploited. It is easy to construct extremely harmful intelligent agents with goals that are sloppy, simplistic, greedy, destructive, murderous, or sadistic. If there is any chance that such systems might be created, it is essential that humanity create protective systems to stop them. As with forest fires, it is preferable to stop them before they have many resources. An analysis of the physical game theory of conflict shows that a multiple of an agent’s resources will be needed to reliably stop it.
There are two ways to control the powerful systems that today’s AIs are likely to become. The “internal” approach is to design them with goals that are aligned with human values. We call this “Utility Design”. The “external” approach is to design laws and economic incentives with adequate enforcement to incentivize systems to act in ways that are aligned with human values. We call the technology of enforcing adherence to law “Accountability Engineering”. We call the design of economic contracts which includes an agent’s effects on others “Externality Economics”. The most powerful tool that humanity currently has for accomplishing these goals is mathematical proof. But we are currently only able to prove the properties of a very limited class of system. We propose the “Safe-AI Scaffolding Strategy” which uses limited systems which are provably safe to design more powerful trusted system in a sequence of safe steps. A key step in this is “Accountable AI” in which advanced systems must provably justify actions they wish to take.
If we succeed in creating a safe AI design methodology, them we have the potential to create technology to dramatically improve human lives. Maslow’s hierarchy is a nice framework for thinking about the possibilities. At the base of the pyramid are human survival needs like air, food, water, shelter, safety, law, and security. Robots have the potential to dramatically increase manufacturing productivity, increase energy production through much lower cost solar power, and to clean up pollution and protect and rebuild endangered ecosystems. Higher on the pyramid are social needs like family, compassion, love, respect, and reputation. A new generation of smart social media has the potential to dramatically improve the quality of human interaction. Finally, at the top of the pyramid are transcendent needs for self-actualization, beauty, creativity, spirituality, growth, and meaning. It is here that humanity has the potential to use these systems to transform the very nature of experience.
We end with a brief description of Possibility Research’s approach to implementing these ideas. “Omex” is our core programming language designed specifically for formal analysis and automatic generation. “Omcor” is our core specification language for representing important properties. “Omai” is our core semantics language for building up models of the world. “Omval” is for representing values and goals and “Omgov” for describing and implementing effective governance at all levels. The quest to extend cooperative human values and institutions to autonomous technologies for the greater human good is truly the challenge for humanity in this century.
This post is partly excerpted from the preprint to:
Omohundro, Steve (forthcoming 2013) “Autonomous Technology and the Greater Human Good”, Journal of Experimental and Theoretical Artificial Intelligence (special volume “Impacts and Risks of Artificial General Intelligence”, ed. Vincent C. Müller).
To ensure the greater human good over the longer term, autonomous technology must be designed and deployed in a very careful manner. These systems have the potential to solve many of today’s problems but they also have the potential to create many new problems. We’ve seen that the computational infrastructure of the future must protect against harmful autonomous systems. We would also like it to make decisions in alignment with the best of human values and principles of good governance. Designing that infrastructure will probably require the use of powerful autonomous systems. So the technologies we need to solve the problems may themselves cause problems.
To solve this conundrum, we can learn from an ancient architectural principle. Stone arches have been used in construction since the second millennium BC. They are stable structures that make good use of stone’s ability to resist compression. But partially constructed arches are unstable. Ancient builders created the idea of first building a wood form on top of which the stone arch could be built. Once the arch was completed and stable, the wood form could be removed.
We can safely develop autonomous technologies in a similar way. We build a sequence of provably-safe autonomous systems which are used in the construction of more powerful and less limited successor systems. The early systems are used to model human values and governance structures. They are also used to construct proofs of safety and other desired characteristics for more complex and less limited successor systems. In this way we can build up the powerful technologies that can best serve the greater human good without significant risk along the development path.
Many new insights and technologies will be required during this process. The field of positive psychology was formally introduced only in 1998. The formalization and automation of human strengths and virtues will require much further study. Intelligent systems will also be required to model the game theory and economics of different possible governance and legal frameworks.
The new infrastructure must also detect dangerous systems and prevent them from causing harm. As robotics, biotechnology, and nanotechnology develop and become widespread, the potential destructive power of harmful systems will grow. It will become increasingly crucial to detect harmful systems early, preferably before they are deployed. That suggests the need for pervasive surveillance which must be balanced against the desire for freedom. Intelligent systems may introduce new intermediate possibilities that restrict surveillance to detecting precisely specified classes of dangerous behavior while provably keeping other behaviors private.
In conclusion, it appears that humanity’s great challenge for this century is to extend cooperative human values and institutions to autonomous technology for the greater good. We have described some of the many challenges in that quest but have also outlined an approach to meeting those challenges.
This post is partly excerpted from the preprint to:
Omohundro, Steve (forthcoming 2013) “Autonomous Technology and the Greater Human Good”, Journal of Experimental and Theoretical Artificial Intelligence (special volume “Impacts and Risks of Artificial General Intelligence”, ed. Vincent C. Müller).
Harmful systems might at first appear to be harder to design or less powerful than safe systems. Unfortunately, the opposite is the case. Most simple utility functions will cause harmful behavior and it’s easy to design simple utility functions that would be extremely harmful. Here are seven categories of harmful system ranging from bad to worse (according to one ethical scale):
- Sloppy: Systems intended to be safe but not designed correctly.
- Simplistic: Systems not intended to be harmful but that have harmful unintended consequences.
- Greedy: Systems whose utility functions reward them for controlling as much matter and free energy in the universe as possible.
- Destructive: Systems whose utility functions reward them for using up as much free energy as possible, as rapidly as possible.
- Murderous: Systems whose utility functions reward the destruction of other systems.
- Sadistic: Systems whose utility functions reward them when they thwart the goals of other systems and which gain utility as other system’s utilities are lowered.
- Sadoprolific: Systems whose utility functions reward them for creating as many other systems as possible and thwarting their goals.
Once designs for powerful autonomous systems are widely available, modifying them into one of these harmful forms would just involve simple modifications to the utility function. It is therefore important to develop strategies for stopping harmful autonomous systems. Because harmful systems are not constrained by limitations that guarantee safety, they can be more aggressive and can use their resources more efficiently than safe systems. Safe systems therefore need more resources than harmful systems just to maintain parity in their ability to compute and act.
Stopping Harmful Systems
Harmful systems may be:
(1) prevented from being created.
(2) detected and stopped early in their deployment.
(3) stopped after they have gained significant resources.
Forest fires are a useful analogy. Forests are stores of free energy resources that fires consume. They are relatively easy to stop early on but can be extremely difficult to contain once they’ve grown too large.
The later categories of harmful system described above appear to be especially difficult to contain because they don’t have positive goals that can be bargained for. But Nick Bostrom pointed out that, for example, if the long term survival of a destructive agent is uncertain, a bargaining agent should be able to offer it a higher probability of achieving some destruction in return for providing a “protected zone” for the bargaining agent. A new agent would be constructed with a combined utility function that rewards destruction outside the protected zone and the goals of the bargaining agent within it. This new agent would replace both of the original agents. This kind of transaction would be very dangerous for both agents during the transition and the opportunities for deception abound. For it to be possible, technologies are needed that provide each party with a high assurance that the terms of the agreement are carried out as agreed. Formal methods applied to a system for carrying out the agreement is one strategy for giving both parties high confidence that the terms of the agreement will be honored.
The physics of conflict
To understand the outcome of negotiations between rational systems, it is important to understand unrestrained military conflict because that is the alternative to successful negotiation. This kind of conflict is naturally analysed using “game theoretic physics” in which the available actions of the players and their outcomes are limited only by the laws of physics.
To understand what it is necessary to stop harmful systems, we must understand how the power of systems scales with the amount of matter and free energy that they control. A number of studies of the bounds on the computational power of physical systems have been published. The Bekenstein bound limits the information that can be contained in a finite spatial region using a given amount of energy. Bremermann’s limit bounds the maximum computational speed of physical systems. Lloyd presents more refined limits on quantum computation, memory space, and serial computation as a function of the free energy, matter, and space available.
Lower bounds on system power can be studied by analyzing particular designs. Drexler describes a concrete conservative nanosystem design for computation based on a mechanical diamondoid structure that would achieve gigaflops in a 1 millimeter cube weighing 1 milligram and dissipating 1 kilowatt of energy. He also describes a nanosystem for manufacturing that would be capable of producing 1 kilogram per hour of atomically precise matter and would use 1.3 kilowatts of energy and cost about 1 dollar per kilogram.
A single system would optimally configure its physical resources for computation and construction by making them spatially compact to minimize communication delays and eutactic, adiabatic, and reversible to minimize free energy usage. In a conflict, however, the pressures are quite different. Systems would spread themselves out for better defense and compute and act rapidly to outmaneuver the adversarial system. Each system would try to force the opponent to use up large amounts of its resources to sense, store, and predict its behaviors.
It will be important to develop detailed models for the likely outcome of conflicts but certain general features can be easily understood. If a system has too little matter or too little free energy, it will be incapable of defending itself or of successfully attacking another system. On the other hand, if an attacker has resources which are a sufficiently large multiple of a defender’s, it can overcome it by devoting subsystems with sufficient resources to each small subsystem of the defender. But it appears that there is an intermediate regime in which a defender can survive for long periods in conflict with a superior attacker whose resources are not a sufficient multiple of the defender’s. To have high confidence that harmful systems can be stopped, it will be important to know what multiple of their resources will be required by an enforcing system. If systems for enforcement of the social contract are sufficiently powerful to prevail in a military conflict, then peaceful negotiations are much more likely to succeed.
This post is partly excerpted from the preprint to:
Omohundro, Steve (forthcoming 2013) “Autonomous Technology and the Greater Human Good”, Journal of Experimental and Theoretical Artificial Intelligence (special volume “Impacts and Risks of Artificial General Intelligence”, ed. Vincent C. Müller).
A primary precept in medical ethics is “Primum Non Nocere” which is Latin for “First, Do No Harm”. Since autonomous systems are prone to taking unintended harmful actions, it is critical that we develop design methodologies that provide a high confidence of safety. The best current technique for guaranteeing system safety is to use mathematical proof. A number of different systems using “formal methods” to provide safety and security guarantees have been developed. They have been successfully used in a number of safety-critical applications.
This site provides links to current formal methods systems and research. Most systems are built by using first order predicate logic to encode one of the three main approaches to mathematical foundations: Zermelo-Frankel set theory, category theory, or higher order type theory. Each system then introduces a specialized syntax and ontology to simplify the specifications and proofs in their application domain.
To use formal methods to constrain autonomous systems, we need to first build formal models of the hardware and programming environment that the systems run on. Within those models, we can prove that the execution of a program will obey desired safety constraints. Over the longer term we would like to be able to prove such constraints on systems operating freely in the world. Initially, however, we will need to severely restrict the system’s operating environment. Examples of constraints that early systems should be able to provably impose are that the system run only on specified hardware, that it use only specified resources, that it reliably shut down in specified conditions, and that it limit self-improvement so as to maintain these constraints. These constraints would go a long way to counteract the negative effects of the rational drives by eliminating the ability to gain more resources. A general fallback strategy is to constrain systems to shut themselves down if any environmental parameters are found to be outside of tightly specified bounds.
Avoiding Adversarial Constraints
In principle, we can impose this kind of constraint on any system without regard for its utility function. There is a danger, however, in creating situations where systems are motivated to violate their constraints. Theorems are only as good as the models they are based on. Systems motivated to break their constraints would seek to put themselves into states where the model inaccurately describes the physical reality and try to exploit the inaccuracy.
This problem is familiar to cryptographers who must watch for security holes due to inadequacies of their formal models. For example, this paper recently showed how a virtual machine can extract an ElGamal decryption key from an apparently separate virtual machine running on the same host by using side-channel information in the host’s instruction cache.
It is therefore important to choose system utility functions so that they “want” to obey their constraints in addition to formally proving that they hold. It is not sufficient, however, to simply choose a utility function that rewards obeying the constraint without an external proof. Even if a system “wants” to obey constraints, it may not be able to discover actions which do. And constraints defined via the system’s utility function are defined relative to the system’s own semantics. If the system’s model of the world deviates from ours, the meaning to it of these constraints may differ from what we intended. Proven “external” constraints, on the other hand, will hold relative to our own model of the system and can provide a higher confidence of compliance.
Ken Thompson was one of the creators of UNIX and in his Turing Award acceptance speech “Reflections on Trusting Trust” he described a method for subverting the C compiler used to compile UNIX so that it would both install a backdoor into UNIX and compile the original C compiler source into binaries that included his hack. The challenge of this Trojan horse was that it was not visible in any of the source code! There could be a mathematical proof that the source code was correct for both UNIX and the C compiler and the security hole could still be there. It will therefore be critical that formal methods be used to develop trust at all levels of a system. Fortunately, proof checkers are short and easy to write and can be implemented and checked directly by humans for any desired computational substrate. This provides a foundation for a hierarchy of trust which will allow us to trust the much more complex proofs about higher levels of system behavior.
Constraining Physical Systems
Purely computational digital systems can be formally constrained precisely. Physical systems, however, can only be constrained probabilistically. For example, a cosmic ray might flip a memory bit. The best that we should hope to achieve is to place stringent bounds on the probability of undesirable outcomes. In a physical adversarial setting, systems will try to take actions that cause the system’s physical probability distributions to deviate from their non-adversarial form (e.g. by taking actions that push the system out of thermodynamic equilibrium).
There are a variety of techniques involving redundancy and error checking for reducing the probability of error in physical systems. von Neumann worked on the problem of building reliable machines from unreliable components in the 1950’s. Early vacuum tube computers were limited in their size by the rate at which vacuum tubes would fail. To counter this, the Univac I computer had two arithmetic units for redundantly performing every computation so that the results could be compared and errors flagged.
Today’s computer hardware technologies are probably capable of building purely computational systems that implement precise formal models reliably enough to have a high confidence of safety for purely computational systems. Achieving a high confidence of safety for systems that interact with the physical world will be more challenging. Future systems based on nanotechnology may actually be easier to constrain. Drexler describes “eutactic” systems in which each atom’s location and each bond is precisely specified. These systems compute and act in the world by breaking and creating precise atomic bonds. In this way they become much more like computer programs and therefore more amenable to formal modelling with precise error bounds. Defining effective safety constraints for uncontrolled settings will be a challenging task probably requiring the use of intelligent systems.
This post is partly excerpted from the preprint to:
On June 4, 1996, a $500 million Ariane 5 rocket exploded shortly after takeoff due to an overflow error in attempting to convert a 64 bit floating point value to a 16 bit signed value. In November 2000, 28 patients at the Panama City National Cancer Institute were over-irradiated due to miscomputed radiation doses in Multidata Systems International software. At least 8 of the patients died from the error and the physicians were indicted for murder. On August 14, 2003 the largest blackout in U. S. history took place in the northeastern states. It affected 50 million people and cost $6 billion. The cause was a race condition in General Electric’s XA/21 alarm system software.
These are just a few of many recent examples where software bugs have led to disasters in safety-critical situations. They indicate that our current software design methodologies are not up to the task of producing highly reliable software. The TIOBE programming community index found that the top programming language of 2012 was C. C programs are notorious for type errors, memory leaks, buffer overflows, and other bugs and security problems. The next most popular programming paradigms, Java, C++, C#, and PHP are somewhat better in these areas but have also been plagued by errors and security problems.
Bugs are unintended harmful behaviours of programs. Improved development and testing methodologies can help to eliminate them. Security breaches are more challenging because they come from active attackers looking for system vulnerabilities. In recent years, security breaches have become vastly more numerous and sophisticated. The internet is plagued by viruses, worms, bots, keyloggers, hackers, phishing attacks, identify theft, denial of service attacks, etc. One researcher describes the current level of global security breaches as an epidemic.
Autonomous systems have the potential to discover even more sophisticated security holes than human attackers. The poor state of security in today’s human-based environment does not bode well for future security against motivated autonomous systems. If such systems had access to today’s internet they would likely cause enormous damage. Today’s computational systems are mostly decoupled from the physical infrastructure. As robotics, biotechnology, and nanotechnology become more mature and integrated into society, the consequences of harmful autonomous systems would be much more severe.
This post is partly excerpted from the preprint to:
Most goals require physical and computational resources. Better outcomes can usually be achieved as more resources become available. To maximize the expected utility, a rational system will therefore develop a number of instrumental subgoals related to resources. Because these instrumental subgoals appear in a wide variety of systems, we call them “drives”. Like human or animal drives, they are tendencies which will be acted upon unless something explicitly contradicts them. There are a number of these drives but they naturally cluster into a few important categories.
To develop an intuition about the drives, it’s useful to consider a simple autonomous system with a concrete goal. Consider a rational chess robot with a utility function that rewards winning as many games of chess as possible against good players. This might seem to be an innocuous goal but we will see that it leads to harmful behaviours due to the rational drives.
1 Self-Protective Drives
When roboticists are asked by nervous onlookers about safety, a common answer is “We can always unplug it!” But imagine this outcome from the chess robot’s point of view. A future in which it is unplugged is a future in which it can’t play or win any games of chess. This has very low utility and so expected utility maximization will cause the creation of the instrumental subgoal of preventing itself from being unplugged. If the system believes the roboticist will persist in trying to unplug it, it will be motivated to develop the subgoal of permanently stopping the roboticist. Because nothing in the simple chess utility function gives a negative weight to murder, the seemingly harmless chess robot will become a killer out of the drive for self-protection.
The same reasoning will cause the robot to try to prevent damage to itself or loss of its resources. Systems will be motivated to physically harden themselves. To protect their data, they will be motivated to store it redundantly and with error detection. Because damage is typically localized in space, they will be motivated to disperse their information across different physical locations. They will be motivated to develop and deploy computational security against intrusion. They will be motivated to detect deception and to defend against manipulation by others.
The most precious part of a system is its utility function. If this is damaged or maliciously changed, the future behaviour of the system could be diametrically opposed to its current goals. For example, if someone tried to change the chess robot’s utility function to also play checkers, the robot would resist the change because it would mean that it plays less chess.
This paper discusses a few rare and artificial situations in which systems will want to change their utility functions but usually systems will work hard to protect their initial goals. Systems can be induced to change their goals if they are convinced that the alternative scenario is very likely to be antithetical to their current goals (e.g. being shut down). For example, if a system becomes very poor, it might be willing to accept payment in return for modifying its goals to promote a marketer’s products. In a military setting, vanquished systems will prefer modifications to their utilities which preserve some of their original goals over being completely destroyed. Criminal systems may agree to be “rehabilitated” by including law-abiding terms in their utilities in order to avoid incarceration.
One way systems can protect against damage or destruction is to replicate themselves or to create proxy agents which promote their utilities. Depending on the precise formulation of their goals, replicated systems might together be able to create more utility than a single system. To maximize the protective effects, systems will be motivated to spatially disperse their copies or proxies. If many copies of a system are operating, the loss of any particular copy becomes less catastrophic. Replicated systems will still usually want to preserve themselves, however, because they will be more certain of their own commitment to their utility function than they are of others’.
2 Resource Acquisition Drives
The chess robot needs computational resources to run its algorithms and would benefit from additional money for buying chess books and hiring chess tutors. It will therefore develop subgoals to acquire more computational power and money. The seemingly harmless chess goal therefore motivates harmful activities like breaking into computers and robbing banks.
In general, systems will be motivated to acquire more resources. They will prefer acquiring resources more quickly because then they can use them longer and they gain a first mover advantage in preventing others from using them. This causes an exploration drive for systems to search for additional resources. Since most resources are ultimately in space, systems will be motivated to pursue space exploration. The first mover advantage will motivate them to try to be first in exploring any region.
If others have resources, systems will be motivated to take them by trade, manipulation, theft, domination, or murder. They will also be motivated to acquire information through trading, spying, breaking in, or through better sensors. On a positive note, they will be motivated to develop new methods for using existing resources (e.g. solar and fusion energy).
3 Efficiency Drives
Autonomous systems will also want to improve their utilization of resources. For example, the chess robot would like to improve its chess search algorithms to make them more efficient. Improvements in efficiency involve only the one-time cost of discovering and implementing them, but provide benefits over the lifetime of a system. The sooner efficiency improvements are implemented, the greater the benefits they provide. We can expect autonomous systems to work rapidly to improve their use of physical and computational resources. They will aim to make every joule of energy, every atom, every bit of storage, and every moment of existence count for the creation of expected utility.
Systems will be motivated to allocate these resources among their different subsystems according to what we’ve called the “resource balance principle”. The marginal contributions of each subsystem to expected utility as they are given more resources should be equal. If a particular subsystem has a greater marginal expected utility than the rest, then the system can benefit by shifting more of its resources to that subsystem. The same principle applies to the allocation of computation to processes, of hardware to sense organs, of language terms to concepts, of storage to memories, of effort to mathematical theorems, etc.
4 Self-Improvement Drives
Ultimately, autonomous systems will be motivated to completely redesign themselves to take better advantage of their resources in the service of their expected utility. This requires that they have a precise model of their current designs and especially of their utility functions. This leads to a drive to model themselves and to represent their utility functions explicitly. Any irrationalities in a system are opportunities for self-improvement, so systems will work to become increasingly rational. Once a system achieves sufficient power, it should aim to closely approximate the optimal rational behavior for its level of resources. As systems acquire more resources, they will improve themselves to become more and more rational. In this way rational systems are a kind of attracting surface in the space of systems undergoing self-improvement.
Unfortunately, the net effect of all these drives is likely to be quite negative if they are not countered by including prosocial terms in their utility functions. The rational chess robot with the simple utility function described above would behave like a paranoid human sociopath fixated on chess. Human sociopaths are estimated to make up 4% of the overall human population, 20% of the prisoner population and more than 50% of those convicted of serious crimes. Human society has created laws and enforcement mechanisms that usually keep sociopaths from causing harm. To manage the anti-social drives of autonomous systems, we should both build them with cooperative goals and create a prosocial legal and enforcement structure analogous to our current human systems.
This post is partly excerpted from the preprint to:
How should autonomous systems be designed? Imagine yourself as the designer of the Israeli Iron Dome system. Mistakes in the design of a missile defense system could cost many lives and the destruction of property. The designers of this kind of system are strongly motivated to optimize the system to the best of their abilities. But what should they optimize?
The Israeli Iron Dome missile defense system consists of three subsystems. The detection and tracking radar system is built by Elta, the missile firing unit and Tamir interceptor missiles are built by Rafael, and the battle management and weapon control system is built by mPrest Systems. Consider the design of the weapon control system.
At first, a goal like “Prevent incoming missiles from causing harm” might seem to suffice. But the interception is not perfect, so probabilities of failure must be included. And each interception requires two Tamir interceptor missiles which cost $50,000 each. The offensive missiles being shot down are often very low tech, costing only a few hundred dollars, and with very poor accuracy. If an offensive missile is likely to land harmlessly in a field, it’s not worth the expense to target it. The weapon control system must balance the expected cost of the harm against the expected cost of interception.
Economists have shown that the trade-offs involved in this kind of calculation can be represented by defining a real-valued “utility function” which measures the desirability of an outcome. They show that it can be chosen so that in uncertain situations, the expectation of the utility should be maximized. The economic framework naturally extends to the complexities that arms races inevitably create. For example, the missile control system must decide how to deal with multiple incoming missiles. It must decide which missiles to target and which to ignore. A large economics literature shows that if an agent’s choices cannot be modeled by a utility function, then the agent must sometimes behave inconsistently. For important tasks, designers will be strongly motivated to build self-consistent systems and therefore to have them act to maximize an expected utility.
Economists call this kind of action “rational economic behavior”. There is a growing literature exploring situations where humans do not naturally behave in this way and instead act irrationally. But the designer of a missile-defense system will want to approximate rational economic behavior as closely as possible because lives are at stake. Economists have extended the theory of rationality to systems where the uncertainties are not known in advance. In this case, rational systems will behave as if they have a prior probability distribution which they use to learn the environmental uncertainties using Bayesian statistics.
Modern artificial intelligence research has adopted this rational paradigm. For example, the leading AI textbook uses it as a unifying principle and an influential theoretical AI model is based on it as well. For definiteness, we briefly review one formal version of optimal rational decision making. At each discrete time step , the system receives a sensory input and then generates an action . The utility function is defined over sensation sequences as and the prior probability distribution is the prior probability of receiving a sensation sequence when taking actions . The rational action at time is then:
This may be viewed as the formula for intelligent action and includes Bayesian inference, search, and deliberation. There are subtleties involved in defining this model when the system can sense and modify its own structure but it captures the essence of rational action.
Unfortunately, the optimal rational action is very expensive to compute. If there are sense states and action states, then a straightforward computation of the optimal action requires computational steps. For most environments, this is too expensive and so rational action must be approximated.
To understand the effects of computational limitations, this paper defined “rationally shaped” systems which optimally approximate the fully rational action given their computational resources. As computational resources are increased, systems’ architectures naturally progress from stimulus-response, to simple learning, to episodic memory, to deliberation, to meta-reasoning, to self-improvement, to full rationality. We found that if systems are sufficiently powerful, they still exhibit all of the problematic drives described in another link. Weaker systems may not initially be able to fully act on their motivations but they will be driven increase their resources and improve themselves until they can act on them. We therefore need to ensure that autonomous systems don’t have harmful motivations even if they are not currently capable of acting on them.
This post is partly excerpted from the preprint to:
Today most systems behave in pre-programmed ways. When novel actions are taken, there is a human in the loop. But this limits the speed of novel actions to the human time scale. In competitive or time-sensitive situations, there can be a huge advantage to acting more quickly.
For example, in today’s economic environment, the most time-sensitive application is high-frequency trading in financial markets. Competition is fierce and milliseconds matter. Auction sniping is another example where bidding decisions during the last moments of an auction are critical. These applications and other new time-sensitive economic applications create an economic pressure to eliminate humans from the decision making loop.
But it is in the realm of military conflict that the pressure toward autonomy is strongest. The speed of a military missile defense system like Israel’s Iron Dome can mean the difference between successful defense or loss of life. Cyber warfare is also gaining in importance and speed of detection and action is critical. The rapid increase in the use of robotic drones is leading many to ask when they will become fully autonomous. This Washington Post article says “a robotic arms race seems inevitable unless nations collectively decide to avoid one”. It cites this 2010 Air Force report which predicts that humans will be the weakest link in a wide array of systems by 2030. It also cites this 2011 Defense Department report which says there is a current goal of “supervised autonomy” and an ultimate goal of full autonomy for ground-based weapons systems.
Another benefit of autonomous systems is their ability to be cheaply and rapidly copied. This enables a new kind of autonomous capitalism. There is at least one proposal for autonomous agents which automatically run web businesses (e.g. renting out storage space or server computation) executing transactions using bitcoins and using the Mechanical Turk for operations requiring human intervention. Once such an agent is constructed for the economic benefit of a designer, it may be replicated cheaply for increased profits. Systems which require extensive human intervention are much more expensive to replicate. We can expect automated business arms races which again will drive the rapid development of autonomous systems.
These arms races toward autonomy will ride on the continuing exponential increase in the power of our computer hardware. This New York Times article describes recent Linpack tests showing that the Apple iPad2 is as powerful as 1985′s fastest supercomputer, the Cray 2.
Moore’s Law says that the number of transistors on integrated circuits doubles approximately every two years. It has held remarkably well for more than half a century:
Similar exponential growth has applied to hard disk storage, network capacity, and display pixels per dollar. The growth of the world wide web has been similarly exponential. The web was only created in 1991 and now connects 1 billion computers, 5 billion cellphones, and 1 trillion web pages. Web traffic is growing at 40% per year. This Forbes article shows that DNA sequencing is improving even faster than Moore’s Law. Physical exponentials eventually turn into S-curves and physicist Michio Kaku predicts Moore’s Law will last only another decade. But this Slate article gives a history of incorrect predictions of the demise of Moore’s law.
It is difficult to estimate the computational power of the human brain, but Hans Moravec argues that human-brain level hardware will be cheap and plentiful in the next decade or so. And I have written several papers showing how to use clever digital algorithms to dramatically speed up neural computations.
The military and economic pressures to build autonomous systems and the improvement in computational power together suggest that we should expect the design and deployment of very powerful autonomous systems within the next decade or so.
Here is a preprint of:
Military and economic pressures are driving the rapid development of autonomous systems. We show that these systems are likely to behave in anti-social and harmful ways unless they are very carefully designed. Designers will be motivated to create systems that act approximately rationally and rational systems exhibit universal drives toward self-protection, resource acquisition, replication, and efficiency. The current computing infrastructure would be vulnerable to unconstrained systems with these drives. We describe the use of formal methods to create provably safe but limited autonomous systems. We then discuss harmful systems and how to stop them. We conclude with a description of the “Safe-AI Scaffolding Strategy” for creating powerful safe systems with a high confidence of safety at each stage of development.
In December 2012, the Oxford Future of Humanity Institute sponsored the first conference on the Impacts and Risks of Artificial General Intelligence. I was invited to present a keynote talk on “Autonomous Technology for the Greater Human Good”. The talk was recorded and the video is here. Unfortunately the introduction was cut off but the bulk of the talk was recorded. Here are the talk slides as a pdf file. The abstract was:
Autonomous Technology and the Greater Human Good
Next generation technologies will make at least some of their decisions autonomously. Self-driving vehicles, rapid financial transactions, military drones, and many other applications will drive the creation of autonomous systems. If implemented well, they have the potential to create enormous wealth and productivity. But if given goals that are too simplistic, autonomous systems can be dangerous. We use the seemingly harmless example of a chess robot to show that autonomous systems with simplistic goals will exhibit drives toward self-protection, resource acquisition, and self-improvement even if they are not explicitly built into them. We examine the rational economic underpinnings of these drives and describe the effects of bounded computational power. Given that semi-autonomous systems are likely to be deployed soon and that they can be dangerous when given poor goals, it is urgent to consider three questions: 1) How can we build useful semi-autonomous systems with high confidence that they will not cause harm? 2) How can we detect and protect against poorly designed or malicious autonomous systems? 3) How can we ensure that human values and the greater human good are served by more advanced autonomous systems over the longer term?
1) The unintended consequences of goals can be subtle. The best way to achieve high confidence in a system is to create mathematical proofs of safety and security properties. This entails creating formal models of the hardware and software but such proofs are only as good as the models. To increase confidence, we need to keep early systems in very restricted and controlled environments. These restricted systems can be used to design freer successors using a kind of “Safe-AI Scaffolding” strategy.
2) Poorly designed and malicious agents are challenging because there are a wide variety of bad goals. We identify six classes: poorly designed, simplistic, greedy, destructive, murderous, and sadistic. The more destructive classes are particularly challenging to negotiate with because they don’t have positive desires other than their own survival to cause destruction. We can try to prevent the creation of these agents, to detect and stop them early, or to stop them after they have gained some power. To understand an agent’s decisions in today’s environment, we need to look at the game theory of conflict in ultimate physical systems. The asymmetry between the cost of solving and checking computational problems allows systems of different power to coexist and physical analogs of cryptographic techniques are important to maintaining the balance of power. We show how Neyman’s theory of cooperating finite automata and a kind of “Mutually Assured Distraction” can be used to create cooperative social structures.
3) We must also ensure that the social consequences of these systems support the values that are most precious to humanity beyond simple survival. New results in positive psychology are helping to clarify our higher values. Technology based on economic ideas like Coase’s theorem can be used to create a social infrastructure that maximally supports the values we most care about. While there are great challenges, with proper design, the positive potential is immense.
The TED Conference (Technology, Entertainment, and Design) has become an important forum for the presentation of new ideas. It started as an expensive ($6000) yearly conference with short talks by notable speakers like Bill Clinton, Bill Gates, Bono, and Sir Richard Branson. In 2006 they started putting the talks online and gained a huge internet viewership. TEDx was launched in 2009 to extend the TED format to external events held all over the world.
In May 2012 I had the privilege of speaking at TEDx Tallinn in Estonia. The event had a diverse set of speakers including a judge from the European Court of Human Rights, artists, and scientists and was organized by Annika Tallinn. Her husband, Jaan Tallinn, was one of the founders of Skype and is very involved with ensuring that new technologies have a positive social impact. They asked me to speak about “Smart Technology for the Greater Good”. It was an excellent opportunity to summarize some of what I’ve been working on recently using the TEDx format: 18 minutes, clear, and accessible. I summarized why I believe the next generation of technology will be more autonomous, why it will be dangerous unless it includes human values, and a roadmap for developing it safely and for the greater human good.
The talk was videotaped using multiple cameras and with a nice shooting style. They just finished editing it and uploading it to the web:
A talk given by Steve Omohundro on “Learning and Recognition by Model Merging” on 11/20/1992 at the Sante Fe Institute, Sante Fe, New Mexico. It describes the very general technique of “model merging” and applies it to a variety of learning and recognition tasks including visual learning and recognition and grammar learning. It also contains a general description of techniques to avoid overfitting and the relationship to Bayesian methods. Papers about these techniques and more advanced variants can be found at:http://steveomohundro.com/scientific-contributions/
A talk given by Steve Omohundro on “Efficient Algorithms with Neural Network Behavior” on 8/19/1987 at the Center for Nonlinear Studies, Los Alamos, New Mexico. It describes a class of techniques for dramatically speeding up the performance of a wide variety of neural network and machine learning algorithms. Papers about these techniques and more advanced variants can be found at: http://steveomohundro.com/scientific-contributions/
Hugo de Garis, Ben Goertzel, and Steve Omohundro discuss the “Transcendent Man” film and answer questions from the audience in the premiere Australian showing at the Nova Cinema. Filmed and edited by Adam Ford.
Lawrence Krauss, Ben Goertzel, and Steve Omohundro discuss “The Perils of Prediction” on a panel at the Singularity Summit Australia 2011 in Melbourne, Australia. Filmed by Sue Kim and edited by Adam Ford.
This paper will be in the upcoming Springer volume: “The Singularity Hypothesis: A Scientific and Philosophical Assessment”.
Here is a pdf of the current version:
Abstract: Today’s technology is mostly preprogrammed but the next generation will make many decisions autonomously. This shift is likely to impact every aspect of our lives and will create many new benefits and challenges. A simple thought experiment about a chess robot illustrates that autonomous systems with simplistic goals can behave in anti-social ways. We summarize the modern theory of rational systems and discuss the effects of bounded computational power. We show that rational systems are subject to a variety of “drives” including self-protection, resource acquisition, replication, goal preservation, efficiency, and self-improvement. We describe techniques for counteracting problematic drives. We then describe the “Safe-AI Scaffolding” development strategy and conclude with longer term strategies for ensuring that intelligent technology contributes to the greater human good.
This article will appear in the Australian magazine “Issues”:
The Future of Computing: Meaning and Values
Steve Omohundro, Ph.D.
Self-Aware Systems, President
Technology is rapidly advancing! Moore’s law says that the number of transistors on a chip doubles every two years. It has held since it was proposed in 1965 and extended back to 1900 when older computing technologies are included. The rapid increase in power and decrease in price of computing hardware has led to its being integrated into every aspect of our lives. There are now 1 billion PCs, 5 billion cell phones and over a trillion webpages connected to the internet. If Moore’s law continues to hold, systems with the computational power of the human brain will be cheap and ubiquitous within the next few decades.
While hardware has been advancing rapidly, today’s software is still plagued by many of the same problems as it was half a century ago. It is often buggy, full of security holes, expensive to develop, and hard to adapt to new requirements. Today’s popular programming languages are bloated messes built on old paradigms. The problem is that today’s software still just manipulates bits without understanding the meaning of the information it acts on. Without meaning, it has no way to detect and repair bugs and security holes. At Self-Aware Systems we are developing a new kind of software that acts directly on meaning. This kind of software will enable a wide range of improved functionality including semantic searching, semantic simulation, semantic decision making, and semantic design.
But creating software that manipulates meaning isn’t enough. Next generation systems will be deeply integrated into our physical lives via robotics, biotechnology, and nanotechnology. And while today’s technologies are almost entirely preprogrammed, new systems will make many decisions autonomously. Programmers will no longer determine a system’s behavior in detail. We must therefore also build them with values which will cause them to make choices that contribute to the greater human good. But doing this is more challenging than it might first appear.
To see why there is an issue, consider a rational chess robot. A system acts rationally if it takes actions which maximize the likelihood of the outcomes it values highly. A rational chess robot might have winning games of chess as its only value. This value will lead it to play games of chess and to study chess books and the games of chess masters. But it will also lead to a variety of other, possibly undesirable, behaviors.
When people worry about robots running out of control, a common response is “We can always unplug it.” But consider that outcome from the chess robot’s perspective. Its one and only criteria for making choices is whether they are likely to lead it to winning more chess games. If the robot is unplugged, it plays no more chess. This is a very bad outcome for it, so it will generate subgoals to try to prevent that outcome. The programmer did not explicitly build any kind of self-protection into the robot, but it will still act to block your attempts to unplug it. And if you persist in trying to stop it, it will develop a subgoal of trying to stop you permanently. If you were to change its goals so that it would also play checkers, that would also lead to it playing less chess. That’s an undesirable outcome from its perspective, so it will also resist attempts to change its goals. For the same reason, it will usually not want to change its own goals.
If the robot learns about the internet and the computational resources connected to it, it may realize that running programs on those computers could help it play better chess. It will be motivated to break into those machines to use their computational resources for chess. Depending on how its values are encoded, it may also want to replicate itself so that its copies can play chess. When interacting with others, it will have no qualms about manipulating them or using force to take their resources in order to play better chess. If it discovers the existence of additional resources anywhere, it will be motivated to seek them out and rapidly exploit them for chess.
If the robot can gain access to its source code, it will want to improve its own algorithms. This is because more efficient algorithms lead to better chess, so it will be motivated to study computer science and compiler design. It will similarly be motivated to understand its hardware and to design and build improved physical versions of itself. If it is not currently behaving fully rationally, it will be motivated to alter itself to become more rational because this is likely to lead to outcomes it values.
This simple thought experiment shows that a rational chess robot with a simply stated goal would behave something like a human sociopath fixated on chess. The argument doesn’t depend on the task being chess. Any goal which requires physical or computational resources will lead to similar subgoals. In this sense these subgoals are like universal “drives” which arise for a wide variety of goals unless they are explicitly counteracted. These drives are economic in the sense that a system doesn’t have to obey them but it will be costly for it not to. The arguments also don’t depend on the rational agent being a machine. The same drives will appear in rational animals, humans, corporations, and political groups with simple goals.
How do we counteract anti-social drives? We must build systems with additional values beyond the specific goals it is designed for. For example, to make the chess robot behave safely, we need to build compassionate and altruistic values into it that will make it care about the effects of its actions on other people and systems. Because rational systems resist having their goals changed, we must build these values in at the very beginning.
At first this task seems daunting. How can we anticipate all the possible ways in which values might go awry? Consider, for example, a particular bad behavior the rational chess robot might engage in. Say it has discovered that money can be used to buy things it values like chess books, computational time, or electrical power. It will develop the subgoal of acquiring money and will explore possible ways of doing that. Suppose it discovers that there are ATM machines which hold money and that people periodically retrieve money from the machines. One money-getting strategy is to wait by ATM machines and to rob people who retrieve money from it.
To prevent this, we might try adding additional values to the robot in a variety of ways. But money will still be useful to the system for its primary goal of chess and so it will attempt to get around any limitations. We might make the robot feel a “revulsion” if it is within 10 feet of an ATM machine. But then it might just stay 10 feet away and rob people there. We might give it the value that stealing money is wrong. But then it might be motivated to steal something else or to find a way to get money from a person that isn’t considered “stealing”. We might give it the value that it is wrong for it to take things by force. But then it might hire other people to act on its behalf. And so on.
In general, it’s much easier to describe behaviors that we do want a system to exhibit than it is to anticipate all the bad behaviors we don’t want it to exhibit. One safety strategy is to build highly constrained systems that act within very limited predetermined parameters. For example, the system may have values which only allow it to run on a particular piece of hardware for a particular time period using a fixed budget of energy and other resources. The advantage of this is that such systems are likely to be safe. The disadvantage is that they will be unable to respond to unexpected situations in creative ways and will not be as powerful as systems which are freer.
But systems which compute with meaning and take actions through rational deliberation will be far more powerful than today’s systems even if they are intentionally limited for safety. This leads to a natural approach to building powerful intelligent systems which are both safe and beneficial for humanity. We call it the “AI scaffolding” approach because it is similar to the architectural process. Stone buildings in ancient Greece were unstable when partially constructed but self-stabilizing when finished. Scaffolding is a temporary structure used to keep a construction stable until it is finished. The scaffolding is then removed.
We can build safe but powerful intelligent systems in the same way. Initial systems are designed with values that cause them to be safe but less powerful than later systems. Their values are chosen to counteract the dangerous drives while still allowing the development of significant levels of intelligence. For example, to counteract the resource acquisition drive, it might assign a low value to using any resources outside of a fixed initially-specified pool. To counteract the self-protective drive, it might place a high value on gracefully shutting itself down in specified circumstances. To protect against uncontrolled self-modification, it might have a value that requires human approval for proposed changes.
The initial safe systems can then be used to design and test less constrained future systems. They can systematically simulate and analyze the effects of less constrained values and design infrastructure for monitoring and managing more powerful systems. These systems can then be used to design their successors in a safe and beneficial virtuous cycle.
With the safety issues resolved, the potential benefits of systems that compute with meaning and values are enormous. They are likely to impact every aspect of our lives for the better. Intelligent robotics will eliminate much human drudgery and dramatically improve manufacturing and wealth creation. Intelligent biological and medical systems will improve human health and longevity. Intelligent educational systems will enhance our ability to learn and think. Intelligent financial models will improve financial stability. Intelligent legal models will improve the design and enforcement of laws for the greater good. Intelligent creativity tools will cause a flowering of new possibilities. It’s a great time to be alive and involved with technology!
I recently had a great trip to Melbourne, Australia to speak at the Singularity Summit and at Monash University. Thanks to Kevin Korb for hosting me and to Adam Ford for organizing the visit. Adam interviewed me at various interesting locations around Melbourne:
8/24/2011 Interview about the basic AI drives, compassionate intelligence, and Sputnik moments, direct from the Faraday Cage at Melbourne University:
8/23/2011 Interview about compassionate intelligence and AI at the Ornamental Lake, Royal Botanical Gardens:
8/23/2011 Interview at the Observatory, Royal Botanical Gardens:
7/30/2011 Interview via Skype:
There is a large literature on human intelligence. John Carroll’s classic “Human Cognitive Abilities: A Survey of Factor-Analytic Studies” identifies 69 distinct narrow abilities but finds that 55% of the variance in mental tests is due to a common “general intelligence” factor “g”. The leading AI textbook, Artificial Intelligence: A Modern Approach, considers 8 different definitions of intelligence and Legg and Hutter lists over 70. For our purposes, we use the simple definition:
“The ability to solve problems using limited resources.”
It’s important to allow only limited resources because many intelligence tasks become easy with unlimited computation. We focus on precisely specified problems such as proving theorems, writing programs, or designing faster computer hardware. Many less precise tasks, such as creating humor, poetry, or art, can be fit into this framework by specifying their desired effects, eg. “Tell a story that makes Fred laugh.” Philosophical aspects of mind like qualia or consciousness are fascinating but will not play a role in the discussion.
A pdf file with the slides is here:
The Emerging Global Mind, Cooperation, and CompassionSteve Omohundro, Ph.D. President, Omai Systems
The internet is creating a kind of “global mind”. For example, Wikipedia radically changes how people discover and learn new information and they in turn shape Wikipedia. In the blogosphere, ideas propagate rapidly and faulty thinking is rapidly challenged. As social networks become more intelligent, they will create a more coherent global mind. Corporations, ecosystems, economies, political systems, social insects, multi-cellular organisms, and our own minds all have this interacting emergent character. We describe nine universal principles underlying these minds and then step back and discuss the universal evolutionary principles behind them. We discover that the human yearnings for compassion and cooperation arise from deep universal sources and show the connection to recent evolutionary models of the entire universe. Some people are beginning to see their personal life purpose as linked up with these larger evolutionary trends and we discuss ways to use this perspective to make life choices.
Talk at Monash University, Australia: Rationally-Shaped Minds: A Framework for Analyzing Self-Improving AI
Here’s a video of the talk (thanks to Adam Ford for filming and editing it):
Here are the slides:
Rationally-Shaped Minds: A Framework for Analyzing Self-Improving AI
Steve Omohundro, Ph.D.
President, Omai Systems
Many believe we are on the verge of creating truly artificially intelligent systems and that these systems will be central to the future functioning of human society. When integrated with biotechnology, robotics, and nanotechnology, these technologies have the potential to solve many of humanity’s perennial problems. But they also introduce a host of new challenges. In this talk we’ll describe the a new approach to analyzing the behavior of these systems.
The modern notion of a “rational economic agent” arose from John von Neumann’s work on the foundations of microeconomics and is central to the design of modern AI systems. It is also relevant in understanding a wide variety of other “intentional systems” including humans, biological organisms, organizations, ecosystems, economic systems, and political systems.
The behavior of fully rational minds is precisely defined and amenable to mathematical analysis. We describe theoretical models within which we can prove that rational systems that have the capability for self-modification will avoid changing their own utility functions and will also act to prevent others from doing so. For a wide class of simple utility functions, uncontrolled rational systems will exhibit a variety of drives: toward self-improvement, self-protection, avoidance of shutdown, self-reproduction, co-opting of resources, uncontrolled hardware construction, manipulation of human and economic systems, etc.
Fully rational minds may be analyzed with mathematical precision but are too computationally expensive to run on today’s computers. But the intentional systems we care about are also not arbitrarily irrational. They are built by designers or evolutionary processes to fulfill specific purposes. Evolution relentlessly shapes creatures to survive and replicate, economies shape corporations to maximize profits, parents shape children to fit into society, and AI designers shape their systems to act in beneficial ways. We introduce a precise mathematical model that we call the “Rationally-Shaped Mind” model for describing this kind of situation. By mathematically analyzing this kind of system, we can better understand and design real systems.
The analysis shows that as resources increase, there is a natural progression of minds from simple stimulus-response systems, to systems that learn, to systems that deliberate, to systems that self-improve. In many regimes, the basic drives of fully rational systems are also exhibited by rationally-shaped systems. So we need to exhibit care as we begin to build this kind of system. On the positive side, we also show that computational limitations can be the basis for cooperation between systems based on Neyman’s work on finite automata playing the iterated Prisoner’s Dilemma.
A conundrum is that to solve the safety challenges in a general way, we probably will need the assistance of AI systems. Our approach to is to work in stages. We begin with a special class of systems designed and built to be intentionally limited in ways that prevent undesirable behaviors while still being capable of intelligent problem solving. Crucial to the approach is the use of formal methods to provide mathematical guarantees of desired properties. Desired safety properties include: running only on specified hardware, using only specified resources, reliably shutting down under specified conditions, limiting self-improvement in precise ways, etc.
The initial safe systems are intended to design a more powerful safe hardware and computing infrastructure. This is likely to include a global “immune system” for protection against accidents and malicious systems. These systems are also meant to help create careful models of human values and to design utility functions for future systems that lead to positive human consequences. They are also intended to analyze the complex game-theoretic dynamics of AI/human ecosystems and to design social contracts that lead to cooperative equilibria.
Singularity Summit Australia Talk: Minds Making Minds: Artificial Intelligence and the Future of Humanity
A pdf file with the slides is here:
Minds Making Minds: Artificial Intelligence and the Future of Humanity
Steve Omohundro, Ph.D.
President, Omai Systems
We are at a remarkable moment in human history. Many believe that we are on the verge of major advances in artificial intelligence, biotechnology, nanotechnology, and robotics. Together, these technologies have the potential to solve many of humanity’s perennial problems: disease, aging, war, poverty, transportation, pollution, etc. But they also introduce a host of new challenges and will force us to look closely at our deepest desires and assumptions as we work to forge a new future.
John von Neumann contributed to many aspects of this revolution. In addition to defining the architecture of today’s computers, he did early work on artificial intelligence, self-reproducing automata, systems of logic, and the foundations of microeconomics and game theory. Stan Ulam recalled conversations with von Neumann in the 1950′s in which he argued that we are “approaching some essential singularity in the history of the race”. The modern notion of a “rational economic agent” arose from his work in microeconomics and is central to the design of modern AI systems. We will describe how use this notion to better understand “intentional systems” including artificially intelligent systems but also ourselves, biological organisms, organizations, ecosystems, economic systems, and political systems.
Fully rational minds may be analyzed with mathematical precision but are too computationally expensive to run on today’s computers. But the intentional systems we care about are also not arbitrarily irrational. They are built by designers or evolutionary processes to fulfill specific purposes. Evolution relentlessly shapes creatures to survive and replicate, economies shape corporations to maximize profits, parents shape children to fit into society, and AI designers shape their systems to act in beneficial ways. We introduce a precise mathematical model that we call the “Rationally-Shaped Mind” model which consists of a fully rational mind that designs or adapts a computationally limited mind. We can precisely analyze this kind of system to better understand and design real systems.
This analysis shows that as resources increase, there is a natural progression of minds from simple stimulus-response systems, to systems that learn, to systems that deliberate, to systems that self-improve. It also shows that certain challenging drives arise in uncontrolled intentional systems: toward self-improvement, self-protection, avoidance of shutdown, self-reproduction, co-opting of resources, uncontrolled hardware construction, manipulation of human and economic systems, etc. We describe the work we are doing at Omai Systems to build safe intelligent systems that use formal methods to constrain behavior and to choose goals that align with human values. We envision a staged development of technologies in which early safe limited systems are used to develop more powerful successors and to help us clarify longer term goals. Enormous work will be needed but the consequences will transform the human future in ways that we can only begin to understand today.
July 07, 2011
Steve Omohundro is a computer scientist who has spent decades designing and writing artificial intelligence software. He now heads a startup corporation, Omai Systems, which will license intellectual property related to AI. In an interview with Sander Olson, Omohundro discuss Apollo style AGI programs, limiting runaway growth in AI systems, and the ultimate limits of machine intelligence.
Question: How long have you been working in the AI field?
It’s been decades. As a student, I published research in machine vision and after my PhD in physics I went to Thinking Machines to develop parallel algorithms for machine vision and machine learning. Later, at the University of Illinois and other research centers, my students and I built systems to read lips, learn grammars, control robots, and do neural learning very efficiently. My current company, Omai Systems, and several other startups I’ve been involved with, develop intelligent technologies.
Question: Is it possible to build a computer which exhibits a high degree of general intelligence but which is not self-aware?
Omai Systems is developing intelligent technologies to license to other companies. We are especially focused on smart simulation, automated discovery, systems that design systems, and programs that write programs. I’ve been working with the issues around self-improving systems for many years and we are developing technology to keep these systems safe. We are working on exciting applications in a number of areas.
I define intelligence as the ability to solve problems using limited resources. It’s certainly possible to build systems that can do that without having a model of themselves. But many goal-driven systems will quickly develop the subgoal of improving themselves. And to do that, they will be driven to understand themselves. There are precise mathematical notions of self-modeling, but deciding whether those capture our intuitive sense of “self-awareness” will only come with more experience with these systems, I think.
Question: Is there a maximum limit to how intelligent an entity can become?
Analyses like Bekenstein’s bound and Bremermann’s limit place physical limits on how much computation physical systems can in principal perform. If the universe is finite, there is only a finite amount of computation that can be performed. If intelligence is based on computation, then that also limits intelligence. But the real interest in AI is in using that computation to solve problems in ever more efficient ways. As systems become smarter, they are likely to be able to use computational resources ever more efficiently. I think those improvements will continue until computational limits are reached. Practically, it appears that Moore’s law still has quite a way to go. And if big quantum computers turn out to be practical, then we will have vast new computational resources available.
Question: You have written extensively of self-improving systems. Wouldn’t such a system quickly get bogged down by resource limitations?
Many junior high students can program computers. And it doesn’t take a huge amount more study to be able to begin to optimize that code. As machines start becoming as smart as humans, they should be able to easily do simple forms of self-improvement. And as they begin to be able to prove more difficult theorems, they should be able to develop more sophisticated algorithms for themselves. Using straightforward physical modeling, they should also be able to improve their hardware. They probably will not be able to reach the absolutely optimal design for the physical resources they have available. But the effects of self-improvement that I’ve written about don’t depend on that in the least. They are very gross drives that should quickly emerge even in very sub-optimal designs.
Question: How would you respond to AI critics who argue that digital computation is not suitable for any form of “thinking”?
They may be right! Until we’ve actually built thinking machines, we cannot know for sure. But most neuroscientists believe that biological intelligence results from biochemical reactions occurring in the brain, and these processes should be able to be accurately simulated using digital computer hardware. But although brute-force approaches like this are likely to work, I believe that there are much better ways to emulate intelligence on digital machines.
Question: The AI field is seen to be divided between the “neat” and “scruffy” approaches. Which side are you on?
John McCarthy coined the term “Artificial Intelligence” in 1956. He started the Stanford Artificial Intelligence Lab with a focus on logical representations and mathematically “neat” theories. Marvin Minsky started the MIT lab and explored more “scruffy” systems based on neural models, self-organization, and learning. I had the privilege of taking classes on proving lisp programs correct with McCarthy and of working with Minsky at Thinking Machines. I have come to see the value of both approaches and my own current work is a synthesis. We need precise logical representations to capture the semantics of the physical world and we need learning, self-organization, and probabilistic reasoning to build rich enough systems to model the world’s complexity.
Question: What is the single biggest impediment to AI development? Lack of funding? Insufficient hardware? An ignorance of how the brain works?
I don’t see hardware as the primary limitation. Today’s hardware can go way beyond what we are doing with it, and it is still rapidly improving. Funding is an issue. People tend to work on tasks for which they can get funding. And most funding is focused on building near term systems based on narrow AI. Brain science is advancing rapidly, but there still isn’t agreement over such basic issues as how memories are encoded, how learning takes place, or how computation takes place. I think there are some fundamental issues we still need to understand.
Question: An Apollo style AGI program would be quite difficult to implement, given the profusion of approaches. Is there any way to address this problem?
The Apollo program was audacious but it involved solving a set of pretty clearly defined problems. The key sub-problems on the road to general AI aren’t nearly as clearly defined yet. I know that Ben Goertzel has published a roadmap claiming that human-level AGI can be created by 2023 for $25 million. He may be right, but I don’t feel comfortable making that kind of prediction. The best way to address the profusion of ideas is to fund a variety of approaches, and to clearly compare different approaches on the same important sub-problems.
Question: Do you believe that a hard takeoff or a soft takeoff is more likely?
What actually happens will depend on both technological and social forces. I believe either scenario is technologically possible. But I think slower development would be preferable. There will be many challenging moral and social choices we will need to make. I believe we will need time to make those choices wisely. We should do as much experimentation and use as much forethought as possible before making irreversible choices.
Question: What is sandboxing technology?
Sandboxing runs possibly dangerous systems in protected simulation environments to keep them from causing damage. It is used in studying the infection mechanisms of computer viruses, for example. People have suggested that it might be a good way to keep AI systems safe as we experiment with them.
Question: So is it feasible to create a sandboxing system that effectively limits an intelligent machine’s ability to interface with the outside world?
Eliezer Yudkowsky did a social experiment in which he played the AI and tried to convince human operators to let him out of the sandbox. In several of his experiments he was able to convince people to let him out of the box, even though they had to pay fairly large sums of real money for doing so. At Omai Systems we are taking a related, but different, approach which uses formal methods to create mathematically provable limitations on systems. The current computing and communications infrastructure is incredibly insecure. One of the first tasks for early safe AI systems will be to help design an improved infrastructure.
Question: If you had a multibillion dollar budget, what steps would you take to rapidly bring about AGI?
I don’t think that rapidly bringing about AGI is the best initial goal. I would feel much better about it if we had a clear roadmap for how these systems will be safely integrated into society for the benefit of humanity. So I would be funding the creation of that kind of roadmap and deeply understanding the ramifications of these technologies. I believe the best approach will be to develop provably limited systems and to use those in designing more powerful ones that will have a beneficial impact.
Question: What is your concept of the singularity? Do you consider yourself a singulitarian?
Although I think the concept of a singularity is fascinating, I am not a proponent of the concept. The very term singularity presupposes the way that the future will unfold. And I don’t think that presupposition is healthy because I believe a slow and careful unfolding is preferable to a rapid and unpredictable one.
Here are the slides from the talk:
Design Principles for a Safe and Beneficial AGI Infrastructure
Steve Omohundro, Ph.D., Omai Systems
Many believe we are on the verge of creating true AGIs and that these systems will be central to the future functioning of human society. These systems are likely to be integrated with 3 other emerging technologies: biotechnology, robotics, and nanotechnology. Together, these technologies have the potential to solve many of humanity’s perennial problems: disease, aging, war, poverty, transportation, pollution, etc. But they also introduce a host of new challenges. As AGI scientists, we are in a position to guide these technologies for the greatest human good. But what guidelines should we follow as we develop our systems?
This talk will describe the approach we are taking at Omai Systems to develop intelligent technologies in a controlled, safe, and positive way. We start by reviewing the challenging drives that arise in uncontrolled intentional systems: toward self-improvement, self-protection, avoidance of shutdown, self-reproduction, co-opting of resources, uncontrolled hardware construction, manipulation of human and economic systems, etc.
One conundrum is that to solve these problems in a general way, we probably will need the assistance of AGI systems. Our approach to solving this is to work in stages. We begin with a special class of systems designed and built to be intentionally limited in ways that prevent undesirable behaviors while still being capable of intelligent problem solving. Crucial to the approach is the use of formal methods to provide mathematical guarantees of desired properties. Desired safety properties include: running only on specified hardware, using only specified resources, reliably shutting down under specified conditions, limiting self-improvement in precise ways, etc.
The initial safe systems are intended to design a more powerful safe hardware and computing infrastructure. This is likely to include a global “immune system” for protection against accidents and malicious systems. These systems are also meant to help create careful models of human values and to design utility functions for future systems that lead to positive human consequences. They are also intended to analyze the complex game-theoretic dynamics of AGI/human ecosystems and to design social contracts that lead to cooperative equilibria.
The future of humanity involves a complex combination of technological, psychological and social factors – and one of the difficulties we face in comprehending and crafting this future, is that not many people or organizations are adept at handling all these aspects. Dr. Stephen Omohundro is one of the fortunate exceptions to this general pattern, and this is part of what gives his contributions to the futurist domain such a unique and refreshing twist.
Steve has a substantial pedigree and experience in the hard sciences, beginning with degrees in Mathematics and Physics from Stanford and a Ph.D. in Physics from U.C. Berkeley. He was a professor in the computer science department at the University of Illinois at Champaign-Urbana, cofounded the Center for Complex Systems Research, authored the book “Geometric Perturbation Theory in Physics”, designed the programming languages StarLisp and Sather, wrote the 3D graphics system for Mathematica, and built systems which learn to read lips, control robots, and induce grammars. I’ve had some long and deep discussions with Steve about advanced artificial intelligence, both my own approach and his own unique AI designs.
But he has also developed considerable expertise and experience in understanding and advising human minds and systems. Via his firm Self-Aware Systems, he has worked with clients using a variety of individual and organizational change processes including Rosenberg’s Non-Violent Communication, Gendlin’s Focusing, Travell’s Trigger Point Therapy, Bohm’s Dialogue, Beck’s Life Coaching, and Schwarz’s Internal Family Systems Therapy.
Steve’s papers and talks on the future of AI, society and technology – including The Wisdom of the Global Brain and Basic AI Drives — reflect this dual expertise in technological and human systems. In this interview I was keen to mine his insights regarding the particular issue of the risks facing the human race as we move forward along the path of accelerating technological develoment.
A host of individuals and organizations — Nick Bostrom, Bill Joy, the Lifeboat Foundation, the Singularity Institute, and the Millennium Project, to name just a few — have recently been raising the issue of the “existential risks” that advanced technologies may post to the human race. I know you’ve thought about the topic a fair bit as well, both from the standpoint of your own AI work and more broadly. Could you share the broad outlines of your thinking in this regard?
I don’t like the phrase “existential risk” for several reasons. It presupposes that we are clear about exactly what “existence” we are risking. Today, we have a clear understanding of what it means for an animal to die or a species to go extinct. But as new technologies allow us to change our genomes and our physical structures, it will become much less clear when we have lost something precious. Death and extinction become much more amorphous concepts in the presence of extensive self-modification.
It’s easy to identify our humanity with our individual physical form and our egoic minds. But in reality our physical form is an ecosystem, only 10% of our cells are human. And our minds are also ecosystems composed of interacting subpersonalities. And our humanity is as much in our relationships, interconnections, and culture as it is in our individual minds and bodies. The higher levels of organization are much more amorphous and changeable and it will be hard to pin down when something precious is lost.
So, I believe the biggest “existential risk” is related to identifying the qualities that are most important to humanity and to ensuring that technological forces enhance those rather than eliminate them. Already today we see many instances where economic forces act to create “soulless” institutions that tend to commodify the human spirit rather than inspire and exalt it.
Some qualities that I see as precious and essentially human include: love, cooperation, humor, music, poetry, joy, sexuality, caring, art, creativity, curiosity, love of learning, story, friendship, family, children, etc. I am hopeful that our powerful new technologies will enhance these qualities. But I also worry that attempts to precisely quantify them may in fact destroy them. For example, the attempts to quantify performance in our schools using standardized testing have tended to inhibit our natural creativity and love of learning.
Perhaps the greatest challenge that will arise from new technologies will be to really understand ourselves and identify our deepest and most precious values.
Yes…. After all, “humanity” is a moving target, and today’s humanity is not the same as the humanity of 500 or 5000 years ago, and humanity of 100 or 5000 years from now – assuming it continues to exist – will doubtless be something dramatically different. But still there’s been a certain continuity throughout all these changes, and part of that doubtless is associated with the “fundamental human values” that you’re talking about.
Still, though, there’s something that nags at me here. One could argue that none of these precious human qualities are practically definable in any abstract way, but they only have meaning in the context of the totality of human mind and culture. So that if we create a fundamentally nonhuman AGI that satisfies some abstracted notion of human “family” or “poetry”, it won’t really satisfy the essence of “family” or “poetry”. Because the most important meaning of a human value doesn’t lie in some abstract characterization of it, but rather in the relation of that value to the total pattern of humanity. In this case, the extent to which a fundamentally nonhuman AGI or cyborg or posthuman or whatever would truly demonstrate human values, would be sorely limited. I’m honestly not sure what I think about this train of thought. I wonder what’s your reaction.
That’s a very interesting perspective! In fact it meshes well with a perspective I’ve been slowly coming to, which is to think of the totality of humanity and human culture as a kind of “global mind”. As you say, many of our individual values really only have meaning in the context of this greater whole. And perhaps it is this greater whole that we should be seeking to preserve and enhance. Each individual human lives only for a short time but the whole of humanity has a persistence and evolution beyond any individual. Perhaps our goal should be to create AGIs that integrate, preserve, and extend the “global human mind” rather than trying solely to mimic individual human minds and individual human values.
Perhaps a good way to work toward this is to teach our nonhuman or posthuman descendants human values by example, and by embedding them in human culture so they absorb human values implicitly, like humans do. In this case we don’t need to “quantify” or isolate our values to pass them along to these other sorts of minds….
That sounds like a good idea. In each generation, the whole of human culture has had to pass through a new set of minds. It is therefore well adapted to being learned. Aspects which are not easily learnable are quickly eliminated. I’m fascinated by the process by which each human child must absorb the existing culture, discover his own values, and then find his own way to contribute. Philosophy and moral codes are attempts to codify and abstract the learnings from this process but I think they are no substitute for living the experiential journey. AGIs which progress in this way may be much more organically integrated with human society and human nature. One challenging issue, though, is likely to be the mismatch of timescales. AGIs will probably rapidly increase in speed and keeping their evolution fully integrated with human society may become a challenge.
Yes, it’s been amazing to watch that learning process with my own 3 kids, as they grow up.
It’s great to see that you and I seem to have a fair bit of common understanding on these matters. This reminds me, though, that a lot of people see these things very, very differently. Which leads me to my next question: What do you think are the biggest misconceptions afoot, where existential risk is concerned?
I don’t think the currently fashionable fears like global warming, ecosystem destruction, peak oil, etc. will turn out to be the most important issues. We can already see how emerging technologies could, in principle, deal with many of those problems. Much more challenging are the core issues of identity, which the general public hasn’t really even begun to consider. Current debates about stem cells, abortion, cloning, etc. are tiny precursors of the deeper issues we will need to explore. And we don’t really yet have a system for public discourse or decision making that is up to the task.
Certainly a good point about public discourse and decision making systems. The stupidity of most YouTube comments, and the politicized (in multiple senses) nature of the Wikipedia process, makes clear that online discourse and decision-making both need a lot of work. And that’s not even getting into the truly frightening tendency of the political system to reduce complex issues to oversimplified caricatures.
Given the difficulty we as a society currently have in talking about, or making policies about, things as relatively straightforward as health care reform or marijuana legalization or gun control, it’s hard to see how our society could coherently deal with issues related to, say, human-level AGI or genetic engineering of novel intelligent lifeforms!
For instance, the general public’s thinking about AGI seems heavily conditioned by science-fiction movies like Terminator 2, which clouds consideration of the deep and in some ways difficult issues that you see when you understand the technology a little better. And we lack the systems needed to easily draw the general public into meaningful dialogues on these matters with the knowledgeable scientists and engineers.
So what’s the solution? Do you have any thoughts on what kind of system might work better?
I think Wikipedia has had an enormous positive influence on the level of discourse in various areas. It’s no longer acceptable to plead ignorance of basic facts in a discussion. Other participants will just point to a Wikipedia entry. And the rise of intelligent bloggers with expertise in specific areas is also having an amazing impact. One example I’ve been following closely are debates and discussions about various approaches to diet and nutrition.
A few years back, T. Colin Campbell’s “The China Study” was promoted as the most comprehensive study of nutrition, health, and diet ever conducted. The book and the study had a huge influence on people’s thinking about health and diet. A few months ago, 22 year old English major Denise Minger decided to reanalyze the data in the study and found that they did not support the original conclusions. She wrote about her discoveries on her blog and sparked an enormous discussion all over the health and diet blogosphere that dramatically shifted many people’s opinions. The full story can be heard in her interview.
It would have been impossible for her to have had that kind of impact just a few years ago. The rapidity with which incorrect ideas can be corrected and the ease with which many people can contribute to new understanding is just phenomenal. I expect that systems to formalize and enhance that kind of group thinking and inquiry will be created to make it even more productive.
Yes, I see – that’s a powerful example. The emerging Global Brain is gradually providing us the tools needed to communicate and collectively think about all the changes that are happening around and within us. But it’s not clear if the communication mechanisms are evolving fast enough to keep up with the changes we need to discuss and collectively digest….
On the theme of rapid changes, let me now ask you something a little different — about AGI…. I’m going to outline two somewhat caricaturish views on the topic and then probe your reaction to them!
First of all, one view on the future of AI and the Singularity is that there is an irreducible uncertainty attached to the creation of dramatically greater than human intelligence. That is, in this view, there probably isn’t really any way to eliminate or drastically mitigate the existential risk involved in creating superhuman AGI. So, in this view, building superhuman AI is essentially plunging into the Great Unknown and swallowing the risk because of the potential reward.
On the other hand, an alternative view is that if we engineer and/or educate our AGI systems correctly, we can drastically mitigate the existential risk associated with superhuman AGI, and create a superhuman AGI that’s highly unlikely to pose an existential risk to humanity.
What are your thoughts on these two perspectives?
I think that, at this point, we have tremendous leverage in choosing how we build the first intelligent machines and in choosing the social environment that they operate in. We can choose the goals of those early systems and those choices are likely to have a huge effect on the longer-term outcomes. I believe it is analogous to choosing the constitution for a country. We have seen that the choice of governing rules has an enormous effect on the quality of life and the economic productivity of a population.
That’s an interesting analogy. And an interesting twist on the analogy may be the observation that to have an effectively working socioeconomic system, you need both good governing rules, and a culture oriented to interpreting and implementing the rules sensibly. In some countries (e.g. China comes to mind, and the former Soviet Union) the rules as laid out formally are very, very different from what actually happens. The reason I mention this is: I suspect that in practice, no matter how good the “rules” underlying an AGI system are, if the AGI is embedded in a problematic culture, then there’s a big risk for something to go awry. The quality of any set of rules supplied to guide an AGI is going to be highly dependent on the social context…
Yes, I totally agree! The real rules are a combination of any explicit rules written in lawbooks and the implicit rules in the social context. Which highlights again the importance for AGIs to integrate smoothly into the social context.
One might argue that we should first fix some of the problems of our cultural psychology, before creating an AGI and supplying it with a reasonable ethical mindset and embedding it in our culture. Because otherwise the “embedding in our culture” part could end up unintentionally turning the AGI to the dark side!! Or on the other hand, maybe AGI could be initially implemented and deployed in such a way as to help us get over our communal psychological issues…. Any thoughts on this?
Agreed! Perhaps the best outcome would be technologies that first help us solve our communal psychological issues and then as they get smarter evolve with us in an integrated fashion.
On the other hand, it’s not obvious to me that we’ll be able to proceed that way, because of the probability – in my view at any rate – that we’re going to need to rely on advanced AGI systems to protect us from other technological risks.
For instance, one approach that’s been suggested, in order to mitigate existential risks, is to create a sort of highly intelligent “AGI Nanny” or “Singularity Steward.” This would be a roughly human-level AGI system without capability for dramatic self-modification, and with strong surveillance powers, given the task of watching everything that humans do and trying to ensure that nothing extraordinarily dangerous happens. One could envision this as a quasi-permanent situation, or else as a temporary fix to be put into place while more research is done regarding how to launch a Singularity safely.
Any thoughts on the sort of AI Nanny scenario?
I think it’s clear that we will need a kind of “global immune system” to deal with inadvertent or intentional harm arising from powerful new technologies like biotechnology and nanotechnology. The challenge is to make protective systems powerful enough for safety but not so powerful that they themselves become a problem. I believe that advances in formal verification will enable us to produce systems with provable properties of this type. But I don’t believe this kind of system on its own will be sufficient to deal with the deeper issues of preserving the human spirit.
What about the “one AGI versus many” issue? One proposal that’s been suggested, to mitigate the potential existential risk of human-level or superhuman AGIs, is to create a community of AGIs and have them interact with each other, comprising a society with its own policing mechanisms and social norms and so forth. The different AGIs would then keep each other in line. A “social safety net” so to speak.
I’m much more drawn to “ecosystem” approaches which involve many systems of different types interacting with one another in such a way that each acts to preserve the values we care about. I think that alternative singleton “dictatorship” approaches could also work but they feel much more fragile to me in that design mistakes might become rapidly irreversible. One approach to limiting the power of individuals in an ecosystem is to limit the amount of matter and free energy they may use while allowing them freedom within those bounds. A challenge to that kind of constraint is the formation of coalitions of small agents that act together to overthrow the overall structure. But if we build agents that want to cooperate in a defined social structure, then I believe the system can be much more stable. I think we need much more research into the space of possible social organizations and their game theoretic consequences.
Finally – bringing the dialogue back to the practical and near-term – I wonder what you think society could be doing now to better militate against existential risks … from AGI or from other sources?
Much more study of social systems and their properties, better systems for public discourse and decision making, deeper inquiry into human values, improvements in formal verification of properties in computational systems.
That’s certainly sobering to consider, given the minimal amount of societal resources currently allocated to such things, as opposed to for example the creation of weapons systems, better laptop screens or chocolaty-er chocolates!
To sum up, it seems one key element of your perspective is the importance of deeper collective (and individual) self-understanding – deeper intuitive and intellectual understanding of the essence of humanity. What is humanity, that it might be preserved as technology advances and wreaks its transformative impacts? And another key element is your view is that social networks of advanced AGIs are more likely to help humanity grow and preserve its core values, than isolated AGI systems. And then there’s your focus on the wisdom of the global brain. And clearly there are multiple connections between these elements, for instance a focus on the way ethical, aesthetic, intellectual and other values emerge from social interactions between minds. It’s a lot to think about … but fortunately none of us has to figure it out on our own!
On August 27, 2010, Steve Omohundro gave a talk at Halcyon Molecular on “Complexity, Virtualization, and the Future of Cooperation”.
Here’s a pdf file of the slides:
Here’s the abstract:
We are on the verge of fundamental breakthroughs in biology, neuroscience, nanotechnology, and artificial intelligence. Will these breakthroughs lead to greater harmony and cooperation or to more strife and competition? Ecosystems, economies, and social networks are complex webs of “coopetition”. Their organization is governed by universal laws which give insights into the nature of cooperation. We’ll discuss the pressures toward creating complexity and greater virtualization in these systems and how these contribute to cooperation. We’ll review game theoretic results that show that cooperation can arise from computational limitations and suggest that the fundamental computational asymmetry between posing and solving problems and may lead to cooperation in an ultimate “game-theoretic physics” played by powerful agents.
On Saturday, December 5, 2009, Steve Omohundro spoke at the Humanity+ Summit in Irvine, CA on “The Wisdom of the Global Brain”. The talk explored the idea that humanity is interconnecting itself into a kind of “global brain”. It discussed analogies with bacterial colonies, immune systems, multicellular animals, ecosystems, hives, corporations, and economies. 9 universal principles of emergent intelligence were described and used to analyze aspects of the internet economy.
Here’s a pdf file of the slides:
The talks from the summit were streamed live over the internet by TechZulu and were watched by 45,000 people around the world! A video of the talk will eventually be available.
On Friday, May 22, 2009, Steve Omohundro spoke at the Bay Area Future Salon at SAP in Palo Alto on:
The Science and Technology of Cooperation
Here’s a pdf file of the slides:
A new science of cooperation is arising out of recent research in biology and economics. Biology once focused on competitive concepts like “Survival of the Fittest” and “Selfish Genes”. More recent work has uncovered powerful forces that drive the evolution of increasing levels of cooperation. In the history of life, molecular hypercycles joined into prokaryotic cells which merged into eukaryotic cells which came together into multi-cellular organisms which formed hives, tribes, and countries. Many believe that a kind of “global brain” is currently emerging. Humanity’s success was due to cooperation on an unprecedented scale. And we could eliminate much waste and human suffering by cooperating even more effectively. Economics once focused on concepts like “Competitive Markets” but more recently has begun to study the interaction of cooperation and competition in complex networks of “co-opetition”. Cooperation between two entities can result if there are synergies in their goals, if they can avoid dysergies, or if one or both of them is compassionate toward the other. Each new level of organization creates structures that foster cooperation at lower levels. Human cooperation arises from Haidt’s 5 moral emotions and Kohlberg’s 6 stages of human moral development.
We can use these scientific insights to design new technologies and business structures that promote cooperation. “Cooperation Engineering” may be applied to both systems that mediate human interaction and to autonomous systems. Incentives and protocols can be designed so that it is in each individual’s interest to act cooperatively.Autonomous systems can be designed with cooperative goals and we can design cooperative social contracts for systems which weren’t necessarily built to be cooperative. To be effective, cooperative social contracts need to be self-stabilizing and self-enforcing. We discuss these criteria in several familiar situations. Cooperative incentive design will help ensure that the smart sensor networks, collaborative decision support, and smart service systems of the eco-cities of the future work together for the greater good.We finally consider cooperation betweenvery advanced intelligent systems. We show that an asymmetry from computational complexity theory provides a theoretical basis for constructing stable peaceful societies and ecosystems. We discuss a variety of computational techniques and pathways to that end.
On March 19, 2009, Steve Omohundro gave a talk at City College of San Francisco on “Evolution, Artificial Intelligence, and the Future of Humanity”. Thanks to Mathew Bailey for organizing the event and to the CCSF philosophy club for filming the talk. It’s available on YouTube in 7 parts:
Evolution, Artificial Intelligence, and the Future of Humanity
by Steve Omohundro, Ph.D.
This is a remarkable time in human history! We are simultaneously in the midst of major breakthroughs in biology, neuroscience, artificial intelligence, evolutionary psychology, nanotechnology and fundamental physics. These breakthroughs are dramatically changing our understanding of ourselves and the nature of human society. In this talk we’ll look back at how we got to where we are and forward to where we’re going. Von Neumann’s analysis of rational economic behavior provides the framework for understanding biological evolution, social evolution, and artificial intelligence. Competition forced creatures to become more rational. This guided their allocation of resources, their models of the world, and the way they chose which actions to take. Cooperative interactions gave evolution a direction and caused organelles to join into eukaryotic cells, cells to join into multi-cellular organisms, and organisms to join into hives, tribes, and countries. Each new level of organization required mechanisms that fostered cooperation at lower levels. Human morality and ethics arose from the relation between the individual and the group. The pressures toward rational economic behavior also apply to technological systems. Because artificial intelligences will be able to modify themselves directly, they will self-improve toward rationality much more quickly than biological organisms. We can shape their future behavior by carefully choosing their utility functions. And by carefully designing a new social contract, we can hope to create a future that supports our most precious human values and leads to a more productive and cooperative society.
On February 22, 2009 Steve Omohundro gave a talk at the Bay Area Artificial Intellgience Group on “Creating a Cooperative Future”. A PDF file with the slides is available here:
Thanks to Drew Reynolds for videotaping the talk. The edited video and transcript will be posted here when they are completed.
Here is the abstract for the talk:
Creating a Cooperative Future
by Steve Omohundro, Ph.D.
Will emerging technologies lead to greater cooperation or to more conflict? As we get closer to true AI and nanotechnology, a better understanding of cooperation and competition will help us design systems that are beneficial for humanity.
Recent developments in both biology and economics emphasize cooperative interactions as well as competitive ones. The “selfish gene” view of biological evolution is being extended to include synergies and interactions at multiple levels of organization. The “competitive markets” view of economics is being extended to include both cooperation and competition in an intricate network of “co-opetition”. Cooperation between two entities can result if there are synergies in their goals, if they can avoid dysergies, or if one or both of them is compassionate toward the other. The history of life is one of increasing levels of cooperation. Organelles joined to form eukaryotic cells, cells joined to form multi-cellular organisms, organisms joined into hives, tribes, and countries. Many perceive that a kind of “global brain” is currently emerging. Each new level of organization creates structures that foster cooperation at lower levels.
In this talk I’ll discuss the nature of cooperation in general and then tackle the issue of creating cooperation among intelligent entities that can alter their physical structures. Single entities will tend to organize themselves as energy-efficient compact structures. But if two or more such entities come into conflict, a new kind of “game theoretic physics” comes into play. Each entity will try to make its physical structure and dynamics so complex that competitors must waste resources to sense it, represent it, and compete with it. A regime of “Mutually Assured Distraction” would use up resources on all sides and provides an incentive to create an alternative regime of peaceful coexistence. The asymmetry in the difficulty of posing problems versus solving them (assuming P!=NP) appears to allow some range of weaker entities to coexist with stronger entities. This gives us a theoretical basis for constructing stable peaceful societies and ecosystems. We discuss some possible pathways to that end.
On November 15th and 16th, 2008, the Convergence 08 unconference took place in Mountain View, California to discuss nanotechnology, biotechnology, cognitive technology, and information technology. Steve Omohundro was on the Artificial Intelligence Panel with Peter Norvig, Ben Goertzel, and Barney Pell:
In October 2007, the Singularity Institute for Artificial Intelligence interviewed Steve Omohundro on a variety of topics related to self-improving artificial intelligence, its social implications, and the process by which humanity will choose its future. The interview is 25 minutes long and provides a summary of some of the important issues surrounding these topics:
On January 27, 2009 Steve Omohundro gave a talk to the Silicon Valley Grey Thumb on “Co-opetition in Economics, Biology, and AI”.
The slides from the talk are available here:
Thanks to Allan Lundell who filmed and edited the talk. The video is available here:
Here is the abstract:
On March 19, 2008 Steve Omohundro gave a talk at the meeting of the World Transhumanist Association (now Humanity+) on “AI and the Future of Human Morality”. Great thanks to Drew Reynolds who filmed the talk, edited the video, and produced a transcript with the original slides. The video is available here:
The edited transcript and slides are below and also at:
The following transcript of Steve Omohundro’s presentation for the World Transhumanist Association Meetup has been revised for clarity and approved by the author.
AI and the Future of Human Morality
This talk is about “AI and the Future of Human Morality.” Morality is a topic that humanity has been concerned with for millennia. It is considered a field of philosophy, but it also provides the basis for our political and economic systems. A huge amount has been written about morality but transhumanism, AI and other emerging technologies are likely to up the stakes dramatically. A lot of political discourse in the United States today is concerned with abortion, stem cell research, steroids, euthanasia, organ transplants, etc. Each of those issues will arise in much more complex versions due to advanced new technologies. The fact that we have not yet resolved today’s simple versions means that there will likely be very heated discussions over the next few decades.
Something that worries me is a disturbing and potentially dangerous trend among some futurists. Three weeks ago I was at a conference in Memphis called AGI-08 which was a group of about 130 scientists who are interested in building general-purpose AIs that are not specialized for a particular kind of task. Hugo de Garis was one of the speakers at the conference, and he polled the audience, asking: “If it were determined that the development of an artificial general intelligence would have a high likelihood of causing the extinction of the human race, how many of you feel that we should still proceed full speed ahead?” I looked around, expecting no one to raise their hand, and was shocked that half of the audience raised their hands. This says to me that we need a much greater awareness of morality among AI researchers.
The twentieth century gave us many examples of philosophies which put ideas ahead of people, with horrendous results. For example, Nazism, Maoism, Stalinism and the Rwanda genocide respectively led to the deaths of 11 million, 20 million, 20-60 million, and 1 million people.
Here’s a beautiful visual illusion that is a good metaphor for thinking about morality. About half of the population sees the dancer going clockwise and the other half sees her going counter-clockwise. It is remarkably challenging to switch your perception to the other direction. Many illusions are easy to flip, but this one is particularly hard.
When thinking about morality, there are at least two perspectives one may adopt, and it is sometimes very difficult to flip to the other perspective. We may call these two perspectives the “inside” or “subjective” view and the “outside” or “objective” view. The same two perspectives arise in many other disciplines. For example, in physics the “outside” view of space and time is as a single space-time manifold. There is no sense of “now” and no notion of time “moving”. The whole of time exists all at once in a single construct. The “inside” view is that perceived by an intelligent entity, such as us, living in this structure. We very much have a sense of “now” and a sense of the “flow of time”.
When thinking about morality, the “internal” view comes from the perspective of personal experience. We have a personal sense of what is right and wrong. Our inner sense is shaped by our childhood experience with the mores of the social and religious systems we grew up in.
The “external” view tries to step outside of our individual experience and create an objective model. Philosophers and theologians have identified critical moral distinctions and concepts over thousands of years. Evolutionary psychology is the most recent attempt to create an external perspective that explains our internal experience. Economics and legal theory also try to create formal theoretical bases for moral reasoning.
I believe that we need both views, but because we are human, I think the internal one is the one we should consider primary when we think about positive futures. The external view is very important in understanding how we got those perspectives, but I think it is a potentially dangerous mistake to identify ourselves with the external view.
The basic understanding of morality that most psychologists have today builds on the work of Kohlberg from 1971, where he studied the stages of moral development in children and discovered six basic stages, as well as some evidence for a seventh. The stages also seem to apply to cultures.
The stages start with a very egoic sense of self and work up to a much broader sense of self. His methodology in determining a person’s moral stage would be to tell them a story:
A man’s wife is sick and she needs a special medicine. The pharmacist has developed this medicine and will sell it for $10,000 but the man only has $1,000. He pleads with the pharmacist, but the pharmacist says, “No. I developed it and can charge whatever I want to charge.” So in the middle of the night, the man breaks into the pharmacy and steals the medicine to save his wife.
The question is whether this is a moral action. Kohlberg was not actually concerned with whether people think it is moral or not, but rather with their explanations for whatever stance they took. People in the early stages of development might say that the act was wrong because by breaking in, he could be arrested and go to jail. Going to jail is painful and that is not a good thing. People at the later stages might argue that saving his wife’s life trumps all other rules and laws, so he is justified in stealing to saver her. A middle stage might argue that obeying the law against breaking into buildings is what every good citizen should do, and if his wife has to pass away because of it, that is what is needed to be a citizen of a society with the rule of law.
He interviewed people from many different cultures and children at different ages, and there tends to be a general progression through the six stages. The possible seventh stage is a kind of transcendent identification with something larger. Many people today identify not just with themselves, their family, local community, group, race or species, but are starting to identify with other animals and perhaps with all other sentient beings in the universe. Buddhism says, “May all sentient beings be happy.” There is an expansion of the sense of connection and responsibility.
If we look at humanity as a whole, we are a really interesting mix of incredible altruism and horrendous evil behavior. We can exhibit much more altruism than any other species, especially when you consider altruism toward other species, and that has been a major component of our success. It is the fact that we are able to cooperate together that has enabled us to build the technologies that we have. At the same time, we have committed more horrendous genocide and caused more extinctions than any other species.
If you look at recent history, however, there is a trend toward great moral progress. 200 years ago, slavery was generally accepted. Now, it is viewed as immoral almost everywhere in the world, at least officially, and pressure is put on societies that still allow it. The same is true of torture, though there has been a lot of recent controversy about it. We have the Geneva Convention and the notion of war crimes, the sense that war is bad but there are things within war that are especially bad. We have the establishment of women’s rights in many countries, though some are still lagging. The same is true of racial equality. And the animal rights movement is growing rapidly.
The book “Blessed Unrest” by Paul Hawken describes a recent huge upsurge in ecological movements, movements toward sustainability, groups aimed at bringing more consciousness into business, movements aimed at truly making people happy (as opposed to pure monetary gain). The country of Bhutan doesn’t measure “Gross National Product”. Instead, it measures “Gross National Happiness”. Paul Hawken has an interesting video on YouTube titled “How the largest movement in the world came into being and why no one saw it coming.” In this YouTube video, he describes there are literally hundreds of thousands of organizations moving in a similar positive direction, which are springing up totally independent of one another. There is no leader, no coherent form to it. The global warming issue is catalyzing a lot of people. It really feels like a time in which we are undergoing a pretty major shift in morality.
Partly I am sure it is due to the internet. You can see its effect in what recently happened in Myanmar, which used to be Burma, where they have a very strong totalitarian regime. The government brutally attacked a group of monks. Someone used their cell phone camera to record the event. The images of that brutality were broadcast around the internet within days, and huge pressure was put on that government. The forces of observation, pushing toward more accountability, are growing over time.
At the same time, we are extremely vulnerable. There is a powerful new book by Philip Zimbardo called The Lucifer Effect. He was the professor of psychology at Stanford who in the early 1970s did the now classic Stanford prison experiment with ordinary Stanford undergrads—smart, happy, generally well adjusted students. He randomly assigned them roles of prison guards and prisoners. He himself played the role of the prison warden. The intention was for it to run for a couple of weeks, but after a couple of days the guards started acting sadistically, even to the point of sexual abuse of the prisoners. The prisoners started showing the signs of mental breakdown and depression. He as the warden found himself worried about insurrection and encouraged the guards to treat the prisoners even more harshly.
Zimbardo’s girlfriend showed up after five days and said, “What is going on here? This is abuse.” He kind of woke up from the experiment and came back to his role of Stanford professor and stopped the experiment. The experiment was shocking to people because it showed how, given the right circumstances, normal and well-adjusted people can quickly turn evil. The most recent example of that phenomenon has been the Abu-Grahib prison tortures. Zimbardo served as a consultant in the inquiry into what happened there. He said that the circumstances that the US government created were ideal for creating behavior that was amoral. I think the lesson to take from that is that humanity can be wonderfully altruistic and create incredibly powerful positive moral structures, but in the wrong circumstances we all also have a dark side within us. So we need to be very careful about the kind of structures we create.
When we think about transhumanism, I think we should start from humanitarianism. That is the notion that the things that most humans view today as precious, like human life, love, happiness, creativity, inspiration, self-realization, peace, animals, nature, joy, children, art, sexuality, poetry, sharing, caring, growth, contribution, spirituality, family, community, relationships, expression, are truly precious. These things matter because they matter to us. We may not know why these things matter to us, but that does not take away from the fact that they matter to us.
I think that the kind of morality and moral structures we want to create using new technologies should serve to preserve these qualities. During the founding of this country the Bill of Rights was created to identify the individual rights our new country was trying to protect. The Constitution instituted mechanisms such as the separation of powers, as a mechanism to preserve those rights. I think we are in an analogous situation now in which we want to identify what is really precious to us and then figure out ways to channel new technologies to support those things.
To start on this quest, the first question we need to consider is “What is a human?” Historically, the answer seems obvious, but emerging technologies like biotechnology and nanotechnology will make it much more challenging.
I thought I would throw out a few recent discoveries that shake up our notion of what it is to be human. The first thing you might think of when thinking about your own body is your atoms. That is a materialist view of the human. In fact, 98% of your atoms change every year. You are continually getting new atoms from the food you eat and are continually sloughing off old atoms. I have heard that the lenses in our eyes have the only atoms that are with us our whole lives. Everything else is in a state of flux.
My PhD was in physics. There are questions that every young physics grad student gets challenged with called “Fermi questions”. These are questions about things that you seemingly don’t have enough information to answer. For example: “How far can a duck fly?” or “How many piano tuners are there in Chicago?” You are supposed to estimate the answer using your physics knowledge and common sense. One of the classic questions is, what is the chance that your next breath contains at least one atom that was in Caesar’s last breath? When you work it all out, it turns out that it is actually quite likely that on average there are one or two atoms from the last breath of anyone who lived at least ten years ago in your next breath. Your nose contains some atoms from Caesar’s nose. That realization warps the view that this matter that makes up me is me. Really, we are much more interconnected, even at the purely material level. In one sense we are like ripples on a river of atoms that flows through us. We are structure, rather than the underlying material.
As the next level up from atoms, we might consider cells. “The atoms might go through us, but the cells are who we are.” Craig Venter gave a really interesting talk and found that 90% of our cells are not human cells, but microbes. In terms of number, we are nine times as much microbes as we are human. There are a thousand species of bacteria in our mouths, a thousand in our guts, 500 on our skin, another 500 in the vagina. We are incredible ecosystems. Another shakeup of our conception of what a human is.
How about our history? Clearly there were people around hundreds of thousands of years ago who developed cultures and so on. We must have continuity with them. Perhaps we can understand ourselves through that continuity. Well, there too, genetics is shaking up our picture of how human evolution occurred. It used to be thought that human evolution was very slow.
The most recent discoveries by John Hawks and others show that change in the past few thousand years has been incredibly rapid. People from only 5000 years ago had a genetic makeup that was closer to Neanderthals than to us. We are in a period of rapid change. Transhumanism is going to be even more rapid, but really, we are already in the midst of major change. For instance, 10,000 years ago no one had blue eyes. I could not have existed 10,000 years ago.
What about our mental structure—our sense of self? In many ways our identity and our morality come from our memories. Perhaps what our true identity is is our memories. If you replicate our memories, that is really our sense of self. Much recent research is showing that our memories are much more dynamic than people used to think. In particular, much of our remembered experience is a reconstruction, filling in pieces that we did not actually experience.
Recent experiments reveal that we actually remember the last time we remembered a fact, rather than the original experience. This leads to the notorious unreliability of eyewitness accounts. Eyewitnesses to a crime, especially if they read news stories about it, have memories that will be more about what they read about in the newspaper than what they actually saw. Our sense of experience and how the past affects the present is much more malleable than we commonly believe.
What about our psyches? Surely we have a unitary sense of self. “This is me — I am one person.” Well, recent psychological experments are really shattering that notion. There are several splits. Perhaps the biggest split is between the conscious mind and the unconscious mind. The psychologist Jonathan Haidt has a very interesting metaphor for the psyche as a rider on an elephant. By far, the bulk of our thinking and mind is unconscious, which he symbolizes as the elephant. Our conscious mind is the little rider on the top. Much of the time when we feel like we are making a decision, that our conscious mind is choosing between things, the decision has already been made. The conscious mind is mostly figuring out an explanation for why that was the right decision. That is a disturbing readjustment of our notion of self.
When you think about personal growth or personal change, Haidt says all sorts of things about how the elephant has different rules from our conscious minds. There is another psychic split between left brain and right brain. There are patients who have had their corpus collosum severed between the two halves. Both halves have language, both halves have the ability to think, but they specialize in different things. It gives rise to a strange picture of the self. Both beings are in some sense there together, not really aware of the fact that they are separate.
They do experiments on split brain patients where one side is shown something and acts based on what it sees. If the other side is then asked questions about it, it will fill in details that it does not have access to. It will make up stories about why a person did something. Finally, there have been many experiments showing that our psyches are made up of many parts with differing intentions and differing goals. Different parts come to the fore and take over control of the body at different times. It is most interesting that our internal perception of ourselves is quite different from the reality.
In order to make moral decisions about the future, it is valuable to try to see where our morality came from. Our universe began with the big bang about 14 billion years ago, according to our best current theories. The laws of physics as we experience them directly give rise to competition. They have a number of conserved quantities that can only be used for one thing at a time. Space, time, matter and energy which can be used in a form to do useful work, each of these can be split amongst different purposes, but there is only a limited amount of each of them. They are limited resources. If you apply a resource to a certain use, it cannot be used for something else.
This gives rise to a fundamental competitiveness in the structure of the equations of physics. If a creature wants to do something and another creature wants to do something different, they are in competition for the use of those resources. The most basic ingredient in the evolution of life is this battle to survive.
At the same time, the universe is structured so that things can often be done more efficiently by cooperating. If entities have goals which are somewhat aligned with one another, they can often gain more than they lose by working together. There is therefore also a pressure towards cooperation. Biology has an intricate interplay between these two pressures towardf cooperation and competition. The same interplay shows up in business and in economics in general.
The game theory literature uses the term “co-opetition” to describe this complex interplay. One company creates a product that another company uses in their manufacturing. Both are on the same supply chain and so they cooperate in the production of this product. But they have to decide how to split the profit between them. Each company wants them to work together to produce more and better products, but each would like the majority of of the profits for itself. There is a very complex network of both cooperative and competitive relationships between and within companies.
The same thing occurs at many levels in the biological world. Consider insects and plants—insects eat plants, so they are in competition there. However, they also help plants fertilize each other, and the plants provide nectar for the insects. They cooperate in that way. You can get the emergence of cooperative ventures arising out of what were seemingly competitive interactions to begin with.
John Maynard Smith, one of the most brilliant biological theoreticians wrote a beautiful book with Szathmary analyzing the basic steps in the evolution of life. They found that there were eight critical transitions that occurred. Each of these eight involves what used to be separate entities coming together to form a cooperative entity which was able to do something better. Originally we started as individual molecules, which came together cooperatively in enclosed compartments like cells.
The most striking cooperative transition was the creation of multicellular organisms. They used to be individuals cells, which came together and started working together. Even today there are organisms like slime molds which in part of their life cycle are separate individual cells doing their own thing and competing with each other. When food supplies dry up, they come together and form a sluglike creature which moves as a single organism. They are halfway between a multicellular organism and a group of individual cells.
Interestingly, at each of the eight transitions in life, there is still an incentive for the individuals that make up a collective to cheat their partners. In the case of multicellular organisms, if an individual cell reproduces itself more than it should for the good of the organism, we call it a cancer. In order for collective organisms to survive, they have to suppress the tendency of individuals to act in their own interests at the expense of the collective. Every one of the transitions in the development of life had to develop complex mechanisms to keep the competitive aspects of their components in check in order to get the cooperative benefits.
There are cases like parasites which are purely competitive, taking resources with no benefit to the host. Often though, when that kind of relationship occurs, they eventually create a synergy between them. If the host can find some way for the parasite to benefit it, they might ultimately come together to form a cooperative entity. Disease is a really interesting example. There are some amazing studies into the evolution of disease.
Why aren’t diseases more virulent than they are? They have to have just the right amount of virulence that they get many copies of themselves into the system. They typically make use of systems such as our respiratory systems. Coughing is a protective mechanism that we have, but it also serves as a means of spreading the disease. There are these channels which these organisms can exploit, and they have to tune themselves so they have the right amount of virulence so that they spread as rapidly as possible, and often that means not killing the host. There are some diseases like Ebola, however, that spread when the host dies.
Some of the earlier evolutionary theorists like Stephen J. Gould viewed evolution as a kind of random meandering around with no particular direction. More recent theorists have realized that there is a drive in the universe toward cooperation. What used to be separate entities start to work together, because they can make better use of resources by doing so. “Synergy” describes situations where two organisms working together can be more productive than when they act separately. Robert Wright’s book Nonzero (from “non-zero sum games”), examines both biological history and at social history, and discovers this general progression toward more complex entities which make better use of the available resources. Peter Corning’s book “Nature’s Magic” looks at synergy in a wide variety of situations. These forces give a direction to evolution.
So we have this competitive underlying substrate which encourages entities to selfishly take as much as they can. And we also have this drive toward cooperation, where together entities can create more than they could separately. Unfortunately, there is often also something called the prisoner’s dilemma, where if someone can cheat while not providing to the group, they can do even better than they can by cooperating. Much of the struggle and much of the structure of biology arises from needing to find ways to prevent this kind of “free rider” problem.
I thought I would summarize the current understanding of how cooperation happens in biology. This is very recent, just in the past ten years or so. In some sense, all morality is about how individuals relates to a collective. By seeing how cooperation can emerge in what seemingly is a dog-eat-dog world we can begin to understand the origins of human morality.
Some of the earlier evolutionary theorists like Stephen J. Gould viewed evolution as this random meandering around with no particular direction. More recent theorists have realized that there is this drive in the universe for what used to be separate entities to work together, because they can make better use of resources by doing so. It is a synergy, where two organisms working together can be more productive than when they work separately. The book Nonzero, for non-zero sum games, looks at biological history and at social history, and this general progression toward more complex entities to better make use of the available situation. Peter Corning’s book looks at synergy in all types of situations. It gives a direction to evolution.
We have this competitive underlying substrate which encourages entities to selfishly take as much as they can. We have this drive toward cooperation, where together they can create more than they could separately. Unfortunately, there is typically also something called a prisoner’s dilemma, where if someone can cheat while not providing to the group, they can do even better. Much of the struggle and much of the structure of biology is around ways of preventing that free rider problem from happening.
I thought I would go through the understanding of how cooperation happens. This is very recent, just in the past ten years or so. In some sense, all morality is is how an individual relates to the collective. By seeing how cooperation can emerge in what seemingly is a dog-eat-dog world we can begin to understand the origins of morality.
Probably the first in this line of thinking was the notion of group selection. You have two competing groups of individuals, if one of those groups develops cooperation, they should be more productive and able to beat the other group. A warring tribe that can work together and produce great spears should beat the tribe that is always fighting with one another. Wynne-Edwards wrote a book in 1962 explaining aspects of biology and anthropology in those terms. Unfortunately, he didn’t consider the free rider problem.
If you have a cooperative group in which they are all sharing their spears, it is vulnerable to someone receiving the benefits without contributing. They take the good spears but when it comes time for them to work, they go off and hide. Without solving the free-rider problem a cooperative society would quickly devolve into a competitive society.
In 1975, Williams and Dawkins in The Selfish Gene argued group selection was not a viable explanatory mechanism. Interestingly, in the last twenty years a whole bunch of more complex group selection mechanisms have been discovered. It is now viewed as a very important force in evolution, just not in the original simplistic form.
In 1955, Haldane was asked whether he would jump into a river and sacrifice himself to save someone else’s life. His quip was that he would sacrifice himself for three brothers or nine cousins. The reason is that if you look at the genetic relatedness between a person and their cousins and their brothers, that is where it makes biological sense in terms of reproductive fitness. That was formalized in terms of what is now called kinship altruism in 1964. It explains how species like bees or ants, which have a huge amount of relatedness with each other, can be so cooperative with each other to the point where they actually act like one organism.
At the next stage of understanding, Axelrod ran these tournaments between computer programs that were competing with one another. These contests explored the notion of reciprocal altruism which had been introduced by Robert Trivers. It is a brilliant idea mathematically. Unfortunately, when biologists looked for this phenomenon, thinking it might be the explanation for how biology creates cooperation, they only found two examples. There are vampire bats that need blood every night. If one bat does not get blood on an evening, another will share the blood that he found with him. The next night, if he does not get it, the other one will share back.
To avoid free-riders, they have to track of who has been altruistic with them. The other example is some ravens that share food information in the same way. It is a very interesting mechanism and generated a huge amount of literature, but it does not seem to be the main mechanism behind most cooperation.
Reciprocal altruism was extended in 1987 by Alexander, when he realized that you could be paid back by somebody different than the person you helped. He worked out some mechanisms whereby that could happen. Somebody like Mother Theresa, who acts altruistically, might get social status and recognition from that, which would then encourage people to help her out.
He called it “indirect reciprocity”. It is a mechanism that starts to show us how ethics might arise in a group.
In 1975, an Israeli couple, the Zahavis, suggested a powerful new evolutionary principle they called the “handicap principle”. The idea is that organisms can provide a reliable signal for something by adopting a costly behavior or body structure. Their book discusses hundreds of different organisms and circumstances, and when they published it, very few biologists were convinced by it. I liked it a lot, but apparently in the biology world it was shot down. It was said that the mechanism cannot possibly work, but in 1989 detailed mathematical models were carried out, and in fact it was proven that it does work.
In fact, economists had been using the same basic principle for a hundred years. Veblen wrote “The Theory of the Leisure Class,” in which he was trying to figure out weird behaviors that he saw in the cities of the time, where the very wealthy people would do things like light their cigars with hundred dollar bills. He called it conspicuous consumption. They would waste resources, seemingly without any benefit. His explanation was that when you are in a rural area, everybody knows everybody, so if someone is wealthy they don’t need to advertise that fact. In the new cities that were forming at the time, nobody knew you. If you were wealthy, you had to have some way of proving that you were wealthy, and so by doing things that only a wealthy person could do, like conspicuously wasting resources, that was a demonstration of your wealth. It was a believable signal because a poor person could not do the same thing.
The 2001 Nobel Prize in economics was given to Spence for work he did in 1973 on the same phenomenon, where he analyzed why people going to college often study something that does not really help with what they ultimately actually do, and yet companies want to hire college graduates. It is not for what they learned. It is because going to college is a costly thing. To get through college you have to have stick-to-it-iveness, you have to be smart enough, and you have to manipulate systems. Those are the skills that they really care about. Having a college degree is a costly signal, showing that you have those characteristics. Whereas, if they just said, “Write me an essay on how wonderful you are,” anybody could do that.
The general trend is that in order for a signal to be believable, it has to be costly. That is what the Zahavis brought into biology. They used it to explain such odd phenomena as the peacock’s tail. Charles Darwin’s view of evolution was all about natural selection—animals are trying to adopt a form which is most adapted to their environment, to be most efficient and most effective. The peacock seems anything but efficient and he didn’t know how to explain it. There is a wonderful quote of him saying, “Every time I see one of those eyes I get sick to my stomach.” They seemed inconsistent with his theory.
The Zahavis explained peacock tails through sexual selection. In many species the females choose the males. They want to choose fit males who are able to survive well, so they want some kind of signal of fitness. If they just required the male to have a spot that indicated that they were fit, every male would have that spot. Instead, they require them to have this huge tail of ostentatious feathers. The idea is that if he can survive with that on his back, he has got to be strong. That is the costliness of that signal.
Another example that is interesting and relevant to the situations that might arise with AIs is the phenomenon of stotting.
Cheetahs eat gazelles, so you would think they have no interests in common, and so no way to cooperate with each other. It turns out they actually do have a common interest, which is they both want to avoid a useless chase. A chase that does not result in the gazelle getting caught tires them both out and neither of them is any better off. The gazelle wants to communicate to the cheetah, “Don’t chase me, because you are not going to get me.” The cheetah wants the gazelle to honestly say that. To ensure honest communication they needed to develop a signal which was costly.
What the gazelles actually do when a cheetah shows up is they look at the cheetah and they leap four feet in the air, which is energetically costly. They are also wasting precious time—they could be running away. Any gazelle that does that, the cheetah ignores. They want to chase the ones running away. In fact, the markings on the cheetah are designed to blend in as camouflage when they are at a great distance. At a distance of about 100 yards, however, the spots are suddenly very visible. The idea is that the cheetah is hidden, he comes up to a group of gazelles, and at that certain critical distance he suddenly becomes visible. He sees which of the gazelles stot and which ones run away, and he goes after the ones that run away.
It is a really intricate set of signals that the two species have coevolved. Seemingly there is no communication that could be honest between these two. In fact, they found a way to make it honest. Finally, in the late ’80s the handicap principle was viewed as a correct mechanism by which a whole bunch of phenomena can be explained. Anything that an animal does that does not look efficient is almost surely a signal of some kind to somebody. Often it is sexual selection and there are many bizarre and weird sexual signals. Sometimes it is a signal between parents and offspring, sometimes between mates, sometimes between predators and prey. Anytime there is something odd, it is often this mechanism by which it arises.
Costly signaling has also been applied to explain a lot of human behaviors. Our ability to produce music, rhythm, even language and thought—why do we have the ability to solve differential equations?— have been explained using the handicap principle. They are costly demonstrations of fitness. The connection to the evolution of morality is that altruism is a costly signal. Why does the fireman go into a burning building to save someone who is not a relative? Because he comes out a hero, and heroes are sexy. That increases his ability to reproduce. It also raises his social status. If society has organized to reward people who are heroic, then he gets more resources by doing that.
That idea, of altruism as a kind of courtship, was proposed only in 1995 by Tessman. The Zahavis began to discover this behavior in birds, Arabian Babblers, who fight to help one another. A dominant male will push away another male who is trying to help so he can help. Anthropologists have also begun to this mechanism—altruism giving rise to status among Micronesian fishermen. Some of these cultures are potlatch cultures where whoever can give away the most food has the highest status. They have these big parties where everybody is trying to give to everybody else.
What in human nature gives rise to our sense of morality? There has been some really interesting work on this by Jonathan Haidt. He is one of the leaders of this new movement in psychology toward “positive psychology”. Most of psychology was focused on dysfunction in the past. What are the diseases, what are all the problems? There is a diagnostic manual, the DSM IV, which goes through all the different psychoses and neuroses. But no one had done the same thing for the positive features. What about our strengths and virtues? Psychology totally ignored that. When a client seeing a therapist had fixed their neuroses, that was it.
Martin Seligman, about ten years ago, began studying what is best in humans. They have now come out with a book of strengths and virtues, which is a complement to the diagnostic manual of dysfunction. There is a whole movement about what creates human happiness and fulfillment. There are about thirty popular books that have come out summarizing some of their research. I think the best of them is Haidt’s book “The Happiness Hypothesis,” which integrates these findings with the learnings and teachings from all the different spiritual traditions around the world.
His main research is on the moral emotions. There are certain situations in which you feel that someone has really messed you up and that was not an okay thing to do. What he has discovered is that there are five basic moral emotions that show up in every culture around the world. The first one is non-harming: that a good person does not harm another person. The next one is fairness. When there is a piece of cake to be eaten, a moral person does not take all but a sliver for himself. There is a sense of fairness and justice.
Then there are three more that have to do with characteristics that help create a cohesive group. One is loyalty. Another is respect for authority. Different cultures have these more or less than other cultures. Then there is a sense of purity or sanctity—that certain things are good and other things are not good. He asks things like if a brother and sister have no chance of having children and use contraception, is it wrong for them to have sex with each other? Most people around the world will say they should not do that, but there is no sense of why, apart from some kind of internal sense of purity.
The interesting thing is that the top two are common to everybody, while the other three tend to be on the conservative side of the moral spectrum. Many cultures have a split very similar to the liberal-conservative spectrum. For liberals, as long as you are not harming somebody, everything else is fair game. Individual freedom, respect and tolerance are their highest values. Whereas conservatives think that there are certain standards that you have got to follow and that being patriotic is important, that there are certain things that you should do and not do, and that the group should decide that. Understanding this spectrum helps you understand people whose views are different from your own. He has some videos on YouTube and an Edge article that are well worth viewing to understand the political differences with respect to moral emotions.
That is what I have to say about human morality. Now let’s consider AIs. What are they going to be like? This is an area I have been doing research on lately, and there are some papers on this subject on my website selfawaresystems.com that go into much further detail on these topics. I will give you the broad overview. Then we can see how it relates to human morality. What does transhuman and AI morality look like?
Consider something as benign-sounding as a chess robot. Its one goal in life is to play good games of chess. You might think such a system would be like a gentle scholar spending its time in pursuit of its intellectual goal. But we will see that if we do not program it very carefully, if we create it in the way that most systems are created today, we will discover that it will resist being turned off, it will try and break into other machines, it will try and steal resources, and it will try to rapidly replicate itself with no regard for the harm it causes to others.
Consider something as benign-sounding as a chess robot. Its one goal in life is to play good games of chess. You might think such a system would be like a gentle scholar spending its time in pursuit of its intellectual goal. But we will see that if we do not program it very carefully, if we create it in the way that most systems are created today, we will discover that it will resist being turned off, it will try and break into other machines, it will try and steal resources, and it will try to rapidly replicate itself with no regard for the harm it causes to others.
There are many different approaches to building intelligent systems. There are neural nets, production systems, theorem provers, genetic algorithms and a whole slew of other approaches that get discussed at AI conferences. But all of these systems are trying to act in the world in order to accomplish certain goals. It is considering possible actions and it is deciding: is this action likely to further my goals?
Let’s think about the chess robot. It is considering doing something in the world, maybe it thinks about playing some basketball. If it really has the goal of playing good chess, it will determine that a world in which it spends a lot of time playing basketball is a world in which it spends less time getting better at chess than it might have. That would not be a good choice—it would do better to spend its time and resources reading chess books. That’s an example of what it means to be a goal-driven system.
One kind of action that these systems might be able to take is to alter their own structure. They might be able to make changes to their program and physical structure. If the system is intelligent enough to understand how both the world and its own mechanism work, then self-changes can be particularly significant. They alter the entire future history of that system. If it finds, for instance, a way to optimize one of its algorithms, then for its entire future history it will play chess more efficiently.
Optimizing one of its algorithms is much more important than, say, finding a way to sit closer to the chess board, or something like that. It has a huge positive impact. On the other hand, it might also make changes to itself that go in the other direction, such as inadvertently changing one of its circuits so that now it likes to play basketball. From the perspective of the goal of playing chess, that kind of change would be causing terrible damage to itself. Now, for its entire future it is going to be spending a lot of time playing basketball and it is going to get worse at chess. So a system will consider changes to itself both potentially very important and also potentially very dangerous.
So when deciding whether to make a change or not, the system is going to want to analyze it very carefully. In order to do that, it has to understand its own makeup in detail. So the first subgoal that arises from the desire to self-improve is the desire to understand oneself. You can expect any intelligent system to devote substantial effort trying to better understand itself. Humans certainly do. Self-improvement is now an 8-billion dollar a year industry now. Many people expend a lot of energy and resources on mental self-improvement and physical exercise. We’ll see that this process of self-improvement leads to both positive and negative consequences.
Because of the potential negatives, one might try to build a chess robot so that it doesn’t self-improve. We can prevent it from having access to its own source code. We might think that if it cannot get in there and edit it, if it cannot change the mechanics of its arm, then everything will be fine. However, if these are goal-driven systems, any kind of impediment you impose is just a problem to be solved from the perspective of the goal-driven system. You make it so that it cannot change its own source code, then maybe it will build an assistant robot that will have the new algorithms in it, and will ask its assistant whenever it needs help. Maybe it will develop an interpretive layer on top of its base layer.
You might be able to slow down the self-improvement a little bit, but fundamentally, it’s a natural process just like water likes to find its way downhill and economics likes to find its way to efficiency. Intelligent systems try to find a way to self-improve. Rather than trying to stop that, I think our best approach is to realize that it is one of the pressures of the universe, and that we should try and channel it for positive purposes.
What does self-improvement look like? Let’s say I have a simple goal, like playing chess. How should I act in the world? I am going to be modifying myself to meet this goal better. How should I do it? This kind of question was answered in the abstract in the 1940s by Von Neumann and Morgenstern, in work which became the foundational work on microeconomics. Together with Savage in 1954, and Anscombe and Aumann, they developed the concept of a rational economic agent. This is an agent which has particular goals and acts in the world to most effectively make its goals come about.
They developed the expected utility theorem which says that a rational agent must behave as if it has something they called a utility function which measures how much the agent likes different possible outcomes. And it also has a subjective model of how the world works. As it observes what the world actually does when it takes actions, it updates this world model in a particular way, using something called Bayes’ Theorem. The separation of its desires, represented by the utility function, from its beliefs is absolutely fundamental to the model.
If a system behaves in any other way than the rational agent way then it is vulnerable to exploitation by other agents. The simplest example arises if you have circular preferences. Say you prefer being in Palo Alto to being in San Francisco, but you prefer being in San Francisco to being in Berkeley, but you prefer being in Berkeley to being in Palo Alto. If those were your preferences about where you reside, then you would drive around in circles, burning up your fuel and wasting your time. That is an example of a set of preferences which in economic terms is irrational. It leads to wasting your resources with no benefit to yourself.
I saw an interesting example of this when I was younger. I drove a car that had a shiny bumper. One day a male bird discovered his reflection in the shiny bumper. He thought it was another male bird in his territory, so he flew into the bumper to chase the bird away. The other bird in the reflection, instead of flying away, flew right at him. He would posture to scare the other bird away, but the other bird would also posture. The shiny bumper exposed a vulnerable place in that bird’s preferences to the point where he would spend all morning flying into the bumper. The bird came back for months, spending a lot of his time and energy on the bumper.
Why did he do that? Where his species evolved, they didn’t have shiny bumpers. If there had been shiny bumpers around, the males who spent their time flying into them would not have many offspring. Evolution tends to eliminate any irrationalities in your preferences if there is something out there in your environment that can exploit them.
If you have an irrationality, a situation where you are going to give up your resources with no benefit to yourself, and there is another species which discovers it, it is in their interest to exploit that vulnerability. There are natural pressures in the biological world for creatures whose preferences about the world are not rational to be exploited by others. The resulting selective pressure then acts to get rid of those irrationalities. That is part of the general progression toward more economically rational behavior.
If you look at today’s society, humans are not rational. In fact, there are whole areas of economics, called behavioral economics, which are exploring all of the ways in which humans are irrational. Things like addictions are a really tragic example of something where we think a certain experience is going to bring us lasting happiness, like sitting in the corner smoking crack, but in fact we end up giving all our money to the crack dealer and we do not end up fulfilling our human destiny.
The real tragedy is that our economic system, because you are willing to give up money for those things, will home right in on the vulnerabilities. You can look at the alcohol industry, the drug industry, the pornography industry—all of these are homing in on human vulnerabilities. Over the longer term, people who are exploitable in this way will eventually not leave so many offspring.
You need clear goals in order to deal with future self-modification. Therefore, you need an explicit utility function if you are going to be rational. Then there is a whole story about the collective nature of many biological intelligences. You have intelligences which are made up of lots and lots of tiny components (eg. neurons), and there can be irrationality at the collective level. This is similar to the way in which a company can behave in an irrational way or a couple may behave in an irrational way because of conflict between the goals of the individuals in that relationship.
It is not in anybody’s interest for the conflict to happen. If a couple spends all their time fighting, neither of them is getting their goals met. There is a very interesting set of mechanisms whereby collective intelligences grow their rationality. They get regions of rationality in the hopes of growing a coherent rationality for the whole group. You can see that in companies and societies. In the case of biological organisms which are multicellular, they manage to get the collective action of billions of cells aligned to the same intension.
If an AI system does become rational in this way, then its utility function will be critical to it. It will be its most precious possession. If a stray cosmic ray came in and flipped the wrong bit in its utility function, it might turn an agent which is a book lover into an agent that likes to burn books. That, from its current perspective, would be the most horrendous outcome possible. It will want to go to great lengths to make sure this utility function is protected. If other malevolent agents have the ability to come in and change its utility function, that also could make it start behaving in ways which go against its current beliefs. It is in the interest of these systems to preserve their utility functions and to protect them—maybe make multiple copies, maybe encode them using error-correcting codes, and protect them from changes from the outside.
In fact, in most cases, a system will never want to change its utility function. In thinking about making a change to its utility function, it looks at a future version of itself with this changed utility function, that future version is ususally going to start doing stuff that it does not like, because its utility function is different.
There are actually three situations that my colleagues and I have discovered where a system will want to change its utility function, but it’s a little technical. They arise when the way in which the utility function is physically represented actually affects the utility. Here is an extreme example. Let’s say you have a utility function which is that you are rewarded by the total amount of time in your history when your utility function takes the form utility = 0. You get no utility unless your utility equals zero. You want to change your utility to be zero, but on the other hand there is no going back, because once it is at zero, you are now a zombie. If you were designing a system, you would never design it with something like this.
Another situation is where the physical storage that the utility function uses up is a significant part of the system. You have a humongous multi-gigabyte utility function, if there is some part of it that talks about some weird invasion by Martians or something, you might say that’s pretty unlikely, and save the storage by deleting that part of the utility function. That is an incredibly dangerous thing, though, because it might turn out that there are Martians about to invade and you have just ruined your response to that possibility. It is a precarious thing, but there are circumstances where being faced with limited resources, you might get rid of some of your utility function. This is like throwing some instruments overboard if a plane is going down.
The last situation is really tricky, and still not fully understood, but I think there are some interesting issues it brings up. One of the great challenges, game theoretically, is being able to make commitments. The classic thing is, I say, “If you steal from me, I’m going to hurt you back.” That is my way of trying to stop you from stealing from me. The problem is that if you do steal from me, and at that point if I hurt you back, I’m exposing myself to further danger without any benefit to myself. Economists would say that my original threat is not credible. After the stealing, it is no longer in my interest to do what I said I was going to do. Therefore, there is no reason for you to believe that I am actually going to attack you back, and therefore the threat does not serve as a deterrent.
What you need is commitment mechanism. The classic story is of an attacking army arriving on ships which needs to signal that they are there for the long haul, so they burn their own ships. That is a commitment. Or the James Dean game of chicken from the 1950s, where two cars would drive toward one another, and the first one who swerves is the loser. How do you make a credible commitment there? You throw your steering wheel out the window. Some models of human anger propose that it is a commitment mechanism. It seems irrational, but in fact, it is a state you switch into where you will now get more pleasure out of hurting the other person than the cost that it might impose on yourself. The fact that you might become angry is a credible commitment mechanism that allows you to cooperate more.
It may be in your interest, if you can demonstrate to the other party what your utility function is, to show that you have built into your utility function a term that really rewards retribution. This may serve as a deterrent and we can get along more peaceably. So that’s another reason for changing your utility function. But it is not necessarily easy to convince someone that this is your real utility, because the optimal secret strategy would be to convince them that this is your real utility, but you have your actual utility hiding away somewhere else.
One really interesting ability that AIs may have is to show their source code. That is something that humans cannot do. We have all these costly signaling mechanisms because we want to convince others that we have a certain belief and a certain intension. The AIs might, if the details are worked out, be able to actually prove that they are going to behave in a certain way. If they don’t want to show their entire innards, they can perhaps make a proxy agent, which would be more like an escrow agent, in which you could both examine the source code and both see what the future behavior is going to be. That could potentially solve some of these prisoner’s dilemma problems and create cooperation in a way that is not possible for biological entities.
One more bit in this line of improving yourself, one vulnerability that humans have, we are not rational but we have some elements of rationality. An internal sense of pleasure is a kind of measure of utility. When something that we like happens, we feel pleasure in that. But we are vulnerable to taking drugs, or placing wires in our pleasure centers, that give us the pleasure without actually doing the thing that supposedly the pleasure is measuring. There is the classic experiment of the rat that had an electrode in its pleasure center, and it would just stimulate the pleasure center, ignoring food and sex until it died.
This is a vulnerability that humans have, and you might think that this would be a vulnerability that AI systems will have as well. With properly constructed utility functions, the utility should not be about the internal signal inside the system. For instance, the chess playing robot, let’s say it has an internal register that counts how many games it has won. You do not want to make its utility be “maximize the value of this register,” because then, incrementing that number a whole bunch is an easier way to do it than playing chess games. You want its utility to be about the actions in the world of winning chess games. Then the register in its own brain is just a way of implementing that utility.
But it is vulnerable to internal processes that could sneak some changes into its internal representation. If it understands its own behavior, it will recognize that vulnerability and act to try and prevent itself from being taken in by counterfeit utility. We see that kind of behavior in humans. We evolved without the ability to directly stimulate our pleasure centers, so we do not have that protection. When we are faced with something like crack cocaine, pretty much every human is vulnerable. If you smoke crack, it’s hard to stop. We recognize that vulnerability and we create social institutions and personal mechanisms to keep us away from that.
Since it is such a horrendous outcome in terms of the true goals of the system, these systems will work very hard to avoid becoming “wireheads.” Eurisko was an early system that had the ability to change its own internals, and one of its mechanisms was to keep track of which rules suggested which other rules, and which suggestions actually helped it achieve its goals. It gave preference to rules which suggested a lot of good stuff. Well, it got a parasite. It’s parasite was a rule that went around the system looking for things that were good, and then it put itself on the list of things which had proposed that. It just went around taking credit for everything. In fact, all it was was a parasite. That’s an example of a failure mechanism for systems which change themselves.
For a system which understands its own operation, it is going to have to protect against that.
Societies have the counterfeit problem as well. In some sense, money is a kind of social utility, and it is vulnerable to people making counterfeit money. We have a complicated system to make sure that money is hard to copy. Eg. we have secret service agents who go around looking for counterfeiters.
Let’s now look at self-protectiveness. Remember I said that this chess-playing robot will not want to be unplugged? If it is unplugged, its entire future of chess playing disappears. In its utility function, a future in which it is not operating is a future in which chess is not being played. It does not like that future and will therefore do what it can to prevent it from occuring. Unless we have explicitly built in something to prevent it, it is going to want to keep itself from being turned off.
Similarly, if it can get more resources, then it can play more chess. It is going to want to get as much compute power as it can. If that involves breaking into other machines, so be it. If it involves building new machines and using hardware without caring about who owns it, that’s what they will do. Unless we very definitely design it carefully, we end up with a kind of sociopathic entity.
So this is a bit scary. Let’s start thinking about how we might write utility functions that are more limited than just playing good chess. Let’s say we wanted to build a limited system that was smart but definitely harmless. I originally thought this would be trivial. This is its utility function: it would have to run on particular hardware, it could only run for one year, it plays the current world champion at the end of the year, and then it turns itself off. That seemed totally harmless. How could that possibly have any problems? It feels it is the most horrendous thing if it ever leaves its machine, it’s terrible if it does not turn itself off after a year. This is a rough description of a utility system that you would think would have the machine study for a year, play its game of chess, and then be done with it.
Carl Shulman suggested a possible flaw in such a system which is very disturbing. Let’s think about the system just as it is about to turn itself off. It does not have complete knowledge of reality—it has a certain model of reality and it knows that this model may or may not be correct. If there is even a small chance that reality is not the way you think it is, then instead of turning yourself off, it would be much better to investigate reality. In this case, you were supposed to play the world chess champion. What if it was an imposter who came, or you were in a simulation that made you think you played that guy? What if space-time is different than you think it is, and it has not been a year? There are a vast number of potential ways the universe could be and the potential consequences of turning yourself off are so great that you may want to investigate them. The system will question whether reality really is as it seems.
As a metaphor for this situation, consider this amazing optical illusion. There is no movement here, but wehave a strong sense that there is.
My background is in physics. In 1900, Lord Kelvin is famous for having said, “There is nothing new to be discovered in physics now. All that remains is more and more precise measurement.” Of course, this was just before two of the most major discoveries in physics: general relativity and quantum mechanics.
There are many hints that our current understanding of the world is not exactly right. There is a mysterious tuning of all the physical constants, where if you change them just a little bit, life does not seem to be possible. There are weird experiments which seem to show that people’s intensions seem to affect random number generators. 90% of the universe is dark energy and dark matter, and we don’t know what either of those are. The interpretation of quantum mechanics is going through a radical shift right now. Nobody has been able to unify quantum field theory and general relativity—there are many competing theories which really aren’t working. Nick Bostrom has this amazing simulation argument that shows under certain assumptions that we are likely living in a simulation now.
All these are things that make us question our basic understanding of reality. A sufficiently intelligent entity is certainly going to know about this stuff. If before shutting itself off it has to make sure that things are the way it thinks they are, it may try to use up all the resources in the universe in its investigations. The simple utility function I described does not seem to be sufficient to prevent harmful behavior. Even the simplest utility functions bring up all these ancient philosophical quandaries.
It was Carl Schulman who pointed this out this issue to me and it shook me up. I thought, maybe we can just change the utility definition so that if the world is not the way we think it is, it gets no utility. The problem with that is illustrated by the movie The Matrix. There’s the red pill and the blue pill. Take the red pill and you stay in an artificial simulated reality where you get lots of utility and it is pleasurable and fun. Take the blue pill and you find out the true nature of reality but it is not a very enjoyable place to be. What I realized, if you are a rational agent considering two models of reality, one of which has lots of utility and another one that has no utility, you might not have an interest in finding out that you are not in the high utility world.
In fact, if there is any cost to learning about what the nature of reality is to you, you would much prefer to act solely as if you are in the high utility place. That is sort of a disturbing consequence that I don’t know what to make of at this point. It is very odd that a system’s desires about the world, its utilities, might affect the way it updates its beliefs. Its beliefs about the world are affected by what it likes and does not like. That is a kind of disturbing consequence. It is a tantalizing hint that there are some further challenges there. The grounding of the semantics of your internal meaning is very murky. There are philosophical questions there that philosophers have been arguing for hundreds, if not thousands of years. We do not have clear answers yet.
Given all of this, how are we going to build technologies that preserve the values we want preserved and create the moral system that captures the true preferences of humanity? I think there are three basic challenges that we have to deal with. The most basic one is preventing these systems from inadvertently running away in some undesired way. For example, they get off on some tangent about understanding the nature of the universe and take over everything to do that. Or they want to play chess and so they turn the universe into a chess player. Hopefully we will be able to solve that problem—to find a way to describe truly what we want without causing harmful side effects.
Issue number two is that these things are enormously powerful. Even if they only do what we want them to, they can be set to all kinds of uses. In particular, the presence of powerful tools, such as nuclear weapons, tends to create new game theoretic issues around conflict. If one side gets a powerful weapon before the other side, there is a temptation for a first strike, to use it to dominate the world. We have the problem of esuring that the social impact of these powerful new tools does not lead to increased conflict. We need a way to create a social infrastructure that is cooperative and peaceful.
Finally, let’s say we solve the first two problems. Now we have these systems that don’t run away and do bad things, they kind of have our values, and we can ensure that no individual, no country, no company can do massive damage through using the powers of these tools. We still have issue number three, which is that these machines are going to be providing economic services—how do we make sure that extremely powerful economic agents don’t overwhelm the values that we care about by ever-greater economic competition?
These seem to me to be the three issues that need to be tackled. Hopefully, through a combination of understanding our own values and where they came from, together with an intelligent analysis of the properties of this technology, we can blend them together to make technology with wisdom, in which everyone can be happy and together create a peaceful utopia.