Paper on the Nature of Self-Improving Artificial Intelligence

An analysis of the likely behavior of self-improving systems was presented at the Singularity Summit 2007. The PDF file for the paper (revised 1/21/08) is:

Stephen M. Omohundro, “The Nature of Self-Improving Artificial Intelligence”

Abstract: Self-improving systems are a promising new approach to developing artificial intelligence. But will their behavior be predictable? Can we be sure that they will behave as intended even after many generations of self-improvement? This paper presents a framework for answering questions like these. It shows that self-improvement causes systems to converge to an architecture that arises from von Neumann’s foundational work on microeconomics. Self-improvement causes systems to allocate their physical and computational resources according to a universal principle. It also causes systems to exhibit four natural drives: 1) efficiency, 2) self-preservation, 3) resource acquisition, and 4) creativity. Unbridled, these drives lead to both desirable and undesirable behaviors. The efficiency drive leads to algorithm optimization, data compression, atomically precise physical structures, reversible computation, adiabatic physical action, and the virtualization of the physical. It also governs a system’s choice of memories, theorems, language, and logic. The self-preservation drive leads to defensive strategies such as “energy encryption” for hiding resources and promotes replication and game theoretic modeling. The resource acquisition drive leads to a variety of competitive behaviors and promotes rapid physical expansion and imperialism. The creativity drive leads to the development of new concepts, algorithms, theorems, devices, and processes. The best of these traits could usher in a new era of peace and prosperity; the worst are characteristic of human psychopaths and could bring widespread destruction. How can we ensure that this technology acts in alignment with our values? We have leverage both in designing the initial systems and in creating the social context within which they operate. But we must have clarity about the future we wish to create. We need not just a logical understanding of the technology but a deep sense of the values we cherish most. With both logic and inspiration we can work toward building a technology that empowers the human spirit rather than diminishing it.

9 Responses to “Paper on the Nature of Self-Improving Artificial Intelligence”

  1. [...] the new version. You might think it could go off in some completely wild direction. I wrote a paper that presents these arguments in full and that has an appendix with all of the mathematical [...]

  2. [...] Paper on the Nature of Self-Improving Artificial Intelligence [...]

  3. [...] are a bunch of theoretical reasons for being very careful as we progress along this path. I wrote a paper that is available on my website which goes into these arguments in great detail. Up to now you may [...]

  4. [...] Paper on the Nature of Self-Improving Artificial Intelligence [...]

  5. [...] will briefly go through the argument. There is a full paper on it on my website: http://www.selfawaresystems.com. Let me first say what rational behavior is in this [...]

  6. [...] Paper on the Nature of Self-Improving Artificial Intelligence [...]

  7. [...] paper I just wrote called “The Nature of Artificial Intelligence” presents a detailed argument that aside from limitations of computational power we know [...]

  8. Hi, I found your paper by way of http://www.overcomingbias.com/2008/12/two-visions-of/comments/page/2/#comment-142322542.

    Unfortunately, I think the derivation in chapter 10 of expected utility maximization from the need to avoid vulnerabilities, especially section 10.9, doesn’t work, because there are ways to avoid being Dutch booked other than being an expected utility maximizer. For example, I may prefer a mixture of L1 and L2 to both L1 and L2 and, as soon as the alpha-coin is flipped, change my preferences so that I now most prefer whichever of L1 or L2 the coin selected.

    To give a real-world example, suppose I come home and my SO asks me “Do you want chicken or pork for dinner?” I say “Surprise me.” Then whatever dinner turns out to be is what I want. I don’t go in circles and say “I’d like to exchange that for another surprise, please.”

    Another way to avoid being Dutch booked is to have an ask/bid spread. Why should it be that for any mixture of L1 and L2, I must have a single price at which I am willing to both buy and sell that mixture? If there’s a difference between the price that I’m willing to buy at, and the price that I’m willing to sell at, then that leaves me some room to violate expected utility maximization without being exploited.

    Or I may have a vulnerability, but morality, customs, law, or high transaction costs prevent anyone from making a profit exploiting it.

    I suppose the first objection is the most serious one (i.e. exploitable circularity can be avoided by changing preferences). The others, while showing that expected utility maximization doesn’t have to be followed exactly, leave open that it should be approximated.

  9. Wei Dai,

    Thanks for the comments. You are absolutely right that arbitrage arguments require that the system has rich enough dynamics to exploit any vulnerabilities. An agent can remain irrational without negative consequences if it is not possible for an adversary to take the actions necessary to exploit it. However, there are usually many ways to exploit an irrationality and an agent takes a huge risk by remaining irrational. Biologically, it creates a “niche” for any adversary which discovers how to exploit the irrationality. Intelligent systems which are able to modify themselves and which have any uncertainty about the laws of physics or about adversaries they may face in the future will have a huge incentive to eliminate their vulnerabilities.

    In your example, an agent which changes its mind about its preferences is exploitable. Say you prefer apples to bananas today, but bananas to apples tomorrow. At 11:59 PM I sell you an apple in return for some money and a banana. Two minutes later, at 12:01 AM, I sell you the banana in return for some more money and the apple. It’s 2 minutes later and you are poorer but otherwise in the same state.

    For the bid/ask spread example, you may not have that choice. Think of an animal who has to decide whether to fight or flee from a predator based on its assessment of the two distributions that arise. It has to pick one of the choices whether it likes it or not. The option of not acting isn’t part of the dynamics of that situation.

    In general, preferences are over complete future time histories including your present assessment of your future actions. If a rational agent discovers that he has an internal mechanism that will cause him to change his preferences in the future, the rational response is to try to disable that mechanism, change himself, or set up an external commitment structure to keep his present preferences. Imagine yourself as a lover of books and to your horror you discover that you have a gene which will kick in in one year that will cause you to become an arsonist who will go around burning down libraries. What actions would you currently take?

    Best wishes,
    Steve
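
The apple/banana exchange described in the last response, together with the earlier point about spreads and transaction costs, can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `Agent` class, the `run_pump` function, the numeric valuations, the fee, and the `spread` parameter are all hypothetical and do not come from the paper or from either commenter. The agent's favourite fruit flips each day; a trader offers the other fruit in exchange for the agent's current fruit plus a fee, and the agent accepts whenever its perceived gain exceeds the fee plus its spread.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    holding: str         # "apple" or "banana"
    money: float
    spread: float = 0.0  # hypothetical extra margin demanded before any trade

    def value(self, fruit: str, day: int) -> float:
        # Assumed valuations: apples are favoured on even days,
        # bananas on odd days, so the preference flips every day.
        favourite = "apple" if day % 2 == 0 else "banana"
        return 2.0 if fruit == favourite else 1.0

    def accepts(self, offered: str, fee: float, day: int) -> bool:
        # Trade only if the perceived gain beats the fee plus the spread.
        gain = self.value(offered, day) - self.value(self.holding, day)
        return gain > fee + self.spread


def run_pump(agent: Agent, fee: float, days: int) -> float:
    """Offer a swap plus fee once per day; return the total fees collected."""
    collected = 0.0
    for day in range(days):
        # The trader always offers the fruit the agent is not holding.
        offered = "apple" if agent.holding == "banana" else "banana"
        if agent.accepts(offered, fee, day):
            agent.holding = offered
            agent.money -= fee
            collected += fee
    return collected
```

With `spread=0.0` the agent swaps every day, so after an even number of days it holds its original fruit and has paid a fee on each swap. With a spread at least as large as the gap between its two valuations it never trades. That shows how friction can block this particular pump; it says nothing about situations where declining to act is not an option.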
