Autonomous systems will be approximately rational
How should autonomous systems be designed? Imagine yourself as the designer of the Israeli Iron Dome system. Mistakes in the design of a missile defense system could cost many lives and destroy property. The designers of this kind of system are strongly motivated to optimize the system to the best of their abilities. But what should they optimize?
The Israeli Iron Dome missile defense system consists of three subsystems. The detection and tracking radar system is built by Elta, the missile firing unit and Tamir interceptor missiles are built by Rafael, and the battle management and weapon control system is built by mPrest Systems. Consider the design of the weapon control system.
At first, a goal like “Prevent incoming missiles from causing harm” might seem to suffice. But the interception is not perfect, so probabilities of failure must be included. And each interception requires two Tamir interceptor missiles which cost $50,000 each. The offensive missiles being shot down are often very low tech, costing only a few hundred dollars, and with very poor accuracy. If an offensive missile is likely to land harmlessly in a field, it’s not worth the expense to target it. The weapon control system must balance the expected cost of the harm against the expected cost of interception.
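The balance described above can be sketched as a comparison of two expected costs. This is a minimal illustration, not the logic of the actual weapon control system: the function name, the probabilities, and the dollar value placed on harm are hypothetical, and only the interceptor price and the two interceptors per interception come from the text.

```python
INTERCEPTOR_COST = 50_000    # dollars per Tamir interceptor (figure from the text)
INTERCEPTORS_PER_SHOT = 2    # interceptors fired per interception (from the text)


def should_intercept(p_hit, harm_cost, p_intercept_success):
    """Return True if firing has a lower expected cost than ignoring the missile.

    p_hit: assumed probability that the missile causes harm if ignored.
    harm_cost: assumed dollar-equivalent cost of that harm.
    p_intercept_success: assumed probability that the interception succeeds.
    """
    cost_if_ignored = p_hit * harm_cost
    firing_cost = INTERCEPTOR_COST * INTERCEPTORS_PER_SHOT
    # A failed interception still leaves the missile able to cause harm.
    cost_if_fired = firing_cost + (1 - p_intercept_success) * p_hit * harm_cost
    return cost_if_fired < cost_if_ignored
```

Under these assumptions, a missile with a one-in-a-thousand chance of causing a million dollars of harm is ignored, while one with an even chance of causing ten million dollars of harm is targeted.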
Economists have shown that the trade-offs involved in this kind of calculation can be represented by defining a real-valued “utility function” which measures the desirability of an outcome. They show that this function can be chosen so that, in uncertain situations, the best choice is the one that maximizes the expected utility. The economic framework naturally extends to the complexities that arms races inevitably create. For example, the weapon control system must decide how to deal with multiple incoming missiles. It must decide which missiles to target and which to ignore. A large economics literature shows that if an agent’s choices cannot be modeled by a utility function, then the agent must sometimes behave inconsistently. For important tasks, designers will be strongly motivated to build self-consistent systems and therefore to have them act to maximize an expected utility.
Economists call this kind of action “rational economic behavior”. There is a growing literature exploring situations where humans do not naturally behave in this way and instead act irrationally. But the designer of a missile-defense system will want to approximate rational economic behavior as closely as possible because lives are at stake. Economists have extended the theory of rationality to systems where the uncertainties are not known in advance. In this case, rational systems will behave as if they have a prior probability distribution which they use to learn the environmental uncertainties using Bayesian statistics.
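As one small illustration of learning an uncertainty from a prior, the sketch below updates a belief about an unknown interception success rate. The choice of a Beta prior and the function names are assumptions made for the example, not something specified in the text.

```python
def update_beta(alpha, beta, outcomes):
    """Bayesian update of a Beta(alpha, beta) prior over a success probability.

    outcomes: list of booleans, True for each observed success.
    Returns the parameters of the posterior Beta distribution.
    """
    successes = sum(1 for outcome in outcomes if outcome)
    failures = len(outcomes) - successes
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Expected success probability under a Beta(alpha, beta) belief."""
    return alpha / (alpha + beta)


# Start from a uniform prior, Beta(1, 1), and observe two successes and one failure.
posterior = update_beta(1, 1, [True, True, False])
estimate = beta_mean(*posterior)
```

The posterior mean can then stand in for the success probability in an expected-utility calculation and is revised as further outcomes are observed.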
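Modern artificial intelligence research has adopted this rational paradigm. For example, the leading AI textbook uses it as a unifying principle and an influential theoretical AI model is based on it as well. For definiteness, we briefly review one formal version of optimal rational decision making. At each discrete time step $t = 1, \ldots, N$, the system receives a sensory input $S_t$ and then generates an action $A_t$. The utility function is defined over sensation sequences as $U(S_1, \ldots, S_N)$ and the prior probability distribution $P(S_1, \ldots, S_N \mid A_1, \ldots, A_N)$ is the prior probability of receiving a sensation sequence $S_1, \ldots, S_N$ when taking actions $A_1, \ldots, A_N$. The rational action at time $t$ is then:
$$A_t^R(S_1, A_1, \ldots, A_{t-1}, S_t) = \arg\max_{A_t^R} \sum_{S_{t+1}, \ldots, S_N} U(S_1, \ldots, S_N)\, P(S_1, \ldots, S_N \mid A_1, \ldots, A_{t-1}, A_t^R, \ldots, A_N^R)$$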
This may be viewed as the formula for intelligent action and includes Bayesian inference, search, and deliberation. There are subtleties involved in defining this model when the system can sense and modify its own structure but it captures the essence of rational action.
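A brute-force version of this rule can be sketched as a recursive search: pick the action that maximizes expected utility over future sensations, assuming every later action is chosen the same way. The sketch assumes the prior can be queried one step at a time through a hypothetical `sense_model` function. The toy model, its probabilities, and all names are illustrative assumptions.

```python
def best_action(s_hist, a_hist, horizon, actions, sense_model, utility):
    """Return (action, expected utility) for the current step by exhaustive search.

    s_hist: tuple of sensations received so far.
    a_hist: tuple of actions already taken (one fewer than the sensations).
    sense_model(s_hist, a_hist) -> {next_sensation: probability}, called after
        the candidate action has been appended to a_hist.
    utility(s_seq) -> float, defined on a full sensation sequence.
    """
    if len(s_hist) == horizon:
        # The sequence is complete; no further action affects the utility.
        return None, utility(s_hist)
    best_a, best_value = None, float("-inf")
    for a in actions:
        new_a_hist = a_hist + (a,)
        expected = 0.0
        for s_next, p in sense_model(s_hist, new_a_hist).items():
            _, value = best_action(s_hist + (s_next,), new_a_hist,
                                   horizon, actions, sense_model, utility)
            expected += p * value
        if expected > best_value:
            best_a, best_value = a, expected
    return best_a, best_value


def toy_sense_model(s_hist, a_hist):
    # Assumed toy dynamics: firing makes a "hit" sensation less likely.
    p_hit = 0.1 if a_hist[-1] == "fire" else 0.6
    return {"hit": p_hit, "safe": 1.0 - p_hit}


def toy_utility(s_seq):
    # Assumed toy utility: each "hit" sensation costs one unit.
    return -float(s_seq.count("hit"))


action, value = best_action(("safe",), (), 3, ("fire", "hold"),
                            toy_sense_model, toy_utility)
```

The recursion visits every branch of future sensations and actions, so its running time grows exponentially with the horizon.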
Unfortunately, the optimal rational action is very expensive to compute. If there are $S$ sense states, $A$ action states, and $N$ time steps, then a straightforward computation of the optimal action requires $O(N S^N A^N)$ computational steps. For most environments, this is too expensive and so rational action must be approximated.
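To get a feel for how quickly the cost grows, the sketch below tabulates a rough step count for exhaustively enumerating every sensation and action sequence. The count used here, the number of time steps multiplied by the number of sequence pairs, is an assumed back-of-the-envelope estimate, and the state counts are hypothetical.

```python
def brute_force_steps(n_steps, n_senses, n_actions):
    """Rough step count for enumerating every sensation/action sequence pair.

    Assumes each of the (n_senses ** n_steps) * (n_actions ** n_steps)
    sequence pairs costs about n_steps operations to evaluate.
    """
    return n_steps * (n_senses ** n_steps) * (n_actions ** n_steps)


# Hypothetical environment with 10 sense states and 4 action states.
for n in (1, 5, 10, 20):
    print(n, brute_force_steps(n, n_senses=10, n_actions=4))
```

Even with these small state counts, each added time step multiplies the work by roughly the product of the two state counts.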
To understand the effects of computational limitations, this paper defined “rationally shaped” systems which optimally approximate the fully rational action given their computational resources. As computational resources are increased, systems’ architectures naturally progress from stimulus-response, to simple learning, to episodic memory, to deliberation, to meta-reasoning, to self-improvement, to full rationality. We found that if systems are sufficiently powerful, they still exhibit all of the problematic drives described elsewhere. Weaker systems may not initially be able to fully act on their motivations but they will be driven to increase their resources and improve themselves until they can act on them. We therefore need to ensure that autonomous systems don’t have harmful motivations even if they are not currently capable of acting on them.