What is rational behavior?

Much of our analysis of artificial and natural intelligence is based on the economic concept of rational behavior. This notion was introduced by von Neumann and Morgenstern in their landmark 1944 book “Theory of Games and Economic Behavior”. They only dealt with situations with objective probabilities, but their approach was later extended to subjective probabilities by Savage in 1954 and Anscombe and Aumann in 1963. The ideal rational economic agent is sometimes called “Homo Economicus” which is ironic because it often does not describe human behavior well. An important new discipline of behavioral economics has emerged in recent years to study how humans actually behave. But rational behavior is the ideal towards which both natural and artificial intelligences strive. In this post we try to describe rational behavior as simply and clearly as possible. We do this by building up to the most general situation through a series of simple examples.

Let’s begin with an intuitive description:

  1. Have clearly specified goals.
  2. In any situation identify your possible actions.
  3. For each action consider the possible consequences.
  4. Take the action most likely to meet your goals.
  5. Update your model of the world based on what actually happens.

At this level of description, it just sounds like common sense. A one sentence summary might be “To create desired outcomes, act in the ways which are most likely to produce them.” To formalize this prescription, we need explicit representations for a system’s goals and beliefs and must describe in computational detail how to choose the best action and update the beliefs.

A goal is something you want to happen in the future. But if you have more than one goal, you need a means for choosing between them when they are in conflict. This suggests that you need a way to weigh the importance of each goal. And because many goals are not just single events, you need preferences between sequences of events. And in order to make choices when randomness affects the outcome, you’d like your weighting to be calibrated so that you can take averages. Finally, in order to make decisions when you only have limited information, you should maintain your own beliefs about the state of the world.

Formally, the key ingredients are the set of possible outcomes S, a real-valued utility function U defined on S that represents the system’s desires, and a subjective probability distribution P defined on S that represents the system’s beliefs.

In the most general setting, S is the set of all possible histories of the universe and U measures how much the system prefers each history. For example, a chess playing system might choose U to measure the total number of games that it wins in each universe history. An altruistic system might choose U to be a measure of the total happiness of all sentient beings in each universe history. A greedy system might choose U to be the total amount of matter and energy it controls in each universe history. P represents the system’s beliefs about the likelihood of each universe history. It encodes beliefs about the state of the universe, the likely changes in state that different actions might cause, and the likely behaviors that the system will choose in different circumstances. At any moment in time there is a set of histories compatible with the system’s knowledge and the actions it might take correspond to different subsets of this set. The rational prescription is for it to choose the action whose subset has the highest expected utility as computed by averaging U with respect to P over the subset.

Making known choices

The full formal model can seem quite abstract, so let’s work up to it in stages. First imagine an agent faced with a set S of known choices. For example, say the agent is choosing from the menu of a fast food restaurant whose cooking processes are so reliable that the food always comes out the same. In this case the set of possible outcomes S is just the set of choices on the menu. A rational economic agent has a real-valued utility function U defined on S which encodes his food preferences. If x_{1}\in S and x_{2}\in S are two different menu items, then the agent prefers x_{1} to x_{2} if U(x_{1})>U(x_{2}). To maximize his enjoyment of the meal, the agent should choose the menu item x\in S with the maximum utility U(x).
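As a minimal sketch of this one-stage case, the choice is just an argmax over the menu. The menu items and utility numbers below are hypothetical, chosen only to illustrate the rule.

```python
# Hypothetical menu S and utility function U; the numbers are illustrative assumptions.
utility = {"burger": 5.0, "salad": 3.5, "fries": 4.0}

def best_choice(utility):
    """Return the item x in S with the maximum utility U(x)."""
    return max(utility, key=utility.get)

print(best_choice(utility))
```

Only the ordering of the utilities matters here; any other numbers that rank the items the same way would lead to the same choice.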

Making choices with objective probabilities

Next, let’s have the agent visit a fast food restaurant whose cooking processes are not so reliable. Let’s say that they have three chefs c_{1},c_{2} and c_{3}. If chef c_{j} prepares menu item x_{i}, let’s assume that he reliably produces the meal m_{j,i}. The set of possible outcomes S is now the set of all these meals m_{j,i}. Again the agent encodes his enjoyment of each possible meal in the utility function U(m_{j,i}). Now assume that the chefs work on a lottery system where menu item x_{i} is prepared by chef c_{j} with probability P(j|i). In this situation the agent has objective probabilities describing the results of his choices. While he no longer knows the utility that will result if he orders x_{i}, he can compute the expected utility \overline{U}(x_{i})=P(1|i)\cdot U(m_{1,i})+P(2|i)\cdot U(m_{2,i})+P(3|i)\cdot U(m_{3,i}). The rational prescription says that he should pick the menu item x_{i} with the highest expected utility \overline{U}(x_{i}).
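The expected utility computation can be sketched in a few lines. The two menu items, the lottery probabilities, and the meal utilities below are made-up values for illustration, not data from any real restaurant.

```python
def expected_utility(probs, utils):
    """Average the meal utilities U(m_{j,i}) weighted by the chef probabilities P(j|i)."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * u for p, u in zip(probs, utils))

# Hypothetical numbers: for each menu item, one entry per chef c_1, c_2, c_3.
P = {"burger": [0.5, 0.25, 0.25], "salad": [0.25, 0.25, 0.5]}
U = {"burger": [4.0, 8.0, 2.0], "salad": [6.0, 2.0, 6.0]}

scores = {item: expected_utility(P[item], U[item]) for item in P}
best = max(scores, key=scores.get)
print(scores, best)
```

With these assumed numbers the best possible meal is a burger from the second chef, yet the salad has the higher expected utility, so the rational prescription picks the salad.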

Making choices with subjective probabilities

Next, let’s have the agent visit a fancy restaurant for the first time. If he orders menu item x_{i} he is no longer sure of what he is going to get and he doesn’t even have objective probabilities over the possibilities. But say he has had experiences in restaurants before. He’s ordered food from many different cooks and has a sense of the variability of the results for each menu item. For example, souffles may sometimes be wonderful but often are not while macaroni and cheese may be much less variable. He encodes his knowledge in a subjective probability distribution P(j|i) which encodes his belief that the cook will produce the meal m_{j,i} if he orders item x_{i}. S is the set of possible meals m_{j,i} and the agent’s utility function U(m_{j,i}) ranks them. The rational prescription says that he should pick the menu item x_{i} with the highest subjective expected utility \overline{U}(x_{i})=\sum_{j}P(j|i)\cdot U(m_{j,i}).
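The post does not say how the agent arrives at his subjective probabilities. One simple possibility, shown here purely as an assumption, is to count remembered outcomes for an item and add a pseudocount so that outcomes he has never seen still get some belief. The outcome labels, memories, and utilities are hypothetical.

```python
from collections import Counter

def subjective_probs(observations, outcomes, pseudocount=1.0):
    """Estimate P(j|i) for one menu item from remembered outcomes.

    observations: outcome labels seen in past orders of the item.
    outcomes: every outcome label the agent considers possible.
    pseudocount: assumed smoothing term (not from the original post).
    """
    counts = Counter(observations)
    total = len(observations) + pseudocount * len(outcomes)
    return {o: (counts[o] + pseudocount) / total for o in outcomes}

outcomes = ["wonderful", "ok", "poor"]
souffle_memory = ["wonderful", "poor", "poor", "ok", "poor"]  # hypothetical past orders
P_souffle = subjective_probs(souffle_memory, outcomes)

U = {"wonderful": 10.0, "ok": 4.0, "poor": 0.0}  # hypothetical utilities
eu_souffle = sum(P_souffle[o] * U[o] for o in outcomes)
print(P_souffle, eu_souffle)
```

Once the beliefs are in hand, the subjective expected utility is computed exactly as in the objective case; only the source of the probabilities differs.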

Two-stage choices

So far, our agent has only had to make a choice at a single moment. Let us now give him two sequential choices: first, the choice of one of the three restaurants described above, and then the choice of what to order from the menu at that restaurant. We can think of his two choices as happening sequentially, or we can create an entire plan for his choices which specifies his response to every possible outcome. His choice of plan is then a one-stage choice and so should be made by the maximal expected utility prescription above. In this case, however, his utility U depends on the meal he gets and may also depend explicitly on the restaurant choice, e.g. if he prefers the decor at one place over another. In general, his subjective beliefs P will also depend on the entire history, though in this particular situation there is no uncertainty about the outcome of his choice of restaurant. If we think about the agent’s actions as two sequential choices, we see that after his first choice there is still an entire set of possible histories consistent with that choice. His optimal first choice is to select the set with the highest expected utility. We can extend this reasoning to multistage choice with an arbitrary number of stages.
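Both views of the two-stage choice can be sketched side by side. The restaurant names, decor utilities, and meal lotteries below are hypothetical, and the sketch assumes for simplicity that decor and meal utilities simply add, which the general model does not require.

```python
# Hypothetical data: restaurant -> (decor utility, {menu item: [(probability, meal utility), ...]})
restaurants = {
    "reliable": (0.0, {"burger": [(1.0, 5.0)], "salad": [(1.0, 3.0)]}),
    "lottery": (1.0, {"burger": [(0.5, 8.0), (0.5, 2.0)]}),
    "fancy": (2.0, {"souffle": [(0.25, 10.0), (0.75, 0.0)], "pasta": [(1.0, 4.5)]}),
}

def item_value(lottery):
    """Expected meal utility of ordering one menu item."""
    return sum(p * u for p, u in lottery)

def best_plan(restaurants):
    """Plan view: score every (restaurant, item) pair as a one-stage choice."""
    plans = [
        (decor + item_value(lottery), name, item)
        for name, (decor, menu) in restaurants.items()
        for item, lottery in menu.items()
    ]
    return max(plans)

def restaurant_value(decor, menu):
    """Sequential view: a restaurant is worth its decor plus its best later order."""
    return decor + max(item_value(lottery) for lottery in menu.values())

print(best_plan(restaurants))
print(max(restaurants, key=lambda r: restaurant_value(*restaurants[r])))
```

Both views select the same restaurant: valuing the first choice by the best second choice it enables is the same as picking the best complete plan.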

Choosing sets of universe histories

Real life choices involve a kind of recursiveness. To value today’s choice we have to know how to value the possible futures it enables and that value depends on the choices we make in the future. In general, a rational agent may value a sequence of events in a complex nonlinear way. To capture the full generality, we have to think of the agent’s utility function as being defined on an entire history of the universe. We therefore take the space of possibilities S to be the set of all possible histories of the universe. The agent’s preferences are encoded in a utility function U defined on this huge set of all possibilities. The agent also has a prior probability distribution P defined on S. This encodes his subjective belief that the events in a history will play out in a particular way. As a part of this, it includes an assessment of the likelihood of his own choices in that history.

With those broad notions of S, U, and P, we can see how a rational economic agent should make a choice at a particular moment in time. At any particular time, the agent has partial knowledge of the past and present. This partial knowledge defines a subset H of all consistent universe histories. The prior P restricted to the subset H defines the agent’s current belief in each possible history. The agent must choose among his possible actions i. Each action i further restricts the set of possible histories H into a smaller subset A_{i}. The expected utility of action i is:

\frac{\sum_{h\in A_{i}}P(h)\cdot U(h)}{\sum_{h\in A_{i}}P(h)} and the agent should pick the action with the highest expected utility. If action i is chosen, the set of possible histories reduces to A_{i} and the agent’s beliefs change to P restricted to A_{i}.
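For a tiny finite set of histories, the conditional expected utility can be computed directly. This is only a sketch: each history is reduced to a hypothetical triple of the action it contains, its prior probability, and its utility, and the numbers are invented.

```python
def action_value(histories, action):
    """Sum of P(h)*U(h) over the subset A_i, divided by the sum of P(h) over A_i.

    histories: list of (action, prior probability, utility), one per history in H.
    """
    subset = [(p, u) for a, p, u in histories if a == action]
    mass = sum(p for p, _ in subset)
    if mass == 0:
        raise ValueError("action has zero prior probability")
    return sum(p * u for p, u in subset) / mass

# Hypothetical histories compatible with the agent's current knowledge H.
H = [
    ("stay", 0.25, 4.0),
    ("stay", 0.25, 2.0),
    ("go", 0.125, 10.0),
    ("go", 0.375, 2.0),
]

values = {a: action_value(H, a) for a in ("stay", "go")}
best_action = max(values, key=values.get)
print(values, best_action)
```

Because the formula divides by the prior mass of each subset, the prior does not need to be renormalized over H before comparing actions.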

Markov Decision Processes

This description in terms of possible histories is extremely general but is rather abstract. It reduces to simpler and more practical versions when the utility U and the prior P have common restricted forms. For many agents, future events which happen sooner are more important than those which happen later. A common form for utility functions is to sum “rewards” arising from events occurring at specific times weighted by a discounting function which decreases into the future: U(h)=\sum_{t}\gamma^{t}\cdot R(h_{t}). Here 0\leq\gamma\leq1 is the discount factor and the “reward” R(h_{t}) measures the utility arising from events in the history h at the time t. The discount factor is related to an interest rate of roughly 1-\gamma per time step, which makes money received in the future less valuable than money received in the present. The size of the discount factor strongly affects how much the agent focuses on future activities versus creating utility in the present. A chess program might have a utility function of this kind which sums the weighted number of games won by the system. If the discount factor is close to 1, the system will care about winning in the long run and won’t be so concerned about the short run. In that case, it might spend most of its time and effort learning about computer science and building the best chess hardware that it can. On the other hand, if the discount factor is near 0, then the system will focus on winning games in the present and won’t devote much effort to the longer term. If an agent’s utility is additive in the effect of events at different times, then it need not know the past in order to choose the highest expected utility actions for the future.
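The effect of the discount factor is easy to see numerically. In this sketch the two reward sequences are hypothetical: one pays a small reward immediately and the other pays a larger reward three steps later, with time steps counted from t = 0.

```python
def discounted_utility(rewards, gamma):
    """U(h) = sum over t of gamma**t * R(h_t), with t starting at 0."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return sum(gamma**t * r for t, r in enumerate(rewards))

# Hypothetical reward sequences R(h_0), R(h_1), ...
early = [1.0, 0.0, 0.0, 0.0]  # small reward now
late = [0.0, 0.0, 0.0, 2.0]   # larger reward later

for gamma in (0.5, 1.0):
    print(gamma, discounted_utility(early, gamma), discounted_utility(late, gamma))
```

With a discount factor of 0.5 the early reward wins, while with a discount factor of 1 the later, larger reward wins, which mirrors the short-run versus long-run chess example.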

A fundamental aspect of physics is the Markovian property that the past affects the future only through the present. If the agent’s beliefs P incorporate this property, then it also doesn’t need to maintain beliefs about the past in order to predict the future. It can maintain a distribution representing its beliefs about the present state and a distribution representing its beliefs about how its actions are likely to change the present state. If it models itself as a rational agent, then P will only be non-zero for histories in which it chooses maximum expected utility outcomes. These restrictions lead to well studied decision models known as Markov Decision Processes (MDPs) and Partially Observable Markov Decision Processes (POMDPs). Practical implementations often make use of extra structure to represent the distributions efficiently in factored forms such as Bayesian Networks or Markov Networks.
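To make the MDP case concrete, here is a small value iteration sketch. Value iteration is a standard way to solve MDPs, but it is not described in this post; the two-state world, its transitions, rewards, and the discount factor are all hypothetical.

```python
def value_iteration(states, actions, T, R, gamma, sweeps=200):
    """Return (state values, greedy policy) for a small fully observable MDP.

    T[s][a] is a list of (probability, next_state); R[s] is the reward in state s.
    """
    def q(V, s, a):
        # Expected value of the next state if action a is taken in state s.
        return sum(p * V[s2] for p, s2 in T[s][a])

    V = {s: 0.0 for s in states}
    for _ in range(sweeps):
        V = {s: R[s] + gamma * max(q(V, s, a) for a in actions) for s in states}
    policy = {s: max(actions, key=lambda a: q(V, s, a)) for s in states}
    return V, policy

# Hypothetical two-state world: eating leads to "fed", waiting leads to "hungry".
states = ["hungry", "fed"]
actions = ["eat", "wait"]
T = {
    "hungry": {"eat": [(1.0, "fed")], "wait": [(1.0, "hungry")]},
    "fed": {"eat": [(1.0, "fed")], "wait": [(1.0, "hungry")]},
}
R = {"hungry": 0.0, "fed": 1.0}

V, policy = value_iteration(states, actions, T, R, gamma=0.5)
print(V, policy)
```

The agent here only needs the current state and the transition model, not the history, which is the simplification that the additive utility and the Markov property buy.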

Copyright 2007 Stephen M. Omohundro

One Response to “What is rational behavior?”

  1. Hey Steve, long time no see.

    Somebody brought your stuff to my attention and pretty fascinating.

    I wrote a paper “mathematical definition of intelligence” I thought was going to be key to future progress in AI. See paper #93 at
    http://www.math.temple.edu/~wds/homepage/works.html
    (#94 also interesting). Turns out my work here was pretty much a rediscovery of Marcus Hutter’s work, but we each knew stuff the other did not.

    Now it then seemed to me that somebody should start following up on my/Hutter’s work. And nobody did. But lo, I see YOUR work, which sounds like it IS following up on us, despite the fact you did not know about my paper you kind of have been following up on it anyhow. You might be better off explicitly realizing that, though, and doing some of the explicit things I discuss, like building the “platform for intelligence testing” I recommend, and building an AI that genuinely *is* an AI (by my defn). Those things probably would be pretty easy for you to do.

    One more thing. I saw a video of you speaking on self-improving AI at Stanford and getting grief from some guy about the “fact” that there is no way to measure utility, maybe utility does not exist etc (economics fruitiness). Well, I have also gotten such grief in a different context - voting systems.

    My work on “Bayesian Regret” of voting systems, leading to conclusion “range voting” is the best of the usual commonly proposed systems, involves utility. HOWEVER, all my conclusions about BR of voting systems work REGARDLESS of whether any human utility for anything can ever be measured or not. Mm? That is the key thing. It does not matter whether it can be measured. My BR work still goes thru fine. So I think that is a very strong reply to such hecklers, when it is usable.

    You should get all your future-wacko pals to “endorse range voting”. You browse to http://rangevoting.org, push the “endorse” button, and fill out the form. May be necessary to have RV save the world.

    Warren D. Smith.
    Hope this message actually reaches you.
