=== Planning and decision-making ===

An "agent" is anything that perceives and takes actions in the world. A [[rational agent]] has goals or preferences and takes actions to make them happen.{{Efn|"Rational agent" is a general term used in [[economics]], [[philosophy]] and theoretical artificial intelligence. It can refer to anything that directs its behavior to accomplish goals, such as a person, an animal, a corporation, a nation, or in the case of AI, a computer program.}}{{Sfnp|Russell|Norvig|2021|p=528}} In [[automated planning]], the agent has a specific goal.<ref>[[Automated planning]]: {{Harvtxt|Russell|Norvig|2021|loc=chpt. 11}}.</ref> In [[automated decision-making]], the agent has preferences—there are some situations it would prefer to be in, and some situations it is trying to avoid. The decision-making agent assigns a number to each situation (called the "[[utility]]") that measures how much the agent prefers it. For each possible action, it can calculate the "[[expected utility]]": the [[utility]] of all possible outcomes of the action, weighted by the probability that the outcome will occur. It can then choose the action with the maximum expected utility.<ref>[[Automated decision making]], [[Decision theory]]: {{Harvtxt|Russell|Norvig|2021|loc=chpt. 16–18}}.</ref>
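
The following is a minimal sketch of this calculation, assuming a hand-written model in which each action maps to a list of (probability, utility) pairs for its possible outcomes; the actions and numbers are invented for illustration.

<syntaxhighlight lang="python">
# Each action maps to its possible outcomes as (probability, utility)
# pairs. This toy model is illustrative only.
actions = {
    "take umbrella":  [(0.3, 6.0), (0.7, 8.0)],   # rain / no rain
    "leave umbrella": [(0.3, 0.0), (0.7, 10.0)],
}

def expected_utility(outcomes):
    """Utility of each possible outcome, weighted by its probability."""
    return sum(p * u for p, u in outcomes)

# Choose the action with the maximum expected utility.
best = max(actions, key=lambda a: expected_utility(actions[a]))
print(best)  # "take umbrella": 0.3*6 + 0.7*8 = 7.4 beats 7.0
</syntaxhighlight>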

In [[Automated planning and scheduling#classical planning|classical planning]], the agent knows exactly what the effect of any action will be.<ref>[[Automated planning and scheduling#classical planning|Classical planning]]: {{Harvtxt|Russell|Norvig|2021|loc=Section 11.2}}.</ref> In most real-world problems, however, the agent may not be certain about the situation it is in (the situation is "unknown" or "unobservable") and it may not know for certain what will happen after each possible action (the action is not "deterministic"). It must choose an action by making a probabilistic guess and then reassess the situation to see if the action worked.<ref>Sensorless or "conformant" planning, contingent planning, replanning (a.k.a. online planning): {{Harvtxt|Russell|Norvig|2021|loc=Section 11.5}}.</ref>

In some problems, the agent's preferences may be uncertain, especially if there are other agents or humans involved. These preferences can be learned (e.g., with [[inverse reinforcement learning]]), or the agent can seek information to improve them.<ref>Uncertain preferences: {{Harvtxt|Russell|Norvig|2021|loc=Section 16.7}}. [[Inverse reinforcement learning]]: {{Harvtxt|Russell|Norvig|2021|loc=Section 22.6}}.</ref> [[Information value theory]] can be used to weigh the value of exploratory or experimental actions.<ref>[[Information value theory]]: {{Harvtxt|Russell|Norvig|2021|loc=Section 16.6}}.</ref>

The space of possible future actions and situations is typically [[intractably]] large, so the agent must take actions and evaluate situations while being uncertain of what the outcome will be. A [[Markov decision process]] has a [[Finite-state machine|transition model]] that describes the probability that a particular action will change the state in a particular way and a [[reward function]] that supplies the utility of each state and the cost of each action. A [[Reinforcement learning#Policy|policy]] associates a decision with each possible state. The policy could be calculated (e.g., by [[policy iteration|iteration]]), be [[heuristic]], or it can be learned.<ref>[[Markov decision process]]: {{Harvtxt|Russell|Norvig|2021|loc=chpt. 17}}.</ref>

[[Game theory]] describes the rational behavior of multiple interacting agents and is used in AI programs that make decisions that involve other agents.<ref>[[Game theory]] and multi-agent decision theory: {{Harvtxt|Russell|Norvig|2021|loc=chpt. 18}}.</ref>
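
As a concrete illustration of the information-value idea mentioned above, the sketch below reuses the invented umbrella model and computes the value of perfect information about the weather: the expected utility of choosing an action after observing whether it rains, minus the best expected utility of choosing blind.

<syntaxhighlight lang="python">
P_RAIN = 0.3
# utility[action][observation], reusing the illustrative umbrella model.
utility = {
    "take umbrella":  {"rain": 6.0, "dry": 8.0},
    "leave umbrella": {"rain": 0.0, "dry": 10.0},
}

# Acting blind: best expected utility over actions.
eu_blind = max(P_RAIN * u["rain"] + (1 - P_RAIN) * u["dry"]
               for u in utility.values())

# Acting informed: best action per observation, averaged over observations.
eu_informed = (P_RAIN * max(u["rain"] for u in utility.values())
               + (1 - P_RAIN) * max(u["dry"] for u in utility.values()))

print(eu_informed - eu_blind)  # 8.8 - 7.4: about 1.4 utility gained
</syntaxhighlight>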
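
The sketch below computes a policy for a tiny Markov decision process using value iteration, one standard iterative method closely related to the policy iteration linked above. The two-state model (states, transition probabilities, rewards, discount factor) is invented for illustration.

<syntaxhighlight lang="python">
GAMMA = 0.9  # discount factor for future rewards

# transition[state][action] = list of (probability, next_state, reward).
# This toy model is invented for illustration.
transition = {
    "cool": {"fast": [(1.0, "hot", 2.0)],
             "slow": [(1.0, "cool", 1.0)]},
    "hot":  {"fast": [(0.5, "hot", 2.0), (0.5, "broken", -10.0)],
             "slow": [(1.0, "cool", 1.0)]},
    "broken": {},  # terminal state with no actions
}

def backup(values, outcomes):
    """Expected reward plus discounted value of the next state."""
    return sum(p * (r + GAMMA * values[s]) for p, s, r in outcomes)

def value_iteration(eps=1e-6):
    values = {s: 0.0 for s in transition}
    delta = eps
    while delta >= eps:  # repeat until values stop changing
        delta = 0.0
        for s, acts in transition.items():
            if not acts:
                continue  # terminal state keeps value 0
            best = max(backup(values, outs) for outs in acts.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
    # The policy maps each state to the action with the best backed-up value.
    return {s: max(acts, key=lambda a: backup(values, acts[a]))
            for s, acts in transition.items() if acts}

print(value_iteration())  # {'cool': 'fast', 'hot': 'slow'}
</syntaxhighlight>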