== Overview ==

=== Mathematical optimization ===
In terms of mathematical optimization, dynamic programming usually refers to simplifying a decision by breaking it down into a sequence of decision steps over time. This is done by defining a sequence of '''value functions''' ''V''<sub>1</sub>, ''V''<sub>2</sub>, ..., ''V''<sub>''n''</sub> taking ''y'' as an argument representing the '''[[State variable|state]]''' of the system at times ''i'' from 1 to ''n''. The definition of ''V''<sub>''n''</sub>(''y'') is the value obtained in state ''y'' at the last time ''n''. The values ''V''<sub>''i''</sub> at earlier times ''i'' = ''n'' − 1, ''n'' − 2, ..., 2, 1 can be found by working backwards, using a [[Recursion|recursive]] relationship called the [[Bellman equation]]. For ''i'' = 2, ..., ''n'', ''V''<sub>''i''−1</sub> at any state ''y'' is calculated from ''V''<sub>''i''</sub> by maximizing a simple function (usually the sum) of the gain from a decision at time ''i'' − 1 and the function ''V''<sub>''i''</sub> at the new state of the system if this decision is made. Since ''V''<sub>''i''</sub> has already been calculated for the needed states, the above operation yields ''V''<sub>''i''−1</sub> for those states. Finally, ''V''<sub>1</sub> at the initial state of the system is the value of the optimal solution. The optimal values of the decision variables can be recovered, one by one, by tracking back the calculations already performed.
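The backward recursion just described can be sketched directly in code. The following Python fragment is a minimal illustration only: the state space, decision set, gain function, and transition function are hypothetical placeholders standing in for whatever a concrete problem supplies.

<syntaxhighlight lang="python">
# Minimal sketch of the backward value-function recursion. The states,
# decisions, gain and transition below are hypothetical placeholders.

n = 5                                   # decision stages: times 1, ..., n
states = range(10)                      # finite state space
decisions = (-1, 0, 1)                  # finite decision set

def transition(y, d):
    """New state if decision d is taken in state y (kept on the grid)."""
    return min(max(y + d, 0), 9)

def gain(y, d, i):
    """Immediate gain from decision d in state y at time i (placeholder)."""
    return -abs(y - 5) - abs(d)

# V[i][y] = best value obtainable from state y at time i.
V = {n: {y: 0.0 for y in states}}       # terminal values V_n(y) (placeholder)
policy = {}

for i in range(n - 1, 0, -1):           # work backwards: i = n-1, ..., 1
    V[i], policy[i] = {}, {}
    for y in states:
        best = max(decisions,
                   key=lambda d: gain(y, d, i) + V[i + 1][transition(y, d)])
        V[i][y] = gain(y, best, i) + V[i + 1][transition(y, best)]
        policy[i][y] = best             # remembered for the trace-back

# V[1][y0] is the value of the optimal solution from initial state y0;
# reading policy[1], policy[2], ... forwards recovers the optimal decisions.
</syntaxhighlight>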
=== Control theory ===
In [[control theory]], a typical problem is to find an admissible control <math>\mathbf{u}^{\ast}</math> which causes the system <math>\dot{\mathbf{x}}(t) = \mathbf{g} \left( \mathbf{x}(t), \mathbf{u}(t), t \right)</math> to follow an admissible trajectory <math>\mathbf{x}^{\ast}</math> on a continuous time interval <math>t_{0} \leq t \leq t_{1}</math> that minimizes a [[Loss function|cost function]]

:<math>J = b \left( \mathbf{x}(t_{1}), t_{1} \right) + \int_{t_{0}}^{t_{1}} f \left( \mathbf{x}(t), \mathbf{u}(t), t \right) \mathrm{d} t</math>

The solution to this problem is an optimal control law or policy <math>\mathbf{u}^{\ast} = h(\mathbf{x}(t), t)</math>, which produces an optimal trajectory <math>\mathbf{x}^{\ast}</math> and a [[cost-to-go function]] <math>J^{\ast}</math>. The latter obeys the fundamental equation of dynamic programming:

:<math>- J_{t}^{\ast} = \min_{\mathbf{u}} \left\{ f \left( \mathbf{x}(t), \mathbf{u}(t), t \right) + J_{x}^{\ast \mathsf{T}} \mathbf{g} \left( \mathbf{x}(t), \mathbf{u}(t), t \right) \right\}</math>

a [[partial differential equation]] known as the [[Hamilton–Jacobi–Bellman equation]], in which <math>J_{x}^{\ast} = \frac{\partial J^{\ast}}{\partial \mathbf{x}} = \left[ \frac{\partial J^{\ast}}{\partial x_{1}} ~~~~ \frac{\partial J^{\ast}}{\partial x_{2}} ~~~~ \dots ~~~~ \frac{\partial J^{\ast}}{\partial x_{n}} \right]^{\mathsf{T}}</math> and <math>J_{t}^{\ast} = \frac{\partial J^{\ast}}{\partial t}</math>. One finds the minimizing <math>\mathbf{u}</math> in terms of <math>t</math>, <math>\mathbf{x}</math>, and the unknown function <math>J_{x}^{\ast}</math>, and then substitutes the result into the Hamilton–Jacobi–Bellman equation to obtain the partial differential equation to be solved with boundary condition <math>J \left( t_{1} \right) = b \left( \mathbf{x}(t_{1}), t_{1} \right)</math>.<ref>{{cite book |first1=M. I. |last1=Kamien |author-link=Morton Kamien |first2=N. L. |last2=Schwartz |author-link2=Nancy Schwartz |title=Dynamic Optimization: The Calculus of Variations and Optimal Control in Economics and Management |location=New York |publisher=Elsevier |edition=Second |year=1991 |isbn=978-0-444-01609-6 |url=https://books.google.com/books?id=0IoGUn8wjDQC&pg=PA261 |page=261 }}</ref>

In practice, this generally requires [[Numerical partial differential equations|numerical techniques]] for some discrete approximation to the exact optimization relationship. Alternatively, the continuous process can be approximated by a discrete system, which leads to the following recurrence relation, analogous to the Hamilton–Jacobi–Bellman equation:

:<math>J_{k}^{\ast} \left( \mathbf{x}_{n-k} \right) = \min_{\mathbf{u}_{n-k}} \left\{ \hat{f} \left( \mathbf{x}_{n-k}, \mathbf{u}_{n-k} \right) + J_{k-1}^{\ast} \left( \hat{\mathbf{g}} \left( \mathbf{x}_{n-k}, \mathbf{u}_{n-k} \right) \right) \right\}</math>

at the <math>k</math>-th stage of <math>n</math> equally spaced discrete time intervals, and where <math>\hat{f}</math> and <math>\hat{\mathbf{g}}</math> denote discrete approximations to <math>f</math> and <math>\mathbf{g}</math>. This functional equation is known as the [[Bellman equation]], which can be solved for an exact solution of the discrete approximation of the optimization equation.<ref>{{cite book |first=Donald E. |last=Kirk |title=Optimal Control Theory: An Introduction |location=Englewood Cliffs, NJ |publisher=Prentice-Hall |year=1970 |isbn=978-0-13-638098-6 |pages=94–95 |url=https://books.google.com/books?id=fCh2SAtWIdwC&pg=PA94 }}</ref>
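The discrete recurrence can be implemented by tabulating the cost-to-go over grids of states and controls. The Python sketch below is illustrative only: the dynamics, stage cost, terminal cost, the grids, and the nearest-neighbour treatment of off-grid states are all hypothetical choices, not part of the formulation above.

<syntaxhighlight lang="python">
# Sketch of the discrete Bellman recurrence
#   J_k(x) = min_u { f_hat(x, u) + J_{k-1}(g_hat(x, u)) }
# with hypothetical placeholder dynamics, costs and grids.

import numpy as np

x_grid = np.linspace(-1.0, 1.0, 41)       # discretized state space
u_grid = np.linspace(-0.5, 0.5, 21)       # discretized control space
n = 20                                    # number of stages
dt = 0.05                                 # time step of the discretization

def g_hat(x, u):
    """Discrete approximation to the dynamics: the next state."""
    return x + dt * (-x + u)              # placeholder linear system

def f_hat(x, u):
    """Discrete approximation to the running cost."""
    return dt * (x ** 2 + u ** 2)         # placeholder quadratic cost

def nearest(x):
    """Index of the grid point closest to x (crude interpolation)."""
    return int(np.abs(x_grid - x).argmin())

# J[k, i] = optimal cost-to-go with k stages remaining from state x_grid[i].
J = np.empty((n + 1, x_grid.size))
J[0] = x_grid ** 2                        # terminal cost b(x(t1), t1), placeholder

for k in range(1, n + 1):                 # k stages to go
    for i, x in enumerate(x_grid):
        J[k, i] = min(f_hat(x, u) + J[k - 1, nearest(g_hat(x, u))]
                      for u in u_grid)

# J[n, nearest(x0)] approximates the optimal cost from initial state x0.
</syntaxhighlight>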
==== Example from economics: Ramsey's problem of optimal saving ====
{{See also|Ramsey–Cass–Koopmans model}}
In [[economics]], the objective is generally to maximize (rather than minimize) some dynamic [[social welfare function]]. In Ramsey's problem, this function relates amounts of consumption to levels of [[utility]]. Loosely speaking, the planner faces the trade-off between contemporaneous consumption and future consumption (via investment in [[Physical capital|capital stock]] that is used in production), known as [[intertemporal choice]]. Future consumption is discounted at a constant rate <math>\beta \in (0,1)</math>. A discrete approximation to the transition equation of capital is given by

:<math>k_{t+1} = \hat{g} \left( k_{t}, c_{t} \right) = f(k_{t}) - c_{t}</math>

where <math>c</math> is consumption, <math>k</math> is capital, and <math>f</math> is a [[production function]] satisfying the [[Inada conditions]]. An initial capital stock <math>k_{0} > 0</math> is assumed.

To make the problem concrete, specialize it as follows. Let <math>c_t</math> be consumption in period {{mvar|t}}, and assume consumption yields [[utility]] <math>u(c_t)=\ln(c_t)</math> as long as the consumer lives. Assume the consumer is impatient, so that he [[discounting|discounts]] future utility by a factor {{mvar|b}} each period, where <math>0<b<1</math>. Let <math>k_t</math> be [[capital (economics)|capital]] in period {{mvar|t}}. Assume initial capital is a given amount <math>k_0>0</math>, and suppose that this period's capital and consumption determine next period's capital as <math>k_{t+1}=Ak^a_t - c_t</math>, where {{mvar|A}} is a positive constant and <math>0<a<1</math>. Assume capital cannot be negative. Then the consumer's decision problem can be written as follows:

:<math>\max \sum_{t=0}^T b^t \ln(c_t)</math> subject to <math>k_{t+1}=Ak^a_t - c_t \geq 0</math> for all <math>t=0,1,2,\ldots,T</math>

Written this way, the problem looks complicated, because it involves solving for all the choice variables <math>c_0, c_1, c_2, \ldots , c_T</math>. (The capital <math>k_0</math> is not a choice variable; the consumer's initial capital is taken as given.) The dynamic programming approach to solve this problem involves breaking it apart into a sequence of smaller decisions. To do so, we define a sequence of ''value functions'' <math>V_t(k)</math>, for <math>t=0,1,2,\ldots,T,T+1</math>, which represent the value of having any amount of capital {{mvar|k}} at each time {{mvar|t}}. There is (by assumption) no utility from having capital after death: <math>V_{T+1}(k)=0</math>.

The value of any quantity of capital at any previous time can be calculated by [[backward induction]] using the [[Bellman equation]]. In this problem, for each <math>t=0,1,2,\ldots,T</math>, the Bellman equation is

:<math>V_t(k_t) \, = \, \max \left( \ln(c_t) + b V_{t+1}(k_{t+1}) \right)</math> subject to <math>k_{t+1}=Ak^a_t - c_t \geq 0</math>

This problem is much simpler than the one we wrote down before, because it involves only two decision variables, <math>c_t</math> and <math>k_{t+1}</math>. Intuitively, instead of choosing his whole lifetime plan at birth, the consumer can take things one step at a time. At time {{mvar|t}}, his current capital <math>k_t</math> is given, and he only needs to choose current consumption <math>c_t</math> and saving <math>k_{t+1}</math>.

To actually solve this problem, we work backwards. For simplicity, the current level of capital is denoted as {{mvar|k}}. <math>V_{T+1}(k)</math> is already known, so using the Bellman equation once we can calculate <math>V_T(k)</math>, and so on until we get to <math>V_0(k)</math>, which is the ''value'' of the initial decision problem for the whole lifetime. In other words, once we know <math>V_{T-j+1}(k)</math>, we can calculate <math>V_{T-j}(k)</math>, which is the maximum of <math>\ln(c_{T-j}) + b V_{T-j+1}(Ak^a-c_{T-j})</math>, where <math>c_{T-j}</math> is the choice variable and <math>Ak^a-c_{T-j} \ge 0</math>. Working backwards, it can be shown that the value function at time <math>t=T-j</math> is

:<math>V_{T-j}(k) \, = \, a \sum_{i=0}^j a^ib^i \ln k + v_{T-j}</math>

where each <math>v_{T-j}</math> is a constant, and the optimal amount to consume at time <math>t=T-j</math> is

:<math>c_{T-j}(k) \, = \, \frac{1}{\sum_{i=0}^j a^ib^i} Ak^a</math>

which can be simplified to

:<math>\begin{align} c_{T}(k) & = Ak^a\\ c_{T-1}(k) & = \frac{Ak^a}{1+ab}\\ c_{T-2}(k) & = \frac{Ak^a}{1+ab+a^2b^2}\\ &\dots\\ c_2(k) & = \frac{Ak^a}{1+ab+a^2b^2+\ldots+a^{T-2}b^{T-2}}\\ c_1(k) & = \frac{Ak^a}{1+ab+a^2b^2+\ldots+a^{T-2}b^{T-2}+a^{T-1}b^{T-1}}\\ c_0(k) & = \frac{Ak^a}{1+ab+a^2b^2+\ldots+a^{T-2}b^{T-2}+a^{T-1}b^{T-1}+a^Tb^T} \end{align}</math>

We see that it is optimal to consume a larger fraction of current wealth as one gets older, finally consuming all remaining wealth in period {{mvar|T}}, the last period of life.
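A numerical version of this backward induction can be sketched in Python. The parameter values, the capital grid, and the restriction of savings choices to grid points are hypothetical choices for illustration; the numerical policy should agree with the closed-form rule for <math>c_0(k)</math> only up to the grid's resolution.

<syntaxhighlight lang="python">
# Backward induction for the consumer's problem above, on a capital grid.
# A, a, b, T, the grid and k0 are hypothetical illustration values.

import numpy as np

A, a, b, T = 5.0, 0.34, 0.9, 10           # technology and preference parameters
k_grid = np.linspace(0.1, 10.0, 400)      # grid for the capital stock
k0 = 1.0                                  # initial capital

V = np.zeros(k_grid.size)                 # V_{T+1}(k) = 0: no utility after death

for t in range(T, -1, -1):                # work backwards: t = T, ..., 0
    V_new = np.empty_like(V)
    best_c = np.empty_like(V)
    for i, k in enumerate(k_grid):
        resources = A * k ** a            # output A k^a available this period
        k_next = k_grid[k_grid <= resources]        # feasible savings, k' >= 0
        c = np.maximum(resources - k_next, 1e-12)   # implied consumption
        values = np.log(c) + b * np.interp(k_next, k_grid, V)
        j = values.argmax()
        V_new[i], best_c[i] = values[j], c[j]
    V = V_new                             # V now holds V_t on the grid

# After the loop, V approximates V_0 and best_c the consumption rule at t = 0.
c0_numeric = best_c[np.abs(k_grid - k0).argmin()]
c0_closed = A * k0 ** a / sum((a * b) ** i for i in range(T + 1))
print(c0_numeric, c0_closed)              # agree up to the grid's resolution
</syntaxhighlight>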
=== Computer science ===
There are two key attributes that a problem must have in order for dynamic programming to be applicable: [[optimal substructure]] and [[overlapping subproblem|overlapping sub-problem]]s. If a problem can be solved by combining optimal solutions to ''non-overlapping'' sub-problems, the strategy is called "[[Divide and conquer algorithm|divide and conquer]]" instead.<ref name=":0" /> This is why [[mergesort|merge sort]] and [[quicksort|quick sort]] are not classified as dynamic programming problems.

''Optimal substructure'' means that the solution to a given optimization problem can be obtained by the combination of optimal solutions to its sub-problems. Such optimal substructures are usually described by means of [[recursion]]. For example, given a graph ''G=(V,E)'', the shortest path ''p'' from a vertex ''u'' to a vertex ''v'' exhibits optimal substructure: take any intermediate vertex ''w'' on this shortest path ''p''. If ''p'' is truly the shortest path, then it can be split into sub-paths ''p<sub>1</sub>'' from ''u'' to ''w'' and ''p<sub>2</sub>'' from ''w'' to ''v'' such that these, in turn, are indeed the shortest paths between the corresponding vertices (by the simple cut-and-paste argument described in ''[[Introduction to Algorithms]]''). Hence, one can easily formulate the solution for finding shortest paths in a recursive manner, which is what the [[Bellman–Ford algorithm]] or the [[Floyd–Warshall algorithm]] does.

''Overlapping'' sub-problems means that the space of sub-problems must be small; that is, any recursive algorithm solving the problem should solve the same sub-problems over and over, rather than generating new sub-problems. For example, consider the recursive formulation for generating the Fibonacci sequence: ''F''<sub>''i''</sub> = ''F''<sub>''i''−1</sub> + ''F''<sub>''i''−2</sub>, with base case ''F''<sub>1</sub> = ''F''<sub>2</sub> = 1. Then ''F''<sub>43</sub> = ''F''<sub>42</sub> + ''F''<sub>41</sub>, and ''F''<sub>42</sub> = ''F''<sub>41</sub> + ''F''<sub>40</sub>. Now ''F''<sub>41</sub> is being solved in the recursive sub-trees of both ''F''<sub>43</sub> and ''F''<sub>42</sub>. Even though the total number of sub-problems is actually small (only 43 of them), we end up solving the same problems over and over if we adopt a naive recursive solution such as this. Dynamic programming takes account of this fact and solves each sub-problem only once.

[[Image:Fibonacci dynamic programming.svg|thumb|108px|'''Figure 2.''' The subproblem graph for the Fibonacci sequence. The fact that it is not a [[tree structure|tree]] indicates overlapping subproblems.]]

This can be achieved in either of two ways (both are sketched in the code following this list):<ref>{{Cite web |title=Algorithms by Jeff Erickson |url=https://jeffe.cs.illinois.edu/teaching/algorithms/ |access-date=2024-12-06 |website=jeffe.cs.illinois.edu}}</ref>
* ''[[Top-down and bottom-up design|Top-down approach]]'': This is the direct fall-out of the recursive formulation of any problem. If the solution to any problem can be formulated recursively using the solution to its sub-problems, and if its sub-problems are overlapping, then one can easily [[memoize]] or store the solutions to the sub-problems in a table (often an [[Array (data structure)|array]] or [[Hash table|hashtable]] in practice). Whenever we attempt to solve a new sub-problem, we first check the table to see if it is already solved. If a solution has been recorded, we can use it directly; otherwise, we solve the sub-problem and add its solution to the table.
* ''[[Top-down and bottom-up design|Bottom-up approach]]'': Once we formulate the solution to a problem recursively in terms of its sub-problems, we can try reformulating the problem in a bottom-up fashion: try solving the sub-problems first and use their solutions to build on and arrive at solutions to bigger sub-problems. This is also usually done in a tabular form by iteratively generating solutions to bigger and bigger sub-problems by using the solutions to small sub-problems. For example, if we already know the values of ''F''<sub>41</sub> and ''F''<sub>40</sub>, we can directly calculate the value of ''F''<sub>42</sub>.
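Both approaches can be illustrated on the Fibonacci recurrence discussed above. The following Python sketch is illustrative only; any language with arrays or hash tables supports the same pattern.

<syntaxhighlight lang="python">
# F_1 = F_2 = 1 and F_i = F_{i-1} + F_{i-2}, computed both ways.

memo = {1: 1, 2: 1}                    # base cases F_1 = F_2 = 1

def fib_top_down(i):
    """Top-down: recurse, but record each solved sub-problem in a table."""
    if i not in memo:                  # solve each sub-problem only once
        memo[i] = fib_top_down(i - 1) + fib_top_down(i - 2)
    return memo[i]

def fib_bottom_up(i):
    """Bottom-up: solve smaller sub-problems first, then build upwards."""
    f = [0, 1, 1] + [0] * (i - 2)      # f[j] will hold F_j
    for j in range(3, i + 1):
        f[j] = f[j - 1] + f[j - 2]     # F_j from already-computed values
    return f[i]

print(fib_top_down(43), fib_bottom_up(43))   # both print 433494437
</syntaxhighlight>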
Some [[programming language]]s can automatically [[memoization|memoize]] the result of a function call with a particular set of arguments, in order to speed up [[call-by-name]] evaluation (this mechanism is referred to as ''[[call-by-need]]''). Some languages make it possible portably (e.g. [[Scheme (programming language)|Scheme]], [[Common Lisp]], [[Perl]] or [[D (programming language)|D]]). Some languages have automatic [[memoization]] <!-- still not a typo for "memor-" --> built in, such as tabled [[Prolog]] and [[J (programming language)|J]], which supports memoization with the ''M.'' adverb.<ref>{{cite web|title=M. Memo|url=http://www.jsoftware.com/help/dictionary/dmcapdot.htm|work=J Vocabulary|publisher=J Software|access-date=28 October 2011}}</ref> In any case, this is only possible for a [[referentially transparent]] function. Memoization is also encountered as an easily accessible design pattern within term-rewrite based languages such as [[Wolfram Language]].

=== Bioinformatics ===
Dynamic programming is widely used in bioinformatics for tasks such as [[sequence alignment]], [[protein folding]], RNA structure prediction and protein-DNA binding. The first dynamic programming algorithms for protein-DNA binding were developed in the 1970s independently by [[Charles DeLisi]] in the US<ref>{{citation | last = Delisi | first = Charles | date = July 1974 | doi = 10.1002/bip.1974.360130719 | issue = 7 | journal = Biopolymers | pages = 1511–1512 | title = Cooperative phenomena in homopolymers: An alternative formulation of the partition function | volume = 13}}</ref> and by Georgii Gurskii and Alexander Zasedatelev in the [[Soviet Union]].<ref>{{citation | last1 = Gurskiĭ | first1 = G. V. | last2 = Zasedatelev | first2 = A. S. | date = September 1978 | issue = 5 | journal = Biofizika | pages = 932–946 | pmid = 698271 | title = Precise relationships for calculating the binding of regulatory proteins and other lattice ligands in double-stranded polynucleotides | volume = 23}}</ref> Recently these algorithms have become very popular in bioinformatics and [[computational biology]], particularly in the studies of [[nucleosome]] positioning and [[transcription factor]] binding.
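To give one sense of how dynamic programming enters sequence alignment, the sketch below scores a global alignment of two short strings in the style of the [[Needleman–Wunsch algorithm]]. The match, mismatch, and gap scores are hypothetical choices, and the input strings are arbitrary examples.

<syntaxhighlight lang="python">
# Illustrative global-alignment scoring in the style of Needleman–Wunsch.
# The scoring values (match=+1, mismatch=-1, gap=-1) are hypothetical.

def alignment_score(s, t, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of strings s and t."""
    m, n = len(s), len(t)
    # score[i][j] = best score aligning the prefixes s[:i] and t[:j]
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0] = i * gap              # s[:i] aligned against gaps only
    for j in range(1, n + 1):
        score[0][j] = j * gap              # t[:j] aligned against gaps only
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = match if s[i - 1] == t[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + diag,   # (mis)match
                              score[i - 1][j] + gap,        # gap in t
                              score[i][j - 1] + gap)        # gap in s
    return score[m][n]

print(alignment_score("GATTACA", "GCATGCU"))   # score of the best alignment
</syntaxhighlight>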