===Stochastic iterated prisoner's dilemma===
{{undue weight section|date=May 2023}}

In a stochastic iterated prisoner's dilemma game, strategies are specified in terms of "cooperation probabilities".<ref name=Press2012>{{cite journal|last1=Press|first1=WH|last2=Dyson|first2=FJ|title=Iterated Prisoner's Dilemma contains strategies that dominate any evolutionary opponent|journal=[[Proceedings of the National Academy of Sciences of the United States of America]]|date=26 June 2012|volume=109|issue=26|pages=10409–13|doi=10.1073/pnas.1206569109|pmid=22615375|pmc=3387070|bibcode=2012PNAS..10910409P|doi-access=free}}</ref> In an encounter between player ''X'' and player ''Y'', ''X''{{'}}s strategy is specified by a set of probabilities ''P'' of cooperating with ''Y''. ''P'' is a function of the outcomes of their previous encounters or some subset thereof. If ''P'' is a function of only their most recent ''n'' encounters, it is called a "memory-n" strategy. A memory-1 strategy is then specified by four cooperation probabilities: <math>P=\{P_{cc},P_{cd},P_{dc},P_{dd}\}</math>, where ''P<sub>cd</sub>'' is the probability that ''X'' will cooperate in the present encounter given that the previous encounter was characterized by ''X'' cooperating and ''Y'' defecting. If each of the probabilities is either 1 or 0, the strategy is called deterministic. An example of a deterministic strategy is the tit-for-tat strategy written as <math>P=\{1,0,1,0\}</math>, in which ''X'' responds as ''Y'' did in the previous encounter. Another is the win–stay, lose–switch strategy written as <math>P=\{1,0,0,1\}</math>. It has been shown that for any memory-n strategy there is a corresponding memory-1 strategy that gives the same statistical results, so that only memory-1 strategies need be considered.<ref name="Press2012"/>
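A minimal simulation makes the memory-1 setup concrete. The sketch below is illustrative only (it is not drawn from the cited sources): it plays two memory-1 strategies against each other round by round, and the payoff values 3, 0, 5, 1 (for mutual cooperation, being exploited, exploiting, and mutual defection, respectively) as well as all function names are assumptions of the sketch.

<syntaxhighlight lang="python">
import random

def play_iterated_pd(P, Q, rounds=100_000, seed=0):
    """Average per-round payoffs for two memory-1 strategies played directly.

    P and Q are cooperation probabilities (p_cc, p_cd, p_dc, p_dd), each indexed
    by the previous outcome from that player's own point of view.
    """
    rng = random.Random(seed)
    R, S, T, P_pay = 3, 0, 5, 1                     # conventional payoffs (assumed)
    payoff = {('C', 'C'): (R, R), ('C', 'D'): (S, T),
              ('D', 'C'): (T, S), ('D', 'D'): (P_pay, P_pay)}
    index = {('C', 'C'): 0, ('C', 'D'): 1, ('D', 'C'): 2, ('D', 'D'): 3}
    x_move, y_move = 'C', 'C'                       # arbitrary opening moves
    x_total = y_total = 0
    for _ in range(rounds):
        px, py = payoff[(x_move, y_move)]
        x_total += px
        y_total += py
        # Each player conditions on the previous outcome from its own viewpoint.
        i_x, i_y = index[(x_move, y_move)], index[(y_move, x_move)]
        x_move = 'C' if rng.random() < P[i_x] else 'D'
        y_move = 'C' if rng.random() < Q[i_y] else 'D'
    return x_total / rounds, y_total / rounds

# Slightly noisy tit-for-tat against win-stay, lose-switch.
print(play_iterated_pd((0.99, 0.01, 0.99, 0.01), (1.0, 0.0, 0.0, 1.0)))
</syntaxhighlight>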
If <math>P</math> is defined as the above 4-element strategy vector of ''X'' and <math>Q=\{Q_{cc},Q_{cd},Q_{dc},Q_{dd}\}</math> as the 4-element strategy vector of ''Y'' (where the indices are from ''Y''<nowiki/>'s point of view), a transition matrix ''M'' may be defined for ''X'' whose ''ij''-th entry is the probability that the outcome of a particular encounter between ''X'' and ''Y'' will be ''j'' given that the previous encounter was ''i'', where ''i'' and ''j'' are one of the four outcome indices: ''cc'', ''cd'', ''dc'', or ''dd''. For example, from ''X''{{'}}s point of view, the probability that the outcome of the present encounter is ''cd'' given that the previous encounter was ''cd'' is equal to <math>M_{cd,cd}=P_{cd}(1-Q_{dc})</math>. Under these definitions, the iterated prisoner's dilemma qualifies as a [[stochastic process]] and ''M'' is a [[stochastic matrix]], allowing all of the theory of stochastic processes to be applied.<ref name="Press2012"/>

One result of stochastic theory is that there exists a stationary vector ''v'' for the matrix ''M'' such that <math>v\cdot M=v</math>. Without loss of generality, it may be specified that ''v'' is normalized so that the sum of its four components is unity. The ''ij''-th entry in <math>M^n</math> will give the probability that the outcome of an encounter between ''X'' and ''Y'' will be ''j'' given that the encounter ''n'' steps previous was ''i''. In the limit as ''n'' approaches infinity, <math>M^n</math> will converge to a matrix with fixed values, giving the long-term probabilities of an encounter producing ''j'' independent of ''i''. In other words, the rows of <math>M^\infty</math> will be identical, giving the long-term equilibrium result probabilities of the iterated prisoner's dilemma without the need to explicitly evaluate a large number of interactions. It can be seen that ''v'' is a stationary vector for <math>M^n</math> and particularly <math>M^\infty</math>, so that each row of <math>M^\infty</math> will be equal to ''v''. Thus, the stationary vector specifies the equilibrium outcome probabilities for ''X''. Defining <math>S_x=\{R,S,T,P\}</math> and <math>S_y=\{R,T,S,P\}</math> as the short-term payoff vectors for the {''cc,cd,dc,dd''} outcomes (from ''X''{{'}}s point of view), the equilibrium payoffs for ''X'' and ''Y'' can now be specified as <math>s_x=v\cdot S_x</math> and <math>s_y=v\cdot S_y</math>, allowing the two strategies ''P'' and ''Q'' to be compared for their long-term payoffs.
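The same long-term payoffs can be computed directly from this Markov analysis, without simulating individual rounds. The sketch below is again illustrative (the conventional payoff values 3, 0, 5, 1 are assumed): it builds ''M'' from the two strategy vectors, finds the stationary vector as a left eigenvector, and evaluates <math>s_x=v\cdot S_x</math> and <math>s_y=v\cdot S_y</math>.

<syntaxhighlight lang="python">
import numpy as np

def transition_matrix(P, Q):
    """4x4 Markov matrix over the outcomes (cc, cd, dc, dd), from X's point of view.

    Q's indices are from Y's point of view, so Y's cooperation probability for
    X's outcome index i is Q at the mirrored index (cc->cc, cd->dc, dc->cd, dd->dd).
    """
    M = np.empty((4, 4))
    for i, qi in enumerate([0, 2, 1, 3]):
        p, q = P[i], Q[qi]                 # prob. that X (resp. Y) cooperates next
        M[i] = [p * q, p * (1 - q), (1 - p) * q, (1 - p) * (1 - q)]
    return M

def stationary_vector(M):
    """Left eigenvector of M for eigenvalue 1, normalized so its entries sum to 1."""
    w, V = np.linalg.eig(M.T)
    v = np.real(V[:, np.argmin(np.abs(w - 1))])
    return v / v.sum()

Sx = np.array([3, 0, 5, 1])                # X's payoffs for cc, cd, dc, dd (assumed values)
Sy = np.array([3, 5, 0, 1])                # Y's payoffs for the same outcomes

P_strat = (0.99, 0.01, 0.99, 0.01)         # slightly noisy tit-for-tat (keeps M ergodic)
Q_strat = (1.0, 0.0, 0.0, 1.0)             # win-stay, lose-switch

v = stationary_vector(transition_matrix(P_strat, Q_strat))
print("s_x =", v @ Sx, " s_y =", v @ Sy)   # should match the round-by-round simulation above
</syntaxhighlight>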
====Zero-determinant strategies====
{{undue weight section|date=May 2023}}
[[File:Iterated Prisoners Dilemma Venn-Diagram.svg|right|thumb|upright=1.5|The relationship between zero-determinant (ZD), cooperating and defecting strategies in the iterated prisoner's dilemma (IPD)]]
In 2012, [[William H. Press]] and [[Freeman Dyson]] published a new class of strategies for the stochastic iterated prisoner's dilemma called "zero-determinant" (ZD) strategies.<ref name="Press2012"/> The long-term payoffs for encounters between ''X'' and ''Y'' can be expressed as the determinant of a matrix which is a function of the two strategies and the short-term payoff vectors: <math>s_x = D(P, Q, S_x)</math> and <math>s_y = D(P, Q, S_y)</math>, which do not involve the stationary vector ''v''. Since the determinant function <math>D(P, Q, f)</math> is linear in <math>f</math>, it follows that <math>\alpha s_x + \beta s_y + \gamma = D(P, Q, \alpha S_x + \beta S_y + \gamma U)</math> (where <math>U =\{1,1,1,1 \}</math>). Any strategy for which <math>D(P, Q, \alpha S_x + \beta S_y + \gamma U) = 0</math> is by definition a ZD strategy, and the long-term payoffs obey the relation <math>\alpha s_x + \beta s_y + \gamma = 0</math>.

Tit-for-tat is a ZD strategy which is "fair", in the sense of not gaining advantage over the other player. But the ZD space also contains strategies that, in the case of two players, can allow one player to unilaterally set the other player's score or, alternatively, force an evolutionary player to achieve a payoff some percentage lower than his own. The extorted player could defect, but would thereby hurt himself by getting a lower payoff. Thus, extortion solutions turn the iterated prisoner's dilemma into a sort of [[ultimatum game]]. Specifically, ''X'' is able to choose a strategy for which <math>D(P, Q, \beta S_y + \gamma U) = 0</math>, unilaterally setting ''s<sub>y</sub>'' to a specific value within a particular range of values, independent of ''Y''{{'}}s strategy, offering an opportunity for ''X'' to "extort" player ''Y'' (and vice versa). But if ''X'' tries to set ''s<sub>x</sub>'' to a particular value, the range of possibilities is much smaller, consisting only of complete cooperation or complete defection.<ref name="Press2012"/>
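This "score-setting" effect can be checked numerically. In the sketch below, the strategy ''P'' = {0.9, 0.7, 0.2, 0.1} and the conventional payoffs 3, 0, 5, 1 are assumptions chosen for illustration: ''P'' is an equalizer-type ZD strategy built so that the enforced linear relation reduces to <math>s_y = 2</math>, and the computed payoff of ''Y'' indeed comes out as 2 against arbitrary opponents, while ''X''{{'}}s own payoff still depends on ''Q''.

<syntaxhighlight lang="python">
import numpy as np

def transition_matrix(P, Q):               # as in the sketch above
    M = np.empty((4, 4))
    for i, qi in enumerate([0, 2, 1, 3]):
        p, q = P[i], Q[qi]
        M[i] = [p * q, p * (1 - q), (1 - p) * q, (1 - p) * (1 - q)]
    return M

def stationary_vector(M):                  # as in the sketch above
    w, V = np.linalg.eig(M.T)
    v = np.real(V[:, np.argmin(np.abs(w - 1))])
    return v / v.sum()

Sy = np.array([3, 5, 0, 1])                # Y's payoffs for cc, cd, dc, dd (assumed values)

# An equalizer-type ZD strategy for X (alpha = 0, beta = -0.1, gamma = 0.2 in the
# linear relation), which should pin Y's long-term payoff at -gamma/beta = 2.
P_zd = (0.9, 0.7, 0.2, 0.1)

rng = np.random.default_rng(0)
for _ in range(5):
    Q = rng.uniform(0.05, 0.95, size=4)    # an arbitrary stochastic opponent
    v = stationary_vector(transition_matrix(P_zd, Q))
    print(round(float(v @ Sy), 6))         # prints 2.0 (up to rounding) regardless of Q
</syntaxhighlight>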
An extension of the iterated prisoner's dilemma is an evolutionary stochastic iterated prisoner's dilemma, in which the relative abundance of particular strategies is allowed to change, with more successful strategies relatively increasing. This process may be accomplished by having less successful players imitate the more successful strategies, or by eliminating less successful players from the game, while multiplying the more successful ones. It has been shown that unfair ZD strategies are not [[evolutionarily stable strategy|evolutionarily stable]]. The key intuition is that an evolutionarily stable strategy must not only be able to invade another population (which extortionary ZD strategies can do) but must also perform well against other players of the same type (which extortionary ZD players do poorly, because they reduce each other's surplus).<ref>{{cite journal|last=Adami|first=Christoph|author2=Arend Hintze|title=Evolutionary instability of Zero Determinant strategies demonstrates that winning isn't everything|journal=Nature Communications|volume=4|year=2013|issue=1 |page=3|arxiv=1208.2666|doi=10.1038/ncomms3193|pmid=23903782|pmc=3741637|bibcode=2013NatCo...4.2193A}}</ref> Theory and simulations confirm that beyond a critical population size, ZD extortion loses out in evolutionary competition against more cooperative strategies, and as a result, the average payoff in the population increases when the population is larger. In addition, there are some cases in which extortioners may even catalyze cooperation by helping to break out of a face-off between uniform defectors and [[win–stay, lose–switch]] agents.<ref name=Hilbe2013>{{cite journal|last=Hilbe|first=Christian |author2=Martin A. Nowak |author3=Karl Sigmund|title=Evolution of extortion in Iterated Prisoner's Dilemma games|journal=PNAS|date=April 2013|volume=110|issue=17|pages=6913–18|doi=10.1073/pnas.1214834110|pmid=23572576 |pmc=3637695 |bibcode=2013PNAS..110.6913H |arxiv=1212.1067|doi-access=free }}</ref>

While extortionary ZD strategies are not stable in large populations, another ZD class called "generous" strategies is both stable and robust. When the population is not too small, these strategies can supplant any other ZD strategy and even perform well against a broad array of generic strategies for the iterated prisoner's dilemma, including win–stay, lose–switch. This was proven specifically for the [[#Special case: Donation game|donation game]] by Alexander Stewart and Joshua Plotkin in 2013.<ref name=Stewart2013>{{cite journal|last=Stewart|first=Alexander J.|author2=Joshua B. Plotkin|title=From extortion to generosity, evolution in the Iterated Prisoner's Dilemma|journal=[[Proceedings of the National Academy of Sciences of the United States of America]]|year=2013|doi=10.1073/pnas.1306246110|pmid=24003115|volume=110|issue=38|pages=15348–53|bibcode=2013PNAS..11015348S|pmc=3780848|doi-access=free}}</ref> Generous strategies will cooperate with other cooperative players, and in the face of defection, the generous player loses more utility than its rival. Generous strategies are the intersection of ZD strategies and so-called "good" strategies, which were defined by Ethan Akin to be those for which the player responds to past mutual cooperation with future cooperation and splits expected payoffs equally if he receives at least the cooperative expected payoff.<ref name="Akin2013">{{cite arXiv |eprint=1211.0969 |class=math.DS |first=Ethan |last=Akin |title=Stable Cooperative Solutions for the Iterated Prisoner's Dilemma |year=2013 |page=9}} {{bibcode|2012arXiv1211.0969A}}</ref> Among good strategies, the generous (ZD) subset performs well when the population is not too small. If the population is very small, defection strategies tend to dominate.<ref name=Stewart2013 />
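The imitation process described above can be sketched in a few lines. The sketch below is illustrative only: the strategy pool, population size, payoff values and update rule are arbitrary assumptions, and it is not meant to reproduce the quantitative results of the cited studies. At each step a randomly chosen player copies another player's strategy with a probability that grows with the payoff difference, using the Markov-chain payoffs from the earlier sketches as fitness.

<syntaxhighlight lang="python">
import numpy as np

def long_term_payoff(P, Q):
    """X's equilibrium payoff against Q via the stationary vector, as in the sketches above."""
    M = np.empty((4, 4))
    for i, qi in enumerate([0, 2, 1, 3]):          # rows/columns cc, cd, dc, dd; Q is mirrored
        p, q = P[i], Q[qi]
        M[i] = [p * q, p * (1 - q), (1 - p) * q, (1 - p) * (1 - q)]
    w, V = np.linalg.eig(M.T)
    v = np.real(V[:, np.argmin(np.abs(w - 1))])
    v = v / v.sum()
    return float(v @ np.array([3, 0, 5, 1]))       # conventional R, S, T, P (assumed)

# A toy strategy pool; the small noise keeps every pairing ergodic.
pool = [("noisy tit-for-tat",     (0.99, 0.01, 0.99, 0.01)),
        ("win-stay, lose-switch", (0.99, 0.01, 0.01, 0.99)),
        ("always defect",         (0.01, 0.01, 0.01, 0.01)),
        ("equalizer (ZD)",        (0.90, 0.70, 0.20, 0.10))]
payoff = [[long_term_payoff(p, q) for _, q in pool] for _, p in pool]   # pairwise payoffs

def fitness(k, counts, n):
    """Average payoff of a type-k player against everyone else in the population."""
    others = counts.copy()
    others[k] -= 1
    return np.dot(payoff[k], others) / (n - 1)

rng = np.random.default_rng(1)
n = 100
population = list(rng.integers(len(pool), size=n))     # random initial mix of the four types

for _ in range(20_000):
    counts = np.bincount(population, minlength=len(pool))
    a, b = rng.integers(n, size=2)                     # an imitator and a role model
    fa = fitness(population[a], counts, n)
    fb = fitness(population[b], counts, n)
    # Pairwise-comparison (Fermi) rule: copy the model more often when it earns more.
    if rng.random() < 1.0 / (1.0 + np.exp(fa - fb)):
        population[a] = population[b]

counts = np.bincount(population, minlength=len(pool))
print({name: int(counts[k]) for k, (name, _) in enumerate(pool)})
</syntaxhighlight>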