==The iterated prisoner's dilemma==
{{more citations needed section|date=November 2012}}

If two players play the prisoner's dilemma more than once in succession, remember their opponent's previous actions, and are allowed to change their strategy accordingly, the game is called the iterated prisoner's dilemma. In addition to the general form above, the iterative version also requires that {{tmath|2R > T + S}}, to prevent alternating cooperation and defection giving a greater reward than mutual cooperation.

The iterated prisoner's dilemma is fundamental to some theories of human cooperation and trust. Assuming that the game effectively models transactions between two people that require trust, cooperative behavior in populations can be modeled by a multi-player iterated version of the game. In 1975, [[Bernard Grofman|Grofman]] and [[Jonathan Pool|Pool]] estimated the count of scholarly articles devoted to it at over 2,000. The iterated prisoner's dilemma is also called the "[[Peace war game|peace-war game]]".<ref>{{Cite journal |last1=Grofman |first1=Bernard |last2=Pool |first2=Jonathan |date=January 1977 |title=How to make cooperation the optimizing strategy in a two-person game |url=http://dx.doi.org/10.1080/0022250x.1977.9989871 |journal=The Journal of Mathematical Sociology |volume=5 |issue=2 |pages=173–186 |doi=10.1080/0022250x.1977.9989871 |issn=0022-250X}}</ref><ref name=Shy>{{cite book |title=Industrial Organization: Theory and Applications |publisher=Massachusetts Institute of Technology Press |first1=Oz |last1=Shy |url=https://books.google.com/books?id=tr4CjJ5LlRcC&q=industrial+organization+theory+and+applications&pg=PR13 |year=1995 |isbn=978-0262193665 |access-date=February 27, 2013}}</ref>

=== General strategy ===
If the iterated prisoner's dilemma is played a finite number of times and both players know this, then the dominant strategy and Nash equilibrium is to defect in all rounds. The proof is [[Mathematical induction|inductive]]: one might as well defect on the last turn, since the opponent will not have a chance to retaliate afterwards. Therefore, both will defect on the last turn. Thus, the player might as well defect on the second-to-last turn, since the opponent will defect on the last no matter what is done, and so on. The same applies if the game length is unknown but has a known upper limit.{{citation needed|date=May 2023}}

For [[cooperation]] to emerge between rational players, the number of rounds must be unknown or infinite. In that case, "always defect" may no longer be a dominant strategy. As shown by [[Robert Aumann]] in a 1959 paper,<ref>{{Citation |last=Aumann |first=Robert J. |title=Contributions to the Theory of Games (AM-40), Volume IV |chapter=16. Acceptable Points in General Cooperative n-Person Games |date=2016-03-02 |pages=287–324 |chapter-url=https://www.degruyter.com/document/doi/10.1515/9781400882168-018/html |access-date=2024-05-14 |publisher=Princeton University Press |language=en |doi=10.1515/9781400882168-018 |isbn=978-1-4008-8216-8}}</ref> rational players repeatedly interacting for indefinitely long games can sustain cooperation. Specifically, a player may be less willing to cooperate with a counterpart who has failed to cooperate many times before, as repeated defection breeds disappointment. Conversely, as time elapses, the likelihood of cooperation tends to rise, owing to the establishment of a "tacit agreement" among participating players.
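To make the setup concrete, the following minimal sketch (illustrative only, not drawn from the cited sources; the strategy names and helper function are ours) plays the iterated game with the commonly used example payoffs ''T'' = 5, ''R'' = 3, ''P'' = 1, ''S'' = 0, which satisfy both ''T'' > ''R'' > ''P'' > ''S'' and the iterated condition 2''R'' > ''T'' + ''S'' (alternating cooperation and defection averages (''T'' + ''S'')/2 = 2.5 per round, less than ''R'' = 3 for mutual cooperation):

<syntaxhighlight lang="python">
# Illustrative sketch of the iterated prisoner's dilemma round loop,
# using the commonly cited example payoffs T=5, R=3, P=1, S=0.
T, R, P, S = 5, 3, 1, 0
assert T > R > P > S and 2 * R > T + S

PAYOFF = {  # (my move, opponent's move) -> my payoff; True = cooperate
    (True, True): R, (True, False): S,
    (False, True): T, (False, False): P,
}

def play(strategy_a, strategy_b, rounds):
    """Play two strategies; each is a function taking the opponent's
    move history and returning True to cooperate, False to defect."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_b)  # each side sees the other's past moves
        move_b = strategy_b(history_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b

always_defect = lambda opp: False
tit_for_tat = lambda opp: True if not opp else opp[-1]

print(play(tit_for_tat, tit_for_tat, 10))    # (30, 30): sustained cooperation
print(play(always_defect, tit_for_tat, 10))  # (14, 9): defector gains only on round 1
</syntaxhighlight>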
In experimental situations, cooperation can occur even when both participants know how many iterations will be played.<ref>{{cite journal |last1=Cooper |first1=Russell |last2=DeJong |first2=Douglas V. |last3=Forsythe |first3=Robert |last4=Ross |first4=Thomas W. |title=Cooperation without Reputation: Experimental Evidence from Prisoner's Dilemma Games |journal=Games and Economic Behavior |date=1996 |volume=12 |issue=2 |pages=187–218 |doi=10.1006/game.1996.0013}}</ref> According to a 2019 experimental study in the ''American Economic Review'' that tested what strategies real-life subjects used in iterated prisoner's dilemma situations with perfect monitoring, the most commonly chosen strategies were always defect, [[Tit for tat|tit-for-tat]], and [[grim trigger]]. Which strategy the subjects chose depended on the parameters of the game.<ref>{{Cite journal|last1=Dal Bó|first1=Pedro|last2=Fréchette|first2=Guillaume R.|date=2019|title=Strategy Choice in the Infinitely Repeated Prisoner's Dilemma|journal=American Economic Review|language=en|volume=109|issue=11|pages=3929–3952|doi=10.1257/aer.20181480|s2cid=216726890|issn=0002-8282|url=https://www.aeaweb.org/articles?id=10.1257/aer.20181480}}</ref>

===Axelrod's tournament and successful strategy conditions===
Interest in the iterated prisoner's dilemma was kindled by [[Robert Axelrod (political scientist)|Robert Axelrod]] in his 1984 book ''[[The Evolution of Cooperation]]'', in which he reports on a tournament that he organized of the ''N''-step prisoner's dilemma (with ''N'' fixed) in which participants have to choose their strategy repeatedly and remember their previous encounters. Axelrod invited academic colleagues from around the world to devise computer strategies to compete in an iterated prisoner's dilemma tournament. The programs that were entered varied widely in algorithmic complexity, initial hostility, capacity for forgiveness, and so forth.

Axelrod discovered that when these encounters were repeated over a long period of time with many players, each with different strategies, greedy strategies tended to do very poorly in the long run while more [[altruism|altruistic]] strategies did better, as judged purely by self-interest. He used this to show a possible mechanism by which [[natural selection]] can evolve altruistic behavior from mechanisms that are initially purely selfish.

The winning [[deterministic algorithm|deterministic]] strategy was [[tit for tat]], developed and entered into the tournament by [[Anatol Rapoport]]. It was the simplest of any program entered, containing only four lines of [[BASIC]].<ref>{{harvp|Axelrod|2006|p=193}}</ref> The strategy is simply to cooperate on the first iteration of the game; after that, the player does what his or her opponent did on the previous move.<ref>{{harvp|Axelrod|2006|p=31}}</ref> Depending on the situation, a slightly better strategy can be "tit for tat with forgiveness": when the opponent defects, on the next move the player sometimes cooperates anyway, with a small probability (around 1–5%, depending on the lineup of opponents). This allows for occasional recovery from getting trapped in a cycle of defections.

After analyzing the top-scoring strategies, Axelrod stated several conditions necessary for a strategy to succeed:<ref>{{harvp|Axelrod|2006|loc=chpt. 6}}</ref>
* '''Nice''': The strategy will not be the first to defect (this is sometimes referred to as an "optimistic" algorithm{{by whom|date=June 2024}}), i.e., it will not "cheat" on its opponent for purely self-interested reasons first. Almost all the top-scoring strategies were nice.{{efn|The tournament had two rounds. In the first round, each of the top eight strategies was nice, and not one of the bottom seven was nice. In the second round (strategy designers could take into account the results of the first round), all but one of the top fifteen strategies were nice (and that one ranked eighth). Of the bottom fifteen strategies, all but one were not nice.<ref>{{harvp|Axelrod|2006|pp=113-114}}</ref>}}
* '''Retaliating''': The strategy must sometimes retaliate. An example of a non-retaliating strategy is Always Cooperate, a very bad choice that will frequently be exploited by "nasty" strategies.
* '''Forgiving''': Successful strategies must be forgiving. Though players will retaliate, they will cooperate again if the opponent does not continue to defect. This can stop long runs of revenge and counter-revenge, maximizing points.{{efn|In contrast, strategies like [[grim trigger]] (also called Friedman) are never the first to defect, but once the other player defects even once, they defect from then on.<ref>{{harvp|Axelrod|2006|p=36}}</ref>}}
* '''Non-envious''': The strategy must not strive to score more than the opponent.

In contrast to the one-time prisoner's dilemma game, the optimal strategy in the iterated prisoner's dilemma depends upon the strategies of likely opponents, and how they will react to defections and cooperation. For example, if a population consists entirely of players who always defect, except for one who follows the tit-for-tat strategy, that person is at a slight disadvantage because of the loss on the first turn. In such a population, the optimal strategy is to defect every time. More generally, given a population with a certain percentage of always-defectors and the rest being tit-for-tat players, the optimal strategy depends on the percentage and on the number of iterations played.{{Citation needed|date=September 2024}}

===Other strategies===
Deriving the optimal strategy is generally done in two ways:
* [[Bayesian Nash equilibrium]]: If the statistical distribution of opposing strategies can be determined, an optimal counter-strategy can be derived analytically.{{efn|1=For example see the 2003 study<ref>{{cite web|url=http://econ.hevra.haifa.ac.il/~mbengad/seminars/whole1.pdf|title=Bayesian Nash equilibrium; a statistical test of the hypothesis|last1=Landsberger|first1=Michael|last2=Tsirelson|first2=Boris|year=2003|url-status=dead|archive-url=https://web.archive.org/web/20051002195142/http://econ.hevra.haifa.ac.il/~mbengad/seminars/whole1.pdf|archive-date=2005-10-02|publisher=[[Tel Aviv University]]}}</ref> for discussion of the concept and whether it can apply in real [[economic]] or strategic situations.}}
* [[Monte Carlo method|Monte Carlo]] simulations of populations have been made, where individuals with low scores die off, and those with high scores reproduce (a [[genetic algorithm]] for finding an optimal strategy). The mix of algorithms in the final population generally depends on the mix in the initial population. The introduction of mutation (random variation during reproduction) lessens the dependency on the initial population; empirical experiments with such systems tend to produce tit-for-tat players,{{Clarify|date=August 2016}} but no analytic proof exists that this will always occur.<ref>{{Citation|last1=Wu|first1=Jiadong|title=Cooperation on the Monte Carlo Rule: Prisoner's Dilemma Game on the Grid|date=2019|work=Theoretical Computer Science|volume=1069|pages=3–15|editor-last=Sun|editor-first=Xiaoming|publisher=Springer Singapore|language=en|doi=10.1007/978-981-15-0105-0_1|isbn=978-981-15-0104-3|last2=Zhao|first2=Chengye|series=Communications in Computer and Information Science|s2cid=118687103|editor2-last=He|editor2-first=Kun|editor3-last=Chen|editor3-first=Xiaoyun}}</ref>

In the strategy called [[win-stay, lose-switch]] (also known as Pavlov), faced with a failure to cooperate, the player switches strategy the next turn.<ref>{{cite journal |last1=Wedekind |first1=C. |last2=Milinski |first2=M. |date=2 April 1996 |title=Human cooperation in the simultaneous and the alternating Prisoner's Dilemma: Pavlov versus Generous Tit-for-Tat |journal=Proceedings of the National Academy of Sciences |volume=93 |issue=7 |pages=2686–2689 |bibcode=1996PNAS...93.2686W |doi=10.1073/pnas.93.7.2686 |pmc=39691 |pmid=11607644 |doi-access=free}}</ref> In certain circumstances,{{specify|date=November 2012}} Pavlov beats all other strategies by giving preferential treatment to co-players using a similar strategy.

Although tit-for-tat is considered the most [[robust]] basic strategy, a team from [[Southampton University]] in England introduced a more successful strategy at the 20th-anniversary iterated prisoner's dilemma competition. It relied on collusion between programs to achieve the highest number of points for a single program. The university submitted 60 programs to the competition, which were designed to recognize each other through a series of five to ten moves at the start.<ref>{{cite press release|url=http://www.southampton.ac.uk/mediacentre/news/2004/oct/04_151.shtml|publisher=University of Southampton|title=University of Southampton team wins Prisoner's Dilemma competition|date=7 October 2004|url-status=dead|archive-url=https://web.archive.org/web/20140421055745/http://www.southampton.ac.uk/mediacentre/news/2004/oct/04_151.shtml|archive-date=2014-04-21}}</ref> Once this recognition was made, one program would always cooperate and the other would always defect, assuring the maximum number of points for the defector. If a program realized that it was playing a non-Southampton player, it would continuously defect in an attempt to minimize the competing program's score. As a result, the 2004 Prisoners' Dilemma Tournament results show [[University of Southampton]]'s strategies in the first three places (and a number of positions towards the bottom), despite having fewer wins and many more losses than the GRIM strategy.

The Southampton strategy takes advantage of the fact that multiple entries were allowed in this particular competition and that a team's performance was measured by that of the highest-scoring player (meaning that the use of self-sacrificing players was a form of [[minmaxing]]). Because of this new rule, this competition also has little theoretical significance when analyzing single-agent strategies as compared to Axelrod's seminal tournament. But it provided a basis for analyzing how to achieve cooperative strategies in multi-agent frameworks, especially in the presence of noise.
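The collusion mechanism can be sketched in a few lines. The handshake sequence, names, and role handling below are hypothetical illustrations of the published description, not the Southampton team's actual code:

<syntaxhighlight lang="python">
# Hypothetical sketch of the collusion scheme described above. Team programs
# open with a fixed signature of moves; on recognizing it, one program becomes
# the self-sacrificing "feeder" and the other the point-maximizing "master".
# Against an outsider, the program defects continuously.

HANDSHAKE = [True, False, False, True, True]   # illustrative 5-move signature

def southampton_style(my_history, opp_history, role):
    """role is 'master' or 'feeder'; True = cooperate, False = defect."""
    turn = len(my_history)
    if turn < len(HANDSHAKE):
        return HANDSHAKE[turn]                  # still transmitting the signature
    if opp_history[:len(HANDSHAKE)] == HANDSHAKE:
        return role == 'feeder'                 # feeder cooperates, master defects
    return False                                # outsider detected: always defect
</syntaxhighlight>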
Long before this new-rules tournament was played, [[Richard Dawkins]], in his book ''[[The Selfish Gene]]'', pointed out the possibility of such strategies winning if multiple entries were allowed, but wrote that Axelrod would most likely not have allowed them if they had been submitted. Such strategies also circumvent the rule against communication between players: the Southampton programs' "ten-move dance" allowed them to recognize one another, reinforcing how valuable communication can be in shifting the balance of the game.

Even without implicit collusion between [[computer program|software strategies]], tit-for-tat is not always the absolute winner of any given tournament; more precisely, its long-run results over a series of tournaments outperform those of its rivals, but this does not mean it is the most successful in the short term. The same applies to tit-for-tat with forgiveness and other optimal strategies. This can also be illustrated using the Darwinian [[Evolutionarily stable strategy|ESS]] simulation. In such a simulation, tit-for-tat will almost always come to dominate, though nasty strategies will drift in and out of the population because a tit-for-tat population is penetrable by non-retaliating nice strategies, which in turn are easy prey for the nasty strategies. Dawkins showed that here, no static mix of strategies forms a stable equilibrium, and the system will always oscillate between bounds.{{Citation needed|reason=Unsure if the original author meant to continue to cite The Selfish Gene here.|date=April 2023}}

===Stochastic iterated prisoner's dilemma===
{{undue weight section|date=May 2023}}
In a stochastic iterated prisoner's dilemma game, strategies are specified in terms of "cooperation probabilities".<ref name=Press2012>{{cite journal|last1=Press|first1=WH|last2=Dyson|first2=FJ|title=Iterated Prisoner's Dilemma contains strategies that dominate any evolutionary opponent|journal=[[Proceedings of the National Academy of Sciences of the United States of America]]|date=26 June 2012|volume=109|issue=26|pages=10409–13|doi=10.1073/pnas.1206569109|pmid=22615375|pmc=3387070|bibcode=2012PNAS..10910409P|doi-access=free}}</ref> In an encounter between player ''X'' and player ''Y'', ''X''{{'}}s strategy is specified by a set of probabilities ''P'' of cooperating with ''Y''. ''P'' is a function of the outcomes of their previous encounters or some subset thereof. If ''P'' is a function of only their most recent ''n'' encounters, it is called a "memory-n" strategy. A memory-1 strategy is then specified by four cooperation probabilities: <math>P=\{P_{cc},P_{cd},P_{dc},P_{dd}\}</math>, where ''P<sub>cd</sub>'' is the probability that ''X'' will cooperate in the present encounter given that in the previous encounter ''X'' cooperated and ''Y'' defected. If each of the probabilities is either 1 or 0, the strategy is called deterministic. An example of a deterministic strategy is the tit-for-tat strategy written as <math>P=\{1,0,1,0\}</math>, in which ''X'' responds as ''Y'' did in the previous encounter. Another is the win-stay, lose-switch strategy written as <math>P=\{1,0,0,1\}</math>.
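As an illustration of this representation (a sketch using our own names, not code from the cited paper), a memory-1 strategy can be stored directly as its four cooperation probabilities and sampled against the previous outcome:

<syntaxhighlight lang="python">
import random

# A memory-1 strategy is four cooperation probabilities (P_cc, P_cd, P_dc,
# P_dd), indexed by the previous round's outcome from the player's own
# point of view.
TIT_FOR_TAT = (1, 0, 1, 0)            # copy the opponent's last move
WIN_STAY_LOSE_SWITCH = (1, 0, 0, 1)   # repeat after R or T, switch after S or P
OUTCOMES = ['cc', 'cd', 'dc', 'dd']

def next_move(strategy, previous_outcome):
    """Return True (cooperate) with the probability the strategy
    assigns to the previous outcome."""
    p = strategy[OUTCOMES.index(previous_outcome)]
    return random.random() < p
</syntaxhighlight>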
It has been shown that for any memory-n strategy there is a corresponding memory-1 strategy that gives the same statistical results, so that only memory-1 strategies need be considered.<ref name="Press2012"/> If <math>P</math> is defined as the above 4-element strategy vector of ''X'' and <math>Q=\{Q_{cc},Q_{cd},Q_{dc},Q_{dd}\}</math> as the 4-element strategy vector of ''Y'' (where the indices are from ''Y''<nowiki/>'s point of view), a transition matrix ''M'' may be defined for ''X'' whose ''ij''-th entry is the probability that the outcome of a particular encounter between ''X'' and ''Y'' will be ''j'' given that the previous encounter was ''i'', where ''i'' and ''j'' are one of the four outcome indices: ''cc'', ''cd'', ''dc'', or ''dd''. For example, from ''X''{{'}}s point of view, the probability that the outcome of the present encounter is ''cd'' given that the previous encounter was ''cd'' is equal to <math>M_{cd,cd}=P_{cd}(1-Q_{dc})</math>. Under these definitions, the iterated prisoner's dilemma qualifies as a [[stochastic process]] and ''M'' is a [[stochastic matrix]], allowing all of the theory of stochastic processes to be applied.<ref name="Press2012"/>

One result of stochastic theory is that there exists a stationary vector ''v'' for the matrix ''M'' such that <math>v\cdot M=v</math>. Without loss of generality, it may be specified that ''v'' is normalized so that the sum of its four components is unity. The ''ij''-th entry in <math>M^n</math> gives the probability that the outcome of an encounter between ''X'' and ''Y'' will be ''j'' given that the encounter ''n'' steps previous was ''i''. In the limit as ''n'' approaches infinity, <math>M^n</math> will converge to a matrix with fixed values, giving the long-term probabilities of an encounter producing ''j'', independent of ''i''. In other words, the rows of <math>M^\infty</math> will be identical, giving the long-term equilibrium result probabilities of the iterated prisoner's dilemma without the need to explicitly evaluate a large number of interactions. It can be seen that ''v'' is a stationary vector for <math>M^n</math> and particularly <math>M^\infty</math>, so that each row of <math>M^\infty</math> will be equal to ''v''. Thus, the stationary vector specifies the equilibrium outcome probabilities for ''X''.

Defining <math>S_x=\{R,S,T,P\}</math> and <math>S_y=\{R,T,S,P\}</math> as the short-term payoff vectors for the {''cc,cd,dc,dd''} outcomes (from ''X''{{'}}s point of view), the equilibrium payoffs for ''X'' and ''Y'' can now be specified as <math>s_x=v\cdot S_x</math> and <math>s_y=v\cdot S_y</math>, allowing the two strategies ''P'' and ''Q'' to be compared for their long-term payoffs.

====Zero-determinant strategies====
{{undue weight section|date=May 2023}}
[[File:Iterated Prisoners Dilemma Venn-Diagram.svg|right|thumb|upright=1.5|The relationship between zero-determinant (ZD), cooperating and defecting strategies in the iterated prisoner's dilemma]]
In 2012, [[William H. Press]] and [[Freeman Dyson]] published a new class of strategies for the stochastic iterated prisoner's dilemma called "zero-determinant" (ZD) strategies.<ref name="Press2012"/> The long-term payoffs for encounters between ''X'' and ''Y'' can be expressed as the determinant of a matrix which is a function of the two strategies and the short-term payoff vectors: <math>s_x = D(P, Q, S_x)</math> and <math>s_y = D(P, Q, S_y)</math>, which do not involve the stationary vector ''v''.
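Before drawing consequences from the determinant formulation, the stationary-vector machinery above can be checked numerically. The sketch below (illustrative; the helper names and example strategies are our own, not from the cited paper) builds ''M'' from two memory-1 strategies, solves <math>v\cdot M=v</math> with the components of ''v'' summing to one, and evaluates <math>s_x=v\cdot S_x</math> and <math>s_y=v\cdot S_y</math>. Noisy variants of tit-for-tat and win-stay, lose-switch are used so that the chain has a unique stationary vector:

<syntaxhighlight lang="python">
import numpy as np

# States are ordered cc, cd, dc, dd from X's point of view.
def transition_matrix(P, Q):
    """4x4 Markov matrix for memory-1 strategies P (X) and Q (Y);
    Q is indexed from Y's point of view, so cd and dc swap."""
    M = np.zeros((4, 4))
    swap = [0, 2, 1, 3]        # X's state index as seen from Y's side
    for i in range(4):
        x = P[i]               # prob. X cooperates next
        y = Q[swap[i]]         # prob. Y cooperates next
        M[i] = [x * y, x * (1 - y), (1 - x) * y, (1 - x) * (1 - y)]
    return M

def stationary(M):
    """Solve v.M = v together with sum(v) = 1 as a least-squares system."""
    A = np.vstack([M.T - np.eye(4), np.ones(4)])
    b = np.array([0, 0, 0, 0, 1.0])
    return np.linalg.lstsq(A, b, rcond=None)[0]

T, R, Pp, S = 5, 3, 1, 0                 # example payoffs (T > R > P > S)
S_x = np.array([R, S, T, Pp])            # X's payoffs for cc, cd, dc, dd
S_y = np.array([R, T, S, Pp])            # Y's payoffs for the same outcomes

P = (0.9, 0.1, 0.9, 0.1)                 # tit-for-tat with 10% noise
Q = (0.9, 0.1, 0.1, 0.9)                 # win-stay, lose-switch with noise
v = stationary(transition_matrix(P, Q))
print(v @ S_x, v @ S_y)                  # long-term payoffs s_x and s_y
</syntaxhighlight>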
Since the determinant function <math>s_y = D(P, Q, f)</math> is linear in <math>f</math>, it follows that <math>\alpha s_x + \beta s_y + \gamma = D(P, Q, \alpha S_x + \beta S_y + \gamma U)</math> (where <math>U =\{1,1,1,1\}</math>). Any strategy for which <math>D(P, Q, \alpha S_x + \beta S_y + \gamma U) = 0</math> is by definition a ZD strategy, and the long-term payoffs obey the relation <math>\alpha s_x + \beta s_y + \gamma = 0</math>.

Tit-for-tat is a ZD strategy which is "fair", in the sense of not gaining advantage over the other player. But the ZD space also contains strategies that, in the case of two players, can allow one player to unilaterally set the other player's score, or alternatively force an evolutionary player to achieve a payoff some percentage lower than his own. The extorted player could defect, but would thereby hurt himself by getting a lower payoff. Thus, extortion solutions turn the iterated prisoner's dilemma into a sort of [[ultimatum game]]. Specifically, ''X'' is able to choose a strategy for which <math>D(P, Q, \beta S_y + \gamma U) = 0</math>, unilaterally setting ''s<sub>y</sub>'' to a specific value within a particular range of values, independent of ''Y''{{'}}s strategy, offering an opportunity for ''X'' to "extort" player ''Y'' (and vice versa). But if ''X'' tries to set ''s<sub>x</sub>'' to a particular value, the range of possibilities is much smaller, consisting only of complete cooperation or complete defection.<ref name="Press2012"/>

An extension of the iterated prisoner's dilemma is the evolutionary stochastic iterated prisoner's dilemma, in which the relative abundance of particular strategies is allowed to change, with more successful strategies relatively increasing. This process may be accomplished by having less successful players imitate the more successful strategies, or by eliminating less successful players from the game while multiplying the more successful ones. It has been shown that unfair ZD strategies are not [[evolutionarily stable strategy|evolutionarily stable]]. The key intuition is that an evolutionarily stable strategy must not only be able to invade another population (which extortionary ZD strategies can do) but must also perform well against other players of the same type (which extortionary ZD players do poorly, because they reduce each other's surplus).<ref>{{cite journal|last=Adami|first=Christoph|author2=Arend Hintze|title=Evolutionary instability of Zero Determinant strategies demonstrates that winning isn't everything|journal=Nature Communications|volume=4|year=2013|issue=1|page=3|arxiv=1208.2666|doi=10.1038/ncomms3193|pmid=23903782|pmc=3741637|bibcode=2013NatCo...4.2193A}}</ref>

Theory and simulations confirm that beyond a critical population size, ZD extortion loses out in evolutionary competition against more cooperative strategies, and as a result, the average payoff in the population increases when the population is larger. In addition, there are some cases in which extortioners may even catalyze cooperation by helping to break out of a face-off between uniform defectors and [[win–stay, lose–switch]] agents.<ref name=Hilbe2013>{{cite journal|last=Hilbe|first=Christian|author2=Martin A. Nowak|author3=Karl Sigmund|title=Evolution of extortion in Iterated Prisoner's Dilemma games|journal=PNAS|date=April 2013|volume=110|issue=17|pages=6913–18|doi=10.1073/pnas.1214834110|pmid=23572576|pmc=3637695|bibcode=2013PNAS..110.6913H|arxiv=1212.1067|doi-access=free}}</ref>

While extortionary ZD strategies are not stable in large populations, another ZD class called "generous" strategies is both stable and robust. When the population is not too small, these strategies can supplant any other ZD strategy and even perform well against a broad array of generic strategies for the iterated prisoner's dilemma, including win–stay, lose–switch. This was proven specifically for the [[#Special case: Donation game|donation game]] by Alexander Stewart and Joshua Plotkin in 2013.<ref name=Stewart2013>{{cite journal|last=Stewart|first=Alexander J.|author2=Joshua B. Plotkin|title=From extortion to generosity, evolution in the Iterated Prisoner's Dilemma|journal=[[Proceedings of the National Academy of Sciences of the United States of America]]|year=2013|doi=10.1073/pnas.1306246110|pmid=24003115|volume=110|issue=38|pages=15348–53|bibcode=2013PNAS..11015348S|pmc=3780848|doi-access=free}}</ref> Generous strategies will cooperate with other cooperative players, and in the face of defection, the generous player loses more utility than its rival. Generous strategies are the intersection of ZD strategies and so-called "good" strategies, which were defined by Ethan Akin to be those for which the player responds to past mutual cooperation with future cooperation and splits expected payoffs equally if he receives at least the cooperative expected payoff.<ref name="Akin2013">{{cite arXiv|eprint=1211.0969|class=math.DS|first=Ethan|last=Akin|title=Stable Cooperative Solutions for the Iterated Prisoner's Dilemma|year=2013|page=9}} {{bibcode|2012arXiv1211.0969A}}</ref> Among good strategies, the generous (ZD) subset performs well when the population is not too small. If the population is very small, defection strategies tend to dominate.<ref name=Stewart2013/>

===Continuous iterated prisoner's dilemma===
Most work on the iterated prisoner's dilemma has focused on the discrete case, in which players either cooperate or defect, because this model is relatively simple to analyze. However, some researchers have looked at models of the continuous iterated prisoner's dilemma, in which players are able to make a variable contribution to the other player. Le and Boyd<ref>{{cite journal|last1=Le|first1=S.|last2=Boyd|first2=R.|name-list-style=vanc|year=2007|title=Evolutionary Dynamics of the Continuous Iterated Prisoner's Dilemma|journal=Journal of Theoretical Biology|volume=245|issue=2|pages=258–67|doi=10.1016/j.jtbi.2006.09.016|pmid=17125798|bibcode=2007JThBi.245..258L}}</ref> found that in such situations, cooperation is much harder to evolve than in the discrete iterated prisoner's dilemma. In a continuous prisoner's dilemma, if a population starts off in a non-cooperative equilibrium, players who are only marginally more cooperative than non-cooperators get little benefit from [[Assortative mating|assorting]] with one another. By contrast, in a discrete prisoner's dilemma, tit-for-tat cooperators get a big payoff boost from assorting with one another in a non-cooperative equilibrium, relative to non-cooperators.
Since nature arguably offers more opportunities for variable cooperation than for a strict dichotomy of cooperation or defection, the continuous prisoner's dilemma may help explain why real-life examples of tit-for-tat-like cooperation are extremely rare,<ref>Hammerstein, P. (2003). Why is reciprocity so rare in social animals? A protestant appeal. In: P. Hammerstein, ed., ''Genetic and Cultural Evolution of Cooperation''. MIT Press. pp. 83–94.</ref> even though tit-for-tat seems robust in theoretical models.
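One common way to formalize the continuous game (an assumption for illustration; Le and Boyd's model differs in detail) is a linear benefit-cost payoff in which each player chooses a cooperation level between 0 and 1:

<syntaxhighlight lang="python">
# Illustrative sketch only: a linear benefit/cost continuous prisoner's
# dilemma. Each player chooses a cooperation level in [0, 1]; a player's
# payoff is b times the other's level minus c times its own (b > c > 0).
# Contributing 0 still dominates in the one-shot game.

b, c = 2.0, 1.0  # benefit and cost per unit of cooperation (assumed values)

def payoffs(x, y):
    """Payoffs to two players choosing cooperation levels x and y."""
    return b * y - c * x, b * x - c * y

print(payoffs(1.0, 1.0))  # (1.0, 1.0): full mutual cooperation
print(payoffs(0.0, 1.0))  # (2.0, -1.0): defector exploits a full cooperator
print(payoffs(0.1, 0.0))  # (-0.1, 0.2): a marginal cooperator gains nothing
</syntaxhighlight>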