
==History==
[[Arpad Elo]] was a [[chess master]] and an active participant in the [[United States Chess Federation]] (USCF) from its founding in 1939.<ref>{{cite web |url=http://www.springfieldchessclub.com/icbarchive/ICB_2002_07.pdf |title=Remembering Richard, Part II |date=July 2002 |last=Redman |first=Tim |publisher=Illinois Chess Bulletin |archive-url=https://web.archive.org/web/20200630040943/http://www.springfieldchessclub.com/icbarchive/ICB_2002_07.pdf |access-date=2020-06-30 |archive-date=2020-06-30 |url-status=live }}</ref> The USCF used a numerical ratings system devised by [[Kenneth Harkness]] to enable members to track their individual progress in terms other than tournament wins and losses. The Harkness system was reasonably fair, but in some circumstances gave rise to ratings many observers considered inaccurate. 

On behalf of the USCF, Elo devised a new system with a more sound{{clarify|date=August 2024}} [[Statistics|statistical]] basis.<ref>{{Cite journal |last=Elo |first=Arpad E. |date=March 5, 1960 |title=The USCF Rating System |url=http://uscf1-nyc1.aodhosting.com/CL-AND-CR-ALL/CL-ALL/1960/1960_03_1.pdf |journal=[[Chess Life]] |publisher=[[United States Chess Federation|USCF]] |volume=XIV |issue=13 |pages=2}}</ref> At about the same time, György Karoly and Roger Cook independently developed a system based on the same principles for the New South Wales Chess Association.<ref>Elo 1986, p. 4</ref>

Elo's system replaced earlier systems of competitive rewards with a system based on statistical estimation. Rating systems for many sports award points in accordance with subjective evaluations of the 'greatness' of certain achievements. For example, winning an important [[golf]] tournament might be worth an arbitrarily chosen five times as many points as winning a lesser tournament.

A statistical approach, by contrast, uses a model that relates game results to underlying variables representing each player's ability.

Elo's central assumption was that the chess performance of each player in each game is a [[normal distribution|normally distributed]] [[random variable]]. Although a player might perform significantly better or worse from one game to the next, Elo assumed that the mean value of the performances of any given player changes only slowly over time. Elo thought of a player's true skill as the mean of that player's performance random variable.

A further assumption is necessary because chess performance in the above sense is still not measurable. One cannot look at a sequence of moves and derive a number to represent a player's skill. Performance can only be inferred from wins, draws, and losses. Therefore, a player who wins a game is assumed to have performed at a higher level than the opponent for that game. Conversely, a losing player is assumed to have performed at a lower level. If the game ends in a draw, the two players are assumed to have performed at nearly the same level.

Elo did not specify exactly how close two performances ought to be to result in a draw as opposed to a win or loss. In practice, the probability of a draw depends on the performance differential, so the boundary between a draw and a decisive result is probabilistic rather than a deterministic threshold. And while he thought it likely that players have different [[standard deviation]]s in their performances, he made the simplifying assumption that all players share the same one.

To simplify computation even further, Elo proposed a straightforward method of estimating the variables in his model (i.e., the true skill of each player). One could calculate relatively easily from tables how many games players would be expected to win based on comparisons of their ratings to those of their opponents. The ratings of a player who won more games than expected would be adjusted upward, while those of a player who won fewer than expected would be adjusted downward. Moreover, that adjustment was to be in linear proportion to the number of wins by which the player had exceeded or fallen short of their expected number.<ref>{{Cite journal |last=Elo |first=Arpad E. |date=June 1961 |title=The USCF Rating System - A Scientific Achievement |url=http://uscf1-nyc1.aodhosting.com/CL-AND-CR-ALL/CL-ALL/1961/1961_06.pdf#page=8 |journal=[[Chess Life]] |volume=XVI |number=6| pages=160–161|publisher=[[United States Chess Federation|USCF]]}}</ref>
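
The linear adjustment described above can be sketched in a few lines of Python. This is only an illustration, not Elo's published procedure: the function names, the scaling constant <code>k</code>, and the use of a normal curve with a 200-point standard deviation in place of printed tables are all assumptions made for the example.

```python
import math

# Assumption: spread of the difference between two performances, taking a
# 200-point standard deviation per player. Illustrative only.
SIGMA_DIFF = 200 * math.sqrt(2)


def expected_score(rating, opponent_rating, sigma_diff=SIGMA_DIFF):
    """Expected score (win = 1, draw = 0.5) under a normal performance model."""
    z = (rating - opponent_rating) / sigma_diff
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))


def updated_rating(rating, opponent_ratings, actual_score, k=32):
    """Move the rating linearly with (actual score - expected score).

    k is a hypothetical scaling constant chosen only for illustration.
    """
    expected = sum(expected_score(rating, opp) for opp in opponent_ratings)
    return rating + k * (actual_score - expected)


# A 1500-rated player scores 2.5 out of 3 against three equally rated
# opponents (expected score 1.5), so the rating rises by k * 1.0.
print(updated_rating(1500, [1500, 1500, 1500], 2.5))
```

A player who scores exactly the expected total keeps the same rating, which is the sense in which the adjustment is proportional to the surplus or shortfall.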

From a modern perspective, Elo's simplifying assumptions are not necessary because computing power is inexpensive and widely available. Several people, most notably [[Mark Glickman]], have proposed using more sophisticated statistical machinery to estimate the same variables. On the other hand, the computational simplicity of the Elo system has proven to be one of its greatest assets. With the aid of a pocket calculator, an informed chess competitor can calculate to within one point what their next officially published rating will be, which helps promote a perception that the ratings are fair.

=== Implementing Elo's scheme ===
The USCF implemented Elo's suggestions in 1960,<ref name="aboutUSCF">{{cite web |url=http://www.uschess.org/about/about.php |title=About the USCF |publisher=United States Chess Federation |access-date=2008-11-10 |archive-date=2008-09-26 |archive-url=https://web.archive.org/web/20080926015601/http://www.uschess.org/about/about.php |url-status=live }}</ref> and the system quickly gained recognition as being both fairer and more accurate than the [[Chess rating systems#Harkness system|Harkness rating system]]. Elo's system was adopted by the [[Fédération Internationale des Échecs|World Chess Federation]] (FIDE) in 1970.<ref>Elo 1986, Preface to the First Edition</ref> Elo described his work in detail in ''The Rating of Chessplayers, Past and Present'', first published in 1978.<ref name="AEE1986">Elo 1986.</ref>

Subsequent statistical tests have suggested that chess performance is almost certainly not distributed as a [[normal distribution]], as weaker players have greater winning chances than Elo's model predicts.<ref>Elo 1986, ch. 8.73.</ref><ref>Glickman, Mark E., and Jones, Albyn C., {{url|http://www.glicko.net/research/chance.pdf|"Rating the chess rating system"}} (1999), Chance, 12, 2, 21-28.</ref> In paired comparison data, there is often very little practical difference in whether it is assumed that the differences in players' strengths are normally or [[Logistic distribution|logistically]] distributed. Mathematically, however, the logistic function is more convenient to work with than the normal distribution.<ref>Glickman, Mark E. (1995), {{url|http://www.glicko.net/research/acjpaper.pdf|"A Comprehensive Guide to Chess Ratings".}}
A subsequent version of this paper appeared in the ''American Chess Journal'', 3, pp. 59–102.</ref>
FIDE continues to use the rating difference table as proposed by Elo.{{r|fiderr2017|at=table 8.1b}}
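
The small practical difference between the two assumptions can be checked numerically. The sketch below compares a normal expectancy curve (200-point standard deviation per player) with a logistic curve. The base-10 logistic form with a 400-point scale is a commonly used convention and is an assumption here, as are the function names.

```python
import math
from statistics import NormalDist


def normal_expectancy(d, sigma=200.0):
    """Expected score for rating difference d under the normal model."""
    # The difference of two independent performances has spread sigma * sqrt(2).
    return NormalDist(0.0, sigma * math.sqrt(2)).cdf(d)


def logistic_expectancy(d, scale=400.0):
    """Expected score under an assumed base-10 logistic curve."""
    return 1.0 / (1.0 + 10.0 ** (-d / scale))


# Print both expectancies side by side for a few rating differences.
for d in (0, 100, 200, 400, 600):
    print(d, round(normal_expectancy(d), 3), round(logistic_expectancy(d), 3))
```

Under these assumed parameters the two curves stay within about two percentage points of each other, and at a 400-point gap the logistic curve gives the lower-rated player a slightly larger expected score than the normal curve does.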

The development of the Percentage Expectancy Table (table 2.11) is described in more detail by Elo as follows:<ref>Elo 1986, p. 159.</ref>
<blockquote>
The normal probabilities may be taken directly from the standard
tables of the areas under the normal curve when the difference in rating is
expressed as a z score. Since the standard deviation σ of individual
performances is defined as 200 points, the standard deviation σ' of the
differences in performances becomes σ√2 or 282.84. The z value of a
difference then is {{math|''D'' / 282.84}}. This will then divide the area under the
curve into two parts, the larger giving P for the higher rated player and
the smaller giving P for the lower rated player.

For example, let {{math|1=''D'' = 160}}. Then {{math|1=''z'' = 160 / 282.84 = .566}}. The table
gives {{math|.7143}} and {{math|.2857}} as the areas of the two portions under the curve.
These probabilities are rounded to two figures in table 2.11.
</blockquote>
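
The worked example in the quotation can be reproduced with a short Python sketch. It substitutes the standard library's normal distribution for the printed tables Elo refers to; the function and variable names are illustrative assumptions.

```python
import math
from statistics import NormalDist

SIGMA = 200.0                       # standard deviation of one player's performance
SIGMA_DIFF = SIGMA * math.sqrt(2)   # spread of the difference, about 282.84


def percentage_expectancy(d):
    """Return (P for the higher-rated player, P for the lower-rated player)."""
    z = d / SIGMA_DIFF                 # rating difference as a z score
    p_higher = NormalDist().cdf(z)     # area under the standard normal curve
    return p_higher, 1.0 - p_higher


p_high, p_low = percentage_expectancy(160)
print(f"z = {160 / SIGMA_DIFF:.3f}, P(higher) = {p_high:.4f}, P(lower) = {p_low:.4f}")
```

The result agrees with the quoted areas to about three decimal places; small differences in the last digit can arise because the quotation rounds ''z'' before consulting the table.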

The table is actually built with standard deviation {{math|200(10/7)}} as an approximation for {{math|200√2}}.{{citation needed|date=August 2023}}
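
Taking the statement above at face value, the effect of the approximation can be estimated with a short Python sketch. The constant and function names are assumptions for illustration, and the sketch does not confirm how the published table was actually computed.

```python
import math
from statistics import NormalDist

SIGMA_EXACT = 200 * math.sqrt(2)   # about 282.84
SIGMA_APPROX = 200 * 10 / 7        # about 285.71


def expectancy(d, sigma_diff):
    """Expected score for rating difference d with the given spread."""
    return NormalDist(0.0, sigma_diff).cdf(d)


# Compare the two spreads at a few rating differences.
for d in (50, 160, 300, 500):
    print(d, round(expectancy(d, SIGMA_EXACT), 4), round(expectancy(d, SIGMA_APPROX), 4))
```

The two spreads differ by about one percent, so the resulting expectancies differ by well under one percentage point over the range checked here.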

The normal and logistic distributions are, in a way, arbitrary points in a spectrum of distributions which would work well. In practice, both of these distributions work very well for a number of different games.{{cn|date=August 2024}}