
=== Mathematical details ===
Performance is not measured absolutely; it is inferred from wins, losses, and draws against other players. Players' ratings depend on the ratings of their opponents and the results scored against them. The difference in rating between two players determines an estimate for the expected score between them. Both the average and the spread of ratings can be arbitrarily chosen. The USCF initially aimed for an average club player to have a rating of 1500 and Elo suggested scaling ratings so that a difference of 200 rating points in chess would mean that the stronger player has an ''expected score'' of approximately 0.75.

A player's ''expected score'' is their probability of winning plus half their probability of drawing. Thus, an expected score of 0.75 could represent a 75% chance of winning, 25% chance of losing, and 0% chance of drawing. At the other extreme, it could represent a 50% chance of winning, 0% chance of losing, and 50% chance of drawing. The probability of drawing, as opposed to having a decisive result, is not specified in the Elo system. Instead, a draw is considered half a win and half a loss. In practice, since the true strength of each player is unknown, the expected scores are calculated using the players' current ratings as follows.

If player&nbsp;{{mvar|A}} has a rating of <math>\, R_\mathsf{A} \,</math> and player&nbsp;{{mvar|B}} a rating of <math>\, R_\mathsf{B} \,</math>, the exact formula (using the [[logistic curve]] with [[common logarithm|base 10]])<ref>Elo 1986, p. 141, ch. 8.4, Logistic probability as a rating basis</ref> for the expected score of player&nbsp;{{mvar|A}} is

:<math> E_\mathsf{A} = \frac 1 {1 + 10^{(R_\mathsf{B} - R_\mathsf{A})/400}} ~.</math>

Similarly, the expected score for player&nbsp;{{mvar|B}} is

:<math> E_\mathsf{B} = \frac 1 {1 + 10^{(R_\mathsf{A} - R_\mathsf{B})/400}} ~.</math>

This could also be expressed by

:<math> E_\mathsf{A} = \frac{ Q_\mathsf{A} }{ Q_\mathsf{A} + Q_\mathsf{B} } </math>

and

:<math> E_\mathsf{B} = \frac{ Q_\mathsf{B} }{Q_\mathsf{A} + Q_\mathsf{B} } ~,</math>

where <math>\; Q_\mathsf{A} = 10^{R_\mathsf{A}/400} \;,</math> and <math>\; Q_\mathsf{B} = 10^{R_\mathsf{B}/400} ~.</math> Note that in the latter case, the same denominator applies to both expressions, and it is plain that <math>\; E_\mathsf{A} + E_\mathsf{B} = 1 ~.</math> Comparing only the numerators shows that the expected score for player&nbsp;{{mvar|A}} is <math>\; Q_\mathsf{A}/Q_\mathsf{B} \;</math> times the expected score for player&nbsp;{{mvar|B}}. The same ratio follows algebraically from the first pair of formulas: subtracting 1 from the reciprocal of <math>E_\mathsf{B}</math> gives <math>\; 1/E_\mathsf{B} - 1 = Q_\mathsf{A}/Q_\mathsf{B} \;,</math> and multiplying this by <math>E_\mathsf{B}</math> yields <math>E_\mathsf{A}</math>. It follows that each 400 rating points of advantage over the opponent multiplies a player's expected score, relative to the opponent's expected score, by a further factor of ten.
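
As a minimal sketch (Python is used purely for illustration, and the function names are hypothetical rather than taken from any rating body's software), both forms of the expected-score formula can be evaluated directly:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B: logistic curve, base 10, scale 400."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def expected_score_q(rating_a: float, rating_b: float) -> float:
    """The same quantity written as Q_A / (Q_A + Q_B)."""
    q_a = 10.0 ** (rating_a / 400.0)
    q_b = 10.0 ** (rating_b / 400.0)
    return q_a / (q_a + q_b)


# A 200-point advantage gives an expected score of roughly 0.76.
e_strong = expected_score(1700, 1500)
e_weak = expected_score(1500, 1700)
print(round(e_strong, 2), round(e_weak, 2))  # 0.76 0.24

# A 400-point advantage makes the expected score ten times the opponent's.
ratio = expected_score(1900, 1500) / expected_score(1500, 1900)
print(round(ratio, 6))  # 10.0
```

The two functions agree up to floating-point rounding, and the two expected scores of any pairing sum to 1.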

When a player's actual tournament scores exceed their expected scores, the Elo system takes this as evidence that the player's rating is too low and needs to be adjusted upward. Similarly, when a player's actual tournament scores fall short of their expected scores, that player's rating is adjusted downward. Elo's original suggestion, which is still widely used, was a simple linear adjustment proportional to the amount by which a player over-performed or under-performed their expected score. The maximum possible adjustment per game, called the K-factor, was set at <math>\; K = 16 \;</math> for masters and <math>\; K = 32 \;</math> for weaker players.

Suppose player&nbsp;{{mvar|A}} (again with rating <math>R_\mathsf{A}</math>) was expected to score <math>\, E_\mathsf{A} \,</math> points but actually scored <math>\, S_\mathsf{A} \,</math> points. The formula for updating that player's rating is

:<math>R_\mathsf{A}' = R_\mathsf{A} + K \cdot (S_\mathsf{A} - E_\mathsf{A}) ~.</math><ref name="aodhosting" />

This update can be performed after each game or each tournament, or after any suitable rating period.
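
The update rule can be sketched for a single game as follows. This is an illustration only: the function names are hypothetical, and both players are assumed to share the same {{mvar|K}}-factor of 32.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B (logistic curve, base 10, scale 400)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_rating(rating: float, expected: float, actual: float, k: float = 32.0) -> float:
    """Linear Elo update: R' = R + K * (S - E)."""
    return rating + k * (actual - expected)


# Player A (1613) loses to player B (1609); both are assumed to use K = 32.
r_a, r_b = 1613.0, 1609.0
new_a = update_rating(r_a, expected_score(r_a, r_b), actual=0.0)
new_b = update_rating(r_b, expected_score(r_b, r_a), actual=1.0)

print(round(new_a, 1), round(new_b, 1))
# With equal K-factors the exchange is zero-sum: A loses what B gains.
print(round((new_a + new_b) - (r_a + r_b), 9))
```

When the two players have different {{mvar|K}}-factors, the points lost by one no longer equal the points gained by the other.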

An example may help to clarify:
{{blockquote|1=Suppose player&nbsp;{{mvar|A}} has a rating of 1613 and plays in a five-round tournament. They lose to a player rated 1609, draw with a player rated 1477, defeat a player rated 1388, defeat a player rated 1586, and lose to a player rated 1720. The player's actual score is {{math|(0 + 0.5 + 1 + 1 + 0) {{=}} 2.5}}. The expected score, calculated according to the formula above, was {{math|1=(0.51 + 0.69 + 0.79 + 0.54 + 0.35) = 2.88}}.
Therefore, the player's new rating is {{math|[1613 + 32·(2.5 − 2.88)] {{=}} 1601}}, assuming that a {{mvar|K}}-factor of 32 is used. Equivalently, each game the player can be said to have put an ante of {{mvar|K}} times their expected score for the game into a pot, the opposing player does likewise, and the winner collects the full pot of value {{mvar|K}}; in the event of a draw, the players [[Glossary of poker terms|split the pot]] and receive <math>\; \tfrac{1}{2}K \;</math> points each.

Note that while two wins, two losses, and one draw may seem like a par score, it is worse than expected for player&nbsp;{{mvar|A}} because their opponents were lower rated on average. Therefore, player&nbsp;{{mvar|A}} is slightly penalized. If player&nbsp;{{mvar|A}} had scored two wins, one loss, and two draws, for a total score of three points, that would have been slightly better than expected, and the player's new rating would have been {{math|[1613 + 32·(3 − 2.88)] {{=}} 1617}}.}}
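
The tournament example above can be reproduced with a short sketch (variable and function names are hypothetical). Because the code does not round each game's expected score to two decimals, the expected total comes out near 2.87 rather than 2.88; the updated ratings still round to the same values.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B (logistic curve, base 10, scale 400)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


rating = 1613
k_factor = 32
opponents = [1609, 1477, 1388, 1586, 1720]
results = [0.0, 0.5, 1.0, 1.0, 0.0]  # loss, draw, win, win, loss

expected_total = sum(expected_score(rating, opp) for opp in opponents)
actual_total = sum(results)

# One update for the whole event, treated as a single rating period.
new_rating = rating + k_factor * (actual_total - expected_total)
print(round(expected_total, 2), round(new_rating))  # 2.87 1601

# Alternative outcome: two wins, one loss, two draws (3 points).
alt_rating = rating + k_factor * (3.0 - expected_total)
print(round(alt_rating))  # 1617
```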

This updating procedure is at the core of the ratings used by [[FIDE]], [[United States Chess Federation|USCF]], [[Yahoo! Games]], the [[Internet Chess Club]] (ICC) and the [[Free Internet Chess Server]] (FICS). However, each organization has taken a different approach to dealing with the uncertainty inherent in the ratings, particularly the ratings of newcomers, and to dealing with the problem of ratings inflation/deflation. New players are assigned provisional ratings, which are adjusted more drastically than established ratings.

The principles used in these rating systems can be used for rating other competitions—for instance, international [[association football|football]] matches.

Elo ratings have also been applied to games without the possibility of [[Draw (chess)|draw]]s, and to games in which the result carries a margin (small or large) in addition to the outcome (win or loss). See [[Go ranks and ratings#Elo ratings as used in Go|Go rating with Elo]] for more.

{{see also|Hubbert curve}}

==== Suggested modification ====
In 2011, after analyzing 1.5 million FIDE-rated games, [[Jeff Sonas]] demonstrated that two players who differ by {{mvar|X}} rating points according to the Elo formula actually have a true difference of around {{math|''X''(5/6)}}. Equivalently, one can leave the rating difference alone and divide by 480 instead of 400. Because the Elo formula overestimates the stronger player's win probability, stronger players lose points against weaker players despite playing at their true strength, and weaker players correspondingly gain points against stronger players. When the modification is applied, observed win rates deviate from prediction by less than 0.1%, while traditional Elo can be off by as much as 4%.<ref>{{cite web | url=https://en.chessbase.com/post/the-elo-rating-system-correcting-the-expectancy-tables | title=The Elo rating system – correcting the expectancy tables | date=30 March 2011 }}</ref>
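
The equivalence between shrinking the rating difference by 5/6 and dividing by 480 can be checked with a small sketch. The function and its <code>scale</code> parameter are hypothetical names introduced for illustration; this is not Sonas's own code.

```python
def expected_score(rating_diff: float, scale: float = 400.0) -> float:
    """Expected score of the player who is rating_diff points higher."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / scale))


diff = 300
classic = expected_score(diff)                # standard divisor of 400
shrunk = expected_score(diff * 5 / 6)         # difference scaled by 5/6
rescaled = expected_score(diff, scale=480.0)  # divisor 480 instead of 400

print(round(classic, 3))   # 0.849
print(round(rescaled, 3))  # 0.808
print(abs(shrunk - rescaled) < 1e-12)  # True: the two adjustments coincide
```

The modified curve assigns the higher-rated player a lower expected score at every positive rating difference.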

==== Most accurate distribution model ====
The first mathematical concern addressed by the USCF was the use of the [[normal distribution]]. They found that this did not accurately represent the actual results achieved, particularly by the lower rated players. Instead they switched to a [[logistic distribution]] model, which the USCF found provided a better fit for the actual results achieved.<ref>Elo 1986, ch. 8.73</ref>{{citation needed|date=March 2019}} FIDE also uses an approximation to the logistic distribution.<ref name="fiderr2017">{{cite report |title=FIDE Rating Regulations effective from 1 July 2017 |website=FIDE Online (fide.com) |publisher=[[FIDE]] |url=https://handbook.fide.com/chapter/B022017 |access-date=2017-09-09 |date= |archive-date=2019-11-27 |archive-url=https://web.archive.org/web/20191127231614/https://handbook.fide.com/chapter/B022017 |url-status=live }}</ref>

==== Most accurate K-factor ====
The second major concern is the choice of the correct "{{mvar|K}}-factor". The chess statistician [[Jeff Sonas]] believes that the original <math>\; K = 10 \;</math> value (for players rated above 2400) in Elo's work is inaccurate. If the {{mvar|K}}-factor is set too large, ratings will be overly sensitive to a few recent events, with a large number of points exchanged in each game. If the {{mvar|K}}-factor is too low, the sensitivity will be minimal, and the system will not respond quickly enough to changes in a player's actual level of performance.
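
The trade-off can be illustrated with an idealized, deterministic sketch. It assumes a player rated 2400 whose true strength is 2500, facing only 2400-rated opponents and scoring exactly the true expected score in every game (a fractional result that cannot occur in a single real game). All names and numbers are illustrative assumptions.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B (logistic curve, base 10, scale 400)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def rating_after(k: float, games: int = 30, start: float = 2400.0,
                 true_strength: float = 2500.0, opponent: float = 2400.0) -> float:
    """Rating after `games` games in which the player scores exactly
    the expected score implied by their true strength (an idealization)."""
    rating = start
    true_expected = expected_score(true_strength, opponent)
    for _ in range(games):
        rating += k * (true_expected - expected_score(rating, opponent))
    return rating


low_k = rating_after(k=10)
high_k = rating_after(k=24)
# The larger K-factor closes more of the 100-point gap in the same 30 games.
print(round(low_k), round(high_k))
```

Real results are noisy, so a larger {{mvar|K}}-factor also amplifies random swings; the sketch shows only the speed of adjustment.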

Elo's original {{mvar|K}}-factor estimation was made without the benefit of huge databases and statistical evidence. Sonas indicates that a {{mvar|K}}-factor of 24 (for players rated above 2400) may be both a more accurate predictor of future performance and more sensitive to changes in performance.<ref>A key Sonas article is {{cite news |first=Jeff |last=Sonas |title=The Sonas rating formula — better than Elo? |website=chessbase.com |url=http://www.chessbase.com/newsdetail.asp?newsid=562 |access-date=2005-05-01 |archive-date=2005-03-05 |archive-url=https://web.archive.org/web/20050305143008/http://www.chessbase.com/newsdetail.asp?newsid=562 |url-status=live }}</ref>

Certain Internet chess sites seem to avoid a three-level K-factor staggering based on rating range. For example, the ICC seems to adopt a global {{math|1=''K'' = 32}} except when playing against provisionally rated players.

The USCF (which makes use of a [[logistic distribution]] as opposed to a [[normal distribution]]) formerly staggered the K-factor according to three main rating ranges:
:{|
|- 
! {{mvar|K}}-factor !! Used for players with ratings ...
|- style="vertical-align:top;"
| <math>\; K = 32 \;</math> || below 2100 
|- style="vertical-align:top;"
| <math>\; K = 24 \;</math> || between 2100 and 2400 
|- style="vertical-align:top;"
| <math>\; K = 16 \;</math> || above 2400 
|}
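
The former three-level staggering amounts to a simple lookup, sketched below with a hypothetical function name. The table does not say which band a rating of exactly 2100 or 2400 falls in; the sketch assumes both belong to the middle band.

```python
def former_uscf_k_factor(rating: float) -> int:
    """Former USCF K-factor by rating band.

    Assumption: ratings of exactly 2100 and 2400 fall in the middle band,
    since the table does not specify the boundary cases.
    """
    if rating < 2100:
        return 32
    if rating <= 2400:
        return 24
    return 16


for r in (1500, 2100, 2250, 2500):
    print(r, former_uscf_k_factor(r))
```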

Currently, the USCF uses a formula that calculates the {{mvar|K}}-factor based on factors including the number of games played and the player's rating. The K-factor is also reduced for high rated players if the event has shorter time controls.<ref name="uschess2020">{{cite report |title=The US Chess Rating system |date=April 24, 2017 |via=glicko.net |url=http://www.glicko.net/ratings/rating.system.pdf |access-date=16 February 2020 |archive-date=7 February 2020 |archive-url=https://web.archive.org/web/20200207072639/http://www.glicko.net/ratings/rating.system.pdf |url-status=live}}</ref>

FIDE uses the following ranges:<ref name="FideRules">{{cite report |title=FIDE Rating Regulations effective from 1&nbsp;July 2014 |date=2014-07-01 |website=FIDE Online (fide.com) |publisher=[[FIDE]] |url=http://www.fide.com/fide/handbook.html?id=172&view=article |access-date=2014-07-01 |archive-date=2014-07-01 |archive-url=https://web.archive.org/web/20140701031750/http://www.fide.com/fide/handbook.html?id=172&view=article |url-status=live}}</ref>

:{|
|- style="vertical-align:top;"
! {{mvar|K}}-factor !! Used for players with ratings ...
|- style="vertical-align:top;"
| <math>\; K = 40 \;</math> || for a player new to the rating list until the completion of events with a total of 30&nbsp;games, and for all players until their 18th birthday, as long as their rating remains under 2300.
|- style="vertical-align:top;"
| <math>\; K = 20 \;</math> || for players who have always been rated under 2400.
|- style="vertical-align:top;"
| <math>\; K = 10 \;</math> || for players with any published rating of at least 2400 and at least 30&nbsp;games played in previous events. Thereafter it remains permanently at 10.
|}
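
A rough sketch of how the ranges in the table above might be encoded is given below. The function name, its parameters, and the order in which the rules are checked are assumptions made for illustration; the table does not state which rule takes precedence when more than one applies, and the sketch is not FIDE's own implementation.

```python
def fide_k_factor_2014(rating: float, games_played: int, age: int,
                       ever_published_2400: bool) -> int:
    """K-factor following the table of FIDE ranges from July 2014.

    Assumed precedence (not stated in the table): the permanent K = 10
    rule is checked first, then the two K = 40 rules, then K = 20.
    """
    if ever_published_2400 and games_played >= 30:
        return 10  # stays at 10 even if the rating later drops
    if games_played < 30:
        return 40  # new to the rating list
    if age < 18 and rating < 2300:
        return 40  # junior rated under 2300
    return 20


print(fide_k_factor_2014(2000, 10, 25, False))   # 40
print(fide_k_factor_2014(2000, 100, 16, False))  # 40
print(fide_k_factor_2014(2200, 100, 30, False))  # 20
print(fide_k_factor_2014(2390, 100, 30, True))   # 10
```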

FIDE used the following ranges before July&nbsp;2014:<ref name="FideRulesOld">{{cite report |title=FIDE Rating Regulations valid from 1&nbsp;July 2013 till 1&nbsp;July 2014 |date=2013-07-01 |website=FIDE Online (fide.com) |url=http://www.fide.com/fide/handbook.html?id=161&view=article |access-date=2014-07-01 |archive-date=2014-07-15 |archive-url=https://web.archive.org/web/20140715002648/http://www.fide.com/fide/handbook.html?id=161&view=article |url-status=live }}</ref>

:{|
|-
! {{mvar|K}}-factor !! Used for players with ratings ...
|- style="vertical-align:top;"
| <math>\; K = 30 \;</math><br/>(was 25) || for a player new to the rating list until the completion of events with a total of 30 games.<ref name="Changes to FIDE Rating Regulations">{{cite press release |title=Changes to Rating Regulations |website=FIDE Online (fide.com) |publisher=[[FIDE]] |date=2011-07-21 |url=http://www.fide.com/component/content/article/1-fide-news/5421-changes-to-rating-regulations.html |access-date=2012-02-19 |archive-date=2012-05-13 |archive-url=https://web.archive.org/web/20120513170023/http://www.fide.com/component/content/article/1-fide-news/5421-changes-to-rating-regulations.html |url-status=dead }}</ref>
|- style="vertical-align:top;"
| <math>\; K = 15 \;</math> || for players who have always been rated under 2400.
|- style="vertical-align:top;"
| <math>\; K = 10 \;</math> || for players with any published rating of at least 2400 and at least 30 games played in previous events. Thereafter it remains permanently at 10.
|}

The gradation of the {{mvar|K}}-factor reduces rating changes at the top end of the rating range, making a rapid rise or fall less likely for players rated high enough to have a low {{mvar|K}}-factor.

In theory, this might apply equally to online chess players and over-the-board players, since it is more difficult for all players to raise their rating after their rating has become high and their {{mvar|K}}-factor consequently reduced. However, when playing online, 2800+ players can more easily raise their rating by simply selecting opponents with high ratings – on the ICC playing site, a [[grandmaster (chess)|grandmaster]] may play a string of different opponents who are all rated over 2700.<ref>{{cite web |title=''K''-factor |series=ICC Help |website=Chessclub.com |date=2002-10-18 |url=http://www.chessclub.com/help/k-factor |access-date=2012-02-19 |url-status=dead|archive-url=https://web.archive.org/web/20120313023307/http://www.chessclub.com/help/k-factor |archive-date=2012-03-13 }}</ref> In over-the-board events, it would only be in very high level all-play-all events that a player would be able to engage that number of 2700+ opponents. In a normal, open, Swiss-paired chess tournament, frequently there would be many opponents rated less than 2500, reducing the ratings gains possible from a single contest for a high-rated player.
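
The effect of opponent strength on the gain from a single win can be sketched as follows, assuming a 2800-rated player with <math>\; K = 10 \;</math> (illustrative values, not ICC's actual parameters; the function name is hypothetical).

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B (logistic curve, base 10, scale 400)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def gain_from_win(rating: float, opponent: float, k: float = 10.0) -> float:
    """Rating points gained by winning one game: K * (1 - E)."""
    return k * (1.0 - expected_score(rating, opponent))


vs_2700 = gain_from_win(2800, 2700)
vs_2500 = gain_from_win(2800, 2500)
print(round(vs_2700, 1), round(vs_2500, 1))  # 3.6 1.5
```

The sketch shows only the reward for a win; a loss to the lower-rated opponent would cost correspondingly more points.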