
== Practical issues ==

=== Game activity versus protecting one's rating ===
In some cases the rating system can discourage players who wish to protect their rating from playing rated games.<ref>[http://www.chesscafe.com/text/skittles176.pdf A Parent's Guide to Chess] {{Webarchive|url=https://web.archive.org/web/20080528195206/http://www.chesscafe.com/text/skittles176.pdf |date=2008-05-28 }} ''Skittles'', Dan Heisman, Chesscafe.com, August 4, 2002</ref> To discourage players from sitting on a high rating, a 2005 proposal by British Grandmaster [[John Nunn]] for choosing qualifiers to the chess world championship included an activity bonus, to be combined with the rating.<ref>{{cite web |url=http://www.chessbase.com/newsdetail.asp?newsid=2440 |title=Chess News – The Nunn Plan for the World Chess Championship |date=8 June 2005 |publisher=ChessBase.com |access-date=2012-02-19 |archive-date=2011-11-19 |archive-url=https://web.archive.org/web/20111119132830/http://www.chessbase.com/newsdetail.asp?newsid=2440 |url-status=live }}</ref>

Beyond the chess world, concerns over players avoiding competitive play to protect their ratings caused [[Wizards of the Coast]] to abandon the Elo system for ''[[Magic: the Gathering]]'' tournaments in favour of a system of their own devising called "Planeswalker Points".<ref name=Planeswalkerpointsarticle>{{cite web |url=http://www.wizards.com/Magic/Magazine/Article.aspx?x=mtg/daily/feature/159b |title=Introducing Planeswalker Points |date=September 6, 2011 |access-date=September 9, 2011 |archive-date=September 30, 2011 |archive-url=https://web.archive.org/web/20110930202112/http://www.wizards.com/Magic/Magazine/Article.aspx?x=mtg%2Fdaily%2Ffeature%2F159b |url-status=dead}}</ref><ref name=planeswalkerpointsarticle2>{{cite web |url=http://magic.wizards.com/en/articles/archive/week-was/getting-points-aaron-and-scott-2011-09-09 |title=Getting to the Points |work=MAGIC: THE GATHERING |date=September 9, 2011 |access-date=September 9, 2011 |archive-date=October 18, 2016 |archive-url=https://web.archive.org/web/20161018204636/http://magic.wizards.com/en/articles/archive/week-was/getting-points-aaron-and-scott-2011-09-09 |url-status=live}}</ref>

=== Selective pairing ===
{{unreferenced section|date=January 2017}}
A more subtle issue is related to pairing. When players can choose their own opponents, they can choose opponents with minimal risk of losing and maximum reward for winning. Examples of players rated 2800+ doing this include: choosing opponents that they know they can beat with a certain strategy; choosing opponents that they think are overrated; and avoiding strong players who are rated several hundred points below them but may hold chess titles such as IM or GM. Among overrated opponents, new entrants to the rating system who have played fewer than 50 games are in theory a convenient target, as their provisional rating may be too high. The ICC compensates for this by assigning a lower K-factor to the established player if they win against a new entrant; the K-factor is a function of the number of rated games played by the new entrant.

Elo ratings online still provide a useful mechanism for assigning a rating based on the opponent's rating. Their overall credibility, however, needs to be seen in the context of at least the two major issues described above: engine abuse and the selective pairing of opponents.

The ICC has also recently{{when|date=December 2024}} introduced "auto-pairing" ratings which are based on random pairings, but with each win in a row ensuring a statistically much harder opponent who has also won x games in a row. With potentially hundreds of players involved, this creates some of the challenges of a large, fiercely contested Swiss event, with round winners meeting round winners. This approach to pairing maximizes the rating risk of the higher-rated participants, who may face very stiff opposition from players below 3000, for example. This is a separate rating in itself, and is listed under the "1-minute" and "5-minute" rating categories. Maximum ratings above 2500 are exceptionally rare.

=== Ratings inflation and deflation ===
[[File:Elo rating graph.svg|thumb|300px|Graphs of probabilities and Elo rating changes (for K=16 and 32) of expected outcome (solid curve) and unexpected outcome (dotted curve) vs initial rating difference.

For example, player&nbsp;{{mvar|A}} starts with a 1400 rating and {{mvar|B}} with 1800 in a tournament using {{math|1=''K'' = 32}} (brown curves). The blue dash-dot line denotes the initial rating difference of 400 ({{math|1800 − 1400}}). The probability of {{mvar|B}} winning, the expected outcome, is 0.91 (intersection of black solid curve and blue line); if this happens, {{mvar|A}}'s rating decreases by 3 (intersection of brown solid curve and blue line) to 1397 and {{mvar|B}}'s increases by the same amount to 1803. Conversely, the probability of {{mvar|A}} winning, the unexpected outcome, is 0.09 (intersection of black dotted curve and blue line); if this happens, {{mvar|A}}'s rating increases by 29 (intersection of brown dotted curve and blue line) to 1429 and {{mvar|B}}'s decreases by the same amount to 1771.]]

The term "inflation", applied to ratings, is meant to suggest that the level of playing strength demonstrated by a player with a given rating is decreasing over time; conversely, "deflation" suggests that the level is advancing. For example, if there is inflation, a modern rating of 2500 means less than a historical rating of 2500, while the reverse is true if there is deflation. Using ratings to compare players from different eras is made more difficult when inflation or deflation is present. (See also [[Comparison of top chess players throughout history]].)

Analyzing FIDE rating lists over time, Jeff Sonas suggests that inflation may have taken place since about 1985.<ref name="Sonasinflation">{{cite web |author=Jeff Sonas |title=Rating inflation – its causes and possible cures |url=https://en.chessbase.com/post/rating-inflation-its-causes-and-poible-cures |website=chessbase.com |date=27 July 2009 |access-date=27 August 2009 |archive-date=23 November 2013 |archive-url=https://web.archive.org/web/20131123073932/http://en.chessbase.com/post/rating-inflation-its-causes-and-poible-cures |url-status=live}}</ref> Sonas looks at the highest-rated players, rather than all rated players, and acknowledges that the changes in the distribution of ratings could have been caused by an increase of the standard of play at the highest levels, but looks for other causes as well.

The number of people with ratings over 2700 has increased. Around 1979 there was only one active player ([[Anatoly Karpov]]) with a rating this high. In 1992 [[Viswanathan Anand]] became only the 8th player in chess history to reach the 2700 mark.<ref name="Viswanathan Anand">{{cite web |url=http://www.chessgames.com/perl/chessplayer?pid=12088 |title=Viswanathan Anand |publisher=Chessgames.com |access-date=2012-08-14 |archive-date=2013-03-28 |archive-url=https://web.archive.org/web/20130328031126/http://www.chessgames.com/perl/chessplayer?pid=12088 |url-status=live}}</ref> This increased to 15 players by 1994. There were 33 players with a 2700+ rating in 2009 and 44 as of September 2012. Only 14 players have ever broken a rating of 2800.

One possible cause for this inflation was the rating floor, which for a long time was at 2200, and if a player dropped below this they were struck from the rating list. As a consequence, players at a skill level just below the floor would only be on the rating list if they were overrated, and this would cause them to feed points into the rating pool.<ref name="Sonasinflation"/> In July 2000 the average rating of the top 100 was 2644. By July 2012 it had increased to 2703.<ref name="Viswanathan Anand"/>

Using a strong [[chess engine]] to evaluate moves played in games between rated players, Regan and Haworth analyzed sets of games from FIDE-rated tournaments and concluded that there had been little or no inflation from 1976 to 2009.<ref>{{Cite journal |last1=Regan |first1=Kenneth |last2=Haworth |first2=Guy |date=2011-08-04 |title=Intrinsic Chess Ratings |url=https://ojs.aaai.org/index.php/AAAI/article/view/7951 |journal=Proceedings of the AAAI Conference on Artificial Intelligence |language=en |volume=25 |issue=1 |pages=834–839 |doi=10.1609/aaai.v25i1.7951 |s2cid=15489049 |issn=2374-3468 |access-date=2021-09-01 |archive-date=2021-04-20 |archive-url=https://web.archive.org/web/20210420130735/https://ojs.aaai.org/index.php/AAAI/article/view/7951 |url-status=live|doi-access=free }}</ref>

In a pure Elo system, each game ends in an equal transaction of rating points. If the winner gains N rating points, the loser will drop by N rating points. This prevents points from entering or leaving the system when games are played and rated. However, players tend to enter the system as novices with a low rating and retire from the system as experienced players with a high rating. Therefore, in the long run a system with strictly equal transactions tends to result in rating deflation.<ref>{{cite web |url=http://www.sjakk.no/nsf/elosystem_main.html |title=ELO-SYSTEMET |last=Bergersen |first=Per A |publisher=Norwegian Chess Federation |language=no |access-date=21 October 2013 |url-status=dead |archive-url=https://wayback.archive-it.org/all/20130308184326/http://www.sjakk.no/nsf/elosystem_main.html |archive-date=8 March 2013 }}</ref>
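
The equal transaction described above can be checked with a short sketch. The function names <code>expected_score</code> and <code>rate_game</code> are illustrative and not taken from any federation's software; the sketch assumes the standard logistic expected-score curve on the 400-point scale and reuses the ratings and {{math|1=''K'' = 32}} from the figure's worked example.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard logistic Elo curve."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def rate_game(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return both new ratings after one game; score_a is 1, 0.5 or 0.

    Assumes both players share the same K, which is what makes the
    exchange exactly zero-sum.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Figure example: A (1400) against B (1800) with K = 32.
    for score_a, label in ((0.0, "B wins"), (1.0, "A wins")):
        new_a, new_b = rate_game(1400, 1800, score_a)
        # The combined total stays at 3200 whatever the result.
        print(label, round(new_a), round(new_b), round(new_a + new_b))
```

Under these assumptions the two rating changes always cancel, so the pool total changes only when players join or leave. A hypothetical player who enters at 1200 and retires at 1800 takes 600 points out of the pool.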

In 1995, the USCF acknowledged that several young scholastic players were improving faster than the rating system was able to track. As a result, established players with stable ratings started to lose rating points to the young and underrated players. Several of the older established players were frustrated over what they considered an unfair rating decline, and some even quit chess over it.<ref name="glickman">A conversation with Mark Glickman [http://www.glicko.net/ratings/cl-article.pdf] {{Webarchive|url=https://web.archive.org/web/20110807205021/http://www.glicko.net/ratings/cl-article.pdf|date=2011-08-07}}, Published in ''Chess Life'' October 2006 issue</ref>

====Combating deflation====
Because inflation and deflation occur at different times, and in order to combat deflation, most implementations of Elo ratings have a mechanism for injecting points into the system in order to maintain relative ratings over time. FIDE has two inflationary mechanisms. First, performances below a "ratings floor" are not tracked, so a player with true skill below the floor can only be unrated or overrated, never correctly rated. Second, established and higher-rated players have a lower K-factor. New players have {{math|1=''K'' = 40}}, which drops to {{math|1=''K'' = 20}} after 30 played games, and to {{math|1=''K'' = 10}} when the player reaches 2400.<ref name="FideRules"/>
The current system in the United States includes a bonus point scheme which feeds rating points into the system in order to track improving players, and different K-values for different players.<ref name="glickman"/> Some methods, used in Norway for example, differentiate between juniors and seniors, and use a larger K-factor for the young players, even boosting the rating progress by 100% when they score well above their predicted performance.<ref>{{cite web |url=http://www.sjakk.no/nsf/elosystem_index.html |title=Elo-systemet |website=Norges Sjakkforbund |access-date=2009-08-23 |url-status=dead |archive-url=https://web.archive.org/web/20131205025157/http://www.sjakk.no/nsf/elosystem_index.html |archive-date=December 5, 2013 }}</ref>
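
A minimal sketch of the FIDE K-factor tiers quoted above shows why unequal K-factors let points enter the system. It models only the three rules stated in this section and ignores any other provisions of the actual regulations; the function names, and the choice to let the new-player rule take precedence, are assumptions made for illustration.

```python
def k_factor(games_played: int, has_reached_2400: bool) -> int:
    """Simplified K selection using only the three rules quoted above."""
    if games_played < 30:
        return 40  # new player (assumed here to take precedence)
    if has_reached_2400:
        return 10  # player who has reached 2400
    return 20      # established player below 2400


def rating_change(k: int, score: float, expected: float) -> float:
    """Rating change for one game: K times (actual score minus expected score)."""
    return k * (score - expected)


if __name__ == "__main__":
    # A new player beats an established 2400+ player; both are assumed to
    # have had an expected score of 0.5 (an illustrative value).
    winner_gain = rating_change(k_factor(10, False), 1.0, 0.5)
    loser_loss = rating_change(k_factor(200, True), 0.0, 0.5)
    # The winner gains more than the loser gives up, so the pool total grows.
    print(winner_gain, loser_loss, winner_gain + loser_loss)
```

With these illustrative inputs the new player gains 20 points while the established player loses only 5, so 15 points are added to the pool by that single game.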

Rating floors in the United States work by guaranteeing that a player will never drop below a certain limit. This also combats deflation, but the chairman of the USCF Ratings Committee has been critical of this method because it does not feed the extra points to the improving players. A possible motive for these rating floors is to combat sandbagging, i.e., deliberate lowering of ratings to be eligible for lower rating class sections and prizes.<ref name="glickman"/>

===Ratings of computers===
[[Human–computer chess matches]] between 1997 ([[Deep Blue versus Garry Kasparov]]) and 2006 demonstrated that [[chess computers]] are capable of defeating even the strongest human players. However, [[chess engine]] ratings are difficult to quantify, due to variable factors such as the time control and the hardware the program runs on, and also the fact that chess is not a fair game. The existence and magnitude of the [[first-move advantage in chess]] become very important at the computer level. Beyond some skill threshold, an engine with White should be able to force a draw on demand from the starting position even against perfect play, simply because White begins with too big an advantage to lose compared to the small magnitude of the errors it is likely to make. Consequently, such an engine is more or less guaranteed to score at least 25% even against perfect play. Differences in skill beyond a certain point could only be picked up if one does not begin from the usual starting position, but instead chooses a starting position that is only barely not lost for one side. Because of these factors, ratings depend on pairings and the openings selected.<ref>Larry Kaufman, Chess Board Options (2021), p. 179</ref> Published engine rating lists such as [[CCRL]] are based on engine-only games on standard hardware configurations and are not directly comparable to FIDE ratings.
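
The 25% figure follows from simple arithmetic, sketched here under two assumptions that are not stated in the source: the engine plays equally often with White and Black, and the standard logistic Elo formula on the 400-point scale converts a score into a rating gap. The function names are illustrative.

```python
import math


def overall_score(white_score: float, black_score: float) -> float:
    """Average score, assuming equal numbers of games with each colour."""
    return (white_score + black_score) / 2.0


def rating_gap_for_score(score: float) -> float:
    """Rating difference implied by an expected score (0 < score < 1)."""
    return 400.0 * math.log10(score / (1.0 - score))


if __name__ == "__main__":
    # Worst case against perfect play: draw every game as White (0.5 each)
    # and lose every game as Black (0 each).
    floor = overall_score(0.5, 0.0)
    print(floor)                        # the guaranteed minimum score
    print(rating_gap_for_score(floor))  # rating gap implied by that score
```

Under these assumptions the score floor of 0.25 corresponds to a gap of roughly 191 rating points, which is one way to read the claim that differences in skill beyond a certain point cannot be picked up from the usual starting position.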

For some ratings estimates, see [[Chess engine rating lists|Chess engine § Ratings]].