Index of coincidence
==Calculation==
The index of coincidence measures how likely it is to draw two matching letters when two letters are selected at random from a given text. The chance of drawing a given letter is (number of times that letter appears / length of the text). The chance of drawing that same letter again (without replacement) is ((appearances − 1) / (text length − 1)). The product of these two values is the chance of drawing that letter twice in a row. Computing this product for each letter that appears in the text and summing the products gives the chance of drawing two of a kind. This probability can then be normalized by multiplying it by some coefficient, typically 26 in English.

:<math> \mathbf{IC} = c \times \left({\left({\frac{n_\mathrm{a}}{N} \times \frac{n_\mathrm{a} - 1}{N - 1}}\right) + \left({\frac{n_\mathrm{b}}{N} \times \frac{n_\mathrm{b} - 1}{N - 1}}\right) + \cdots + \left({\frac{n_\mathrm{z}}{N} \times \frac{n_\mathrm{z} - 1}{N - 1}}\right)}\right)</math>

where ''c'' is the normalizing coefficient (26 for English), ''n''<sub>a</sub> is the number of times the letter "a" appears in the text, and ''N'' is the length of the text.

We can express the index of coincidence '''IC''' for a given letter-frequency distribution as a summation:

:<math>\mathbf{IC} = \frac{\displaystyle\sum_{i=1}^{c}n_i(n_i -1)}{N(N-1)/c}</math>

where ''N'' is the length of the text and ''n''<sub>1</sub> through ''n<sub>c</sub>'' are the [[Letter frequencies|frequencies]] (as integers) of the ''c'' letters of the alphabet (''c'' = 26 for monocase [[English language|English]]). The sum of the ''n<sub>i</sub>'' is necessarily ''N''.

The products {{math|''n''(''n'' − 1)}} count the number of [[combinations]] of ''n'' elements taken two at a time. (Actually this counts each pair twice; the extra factors of 2 occur in both numerator and denominator of the formula and thus cancel out.) Each of the ''n<sub>i</sub>'' occurrences of the ''i''-th letter matches each of the remaining {{math|''n<sub>i</sub>'' − 1}} occurrences of the same letter. There are a total of {{math|''N''(''N'' − 1)}} letter pairs in the entire text, and 1/''c'' is the probability of a match for each pair, assuming a uniform [[random]] distribution of the characters (the "null model"; see below). Thus, this formula gives the ratio of the total number of coincidences observed to the total number of coincidences that one would expect from the null model.<ref>{{cite journal |last=Mountjoy |first=Marjorie | title= The Bar Statistics | journal=NSA Technical Journal | year=1963 | volume=VII | issue=2,4}} Published in two parts.</ref>

The expected average value for the IC can be computed from the relative letter frequencies {{mvar|''f<sub>i</sub>''}} of the source language:

:<math>\mathbf{IC}_{\mathrm{expected}} = \frac{\displaystyle\sum_{i=1}^{c}{f_i}^2}{1/c}.</math>

If all {{mvar|c}} letters of an alphabet were equally probable, the expected index would be 1.0. The actual monographic IC for [[telegraph]]ic English text is around 1.73, reflecting the unevenness of [[natural language|natural-language]] letter distributions.

Sometimes values are reported without the normalizing denominator, for example {{math|1=0.067 = 1.73/26}} for English; such values may be called ''κ''<sub>p</sub> ("kappa-plaintext") rather than IC, with ''κ''<sub>r</sub> ("kappa-random") used to denote the denominator {{math|1/''c''}} (which is the expected coincidence rate for a uniform distribution of the same alphabet, {{math|1=0.0385=1/26}} for English). English plaintext will generally fall somewhere in the range of 1.5 to 2.0 (normalized calculation).<ref>{{Cite journal |last=Kontou |first=Eleni |date=18 May 2020 |title=Index of Coincidence |url=https://core.ac.uk/display/327259203 |journal=University of Leicester Open Journals |via=[[CORE_(research_service)|CORE]]}}</ref>
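
The summation formula and the expected-value formula above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function names <code>index_of_coincidence</code> and <code>expected_ic</code> are hypothetical, and the sketch assumes a monocase 26-letter English alphabet, with every character outside that alphabet ignored.

```python
from collections import Counter
import string


def index_of_coincidence(text, alphabet=string.ascii_lowercase):
    """Normalized IC: c * sum(n_i * (n_i - 1)) / (N * (N - 1)).

    Assumption: the text is folded to lower case and any character
    not in `alphabet` (spaces, digits, punctuation) is dropped.
    """
    letters = [ch for ch in text.lower() if ch in alphabet]
    n_total = len(letters)  # N, the length of the filtered text
    if n_total < 2:
        raise ValueError("need at least two letters to form a pair")
    c = len(alphabet)  # normalizing coefficient, 26 for English
    counts = Counter(letters)  # n_i for each letter that occurs
    # Ordered pairs of matching letters; the factor of 2 cancels
    # against the N * (N - 1) ordered pairs in the denominator.
    coincidences = sum(n * (n - 1) for n in counts.values())
    return c * coincidences / (n_total * (n_total - 1))


def expected_ic(frequencies):
    """Expected IC from relative frequencies f_i: sum(f_i ** 2) / (1 / c).

    Assumption: `frequencies` holds one entry per letter of the
    alphabet and the entries sum to 1.
    """
    c = len(frequencies)
    return c * sum(f * f for f in frequencies)
```

Dividing the result of <code>index_of_coincidence</code> by ''c'' gives the unnormalized coincidence rate. For a uniform distribution, <code>expected_ic([1 / 26] * 26)</code> evaluates to 1.0 up to floating-point rounding, matching the null model described above. Very short texts give noisy values, so the comparison with the typical English range is only meaningful for longer samples.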