Index of coincidence
==Application==
The index of coincidence is useful both in the analysis of [[natural language|natural-language]] [[plaintext]] and in the analysis of [[encryption|ciphertext]] ([[cryptanalysis]]). Even when only ciphertext is available for testing and plaintext letter identities are disguised, coincidences in ciphertext can be caused by coincidences in the underlying plaintext. This technique is used to [[Vigenère cipher#Cryptanalysis|cryptanalyze]] the [[Vigenère cipher]], for example. For a repeating-key [[polyalphabetic cipher]] arranged into a matrix, the coincidence rate within each column will usually be highest when the width of the matrix is a multiple of the key length. This fact can be used to determine the key length, which is the first step in cracking the system.

Coincidence counting can help determine when two texts are written in the same language using the same [[alphabet]]. (This technique has been used to examine the purported [[Bible code]].) The ''causal'' coincidence count for such texts will be distinctly higher than the ''accidental'' coincidence count for texts in different languages, or texts using different alphabets, or gibberish texts.{{Citation needed|date=September 2023}}

To see why, imagine an "alphabet" of only the two letters A and B. Suppose that in our "language", the letter A is used 75% of the time, and the letter B is used 25% of the time. If two texts in this language are laid side by side, then the following pairs can be expected:

{| class="wikitable"
|-
! Pair
! Probability
|-
| AA
| 56.25%
|-
| BB
| 6.25%
|-
| AB
| 18.75%
|-
| BA
| 18.75%
|}

Overall, the probability of a "coincidence" is 62.5% (56.25% for AA + 6.25% for BB).

Now consider the case when ''both'' messages are encrypted using the simple monoalphabetic [[substitution cipher]] which replaces A with B and vice versa:

{| class="wikitable"
|-
! Pair
! Probability
|-
| AA
| 6.25%
|-
| BB
| 56.25%
|-
| AB
| 18.75%
|-
| BA
| 18.75%
|}

The overall probability of a coincidence in this situation is 62.5% (6.25% for AA + 56.25% for BB), exactly the same as for the unencrypted "plaintext" case. In effect, the new alphabet produced by the substitution is just a uniform renaming of the original character identities, which does not affect whether they match.

Now suppose that only ''one'' message (say, the second) is encrypted using the same substitution cipher (A,B)→(B,A). The following pairs can now be expected:

{| class="wikitable"
|-
! Pair
! Probability
|-
| AA
| 18.75%
|-
| BB
| 18.75%
|-
| AB
| 56.25%
|-
| BA
| 6.25%
|}

Now the probability of a coincidence is only 37.5% (18.75% for AA + 18.75% for BB). This is noticeably lower than the probability when same-language, same-alphabet texts were used. Evidently, coincidences are more likely when the most frequent letters in each text are the same.

The same principle applies to real languages like English, because certain letters, like E, occur much more frequently than other letters, a fact which is used in [[frequency analysis (cryptanalysis)|frequency analysis]] of [[substitution cipher]]s. Coincidences involving the letter E, for example, are relatively likely. So when any two English texts are compared, the coincidence count will be higher than when an English text and a foreign-language text are compared.

This effect can be subtle. For example, similar languages will have a higher coincidence count than dissimilar languages. Also, it is not hard to generate random text with a frequency distribution similar to real text, artificially raising the coincidence count. Nevertheless, this technique can be used effectively to identify when two texts are likely to contain meaningful information in the same language using the same alphabet, to discover periods for repeating keys, and to uncover many other kinds of nonrandom phenomena within or among ciphertexts.
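
The three coincidence probabilities in the two-letter example above can be checked with a short calculation. The following Python sketch is illustrative only: it assumes the two texts are independent, so that the chance of a match at any position is the sum over letters of the product of the two letter probabilities. The names <code>coincidence_probability</code> and <code>substitute</code> are hypothetical helpers introduced here, not part of any library.

```python
from fractions import Fraction


def coincidence_probability(p, q):
    """Chance that independent draws from letter distributions p and q match."""
    return sum(prob * q.get(letter, 0) for letter, prob in p.items())


def substitute(dist, mapping):
    """Letter distribution after applying a monoalphabetic substitution."""
    return {mapping[letter]: prob for letter, prob in dist.items()}


# The toy "language" from the example: A is used 75% of the time, B 25%.
language = {"A": Fraction(3, 4), "B": Fraction(1, 4)}
swap = {"A": "B", "B": "A"}  # the substitution (A,B) -> (B,A)
encrypted = substitute(language, swap)

both_plain = coincidence_probability(language, language)
both_encrypted = coincidence_probability(encrypted, encrypted)
one_encrypted = coincidence_probability(language, encrypted)

print(float(both_plain), float(both_encrypted), float(one_encrypted))
# prints: 0.625 0.625 0.375
```

Exact fractions are used so that the results match the percentages in the tables without floating-point rounding.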
Expected values for various languages<ref>{{cite book|author=[[William F. Friedman|Friedman, W.F.]] and [[Lambros D. Callimahos|Callimahos, L.D.]]|title=[[Military Cryptanalytics]], Part I – Volume 2|orig-year=1956|publisher=Reprinted by Aegean Park Press|isbn=0-89412-074-3|year=1985}}</ref> are:

{| class="wikitable"
|-
! Language
! Index of coincidence
|-
| English
| 1.73
|-
| French
| 2.02
|-
| German
| 2.05
|-
| Italian
| 1.94
|-
| Portuguese
| 1.94
|-
| Russian
| 1.76
|-
| Spanish
| 1.94
|}
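
Values such as those in the table can serve as a target when testing candidate key lengths for a repeating-key cipher. The following Python sketch assumes the usual normalized definition of the index, in which the count of matching letter pairs is divided by the number of pairs and multiplied by the alphabet size, so that uniformly random text scores about 1.0. The function names <code>normalized_ic</code> and <code>average_column_ic</code>, and the restriction to the 26-letter Latin alphabet, are assumptions made for illustration.

```python
from collections import Counter
import string


def normalized_ic(text, alphabet=string.ascii_uppercase):
    """Normalized index of coincidence of the letters of `text` in `alphabet`."""
    letters = [ch for ch in text.upper() if ch in alphabet]
    n = len(letters)
    if n < 2:
        return 0.0  # too short to form any pair of letters
    counts = Counter(letters)
    matching_pairs = sum(k * (k - 1) for k in counts.values())
    return len(alphabet) * matching_pairs / (n * (n - 1))


def average_column_ic(ciphertext, width, alphabet=string.ascii_uppercase):
    """Write the text into rows of `width` letters and average the column ICs."""
    letters = [ch for ch in ciphertext.upper() if ch in alphabet]
    columns = ["".join(letters[i::width]) for i in range(width)]
    return sum(normalized_ic(col, alphabet) for col in columns) / width


# Toy text with period 2: every column is constant once the width is 2.
toy = "ABABABAB"
print(average_column_ic(toy, 1))  # about 11.14
print(average_column_ic(toy, 2))  # 26.0
```

For a real ciphertext one would evaluate <code>average_column_ic</code> over a range of widths and look for widths whose value rises toward the figure expected for the plaintext language (about 1.73 for English), keeping in mind that short columns give noisy estimates.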