==Introduction==
The core idea of information theory is that the "informational value" of a communicated message depends on the degree to which the content of the message is surprising. If a highly likely event occurs, the message carries very little information. On the other hand, if a highly unlikely event occurs, the message is much more informative. For instance, the knowledge that some particular number ''will not'' be the winning number of a lottery provides very little information, because any particular chosen number will almost certainly not win. However, knowledge that a particular number ''will'' win a lottery has high informational value because it communicates the occurrence of a very low probability event.

The ''[[information content]],'' also called the ''surprisal'' or ''self-information,'' of an event <math>E</math> is a function that increases as the probability <math>p(E)</math> of an event decreases. When <math>p(E)</math> is close to 1, the surprisal of the event is low, but if <math>p(E)</math> is close to 0, the surprisal of the event is high. This relationship is described by the function
<math display="block">\log\left(\frac{1}{p(E)}\right) ,</math>
where <math>\log</math> is the [[logarithm]], which gives 0 surprise when the probability of the event is 1.<ref>{{cite web |url = https://www.youtube.com/watch?v=YtebGVx-Fxw |title = Entropy (for data science) Clearly Explained!!! |date = 24 August 2021 |via = [[YouTube]] |access-date = 5 October 2021 |archive-date = 5 October 2021 |archive-url = https://web.archive.org/web/20211005135139/https://www.youtube.com/watch?v=YtebGVx-Fxw |url-status = live }}</ref> In fact, {{math|log}} is the only function that satisfies a specific set of conditions defined in section ''{{slink|#Characterization}}''. Hence, we can define the information, or surprisal, of an event <math>E</math> by
<math display="block">I(E) = -\log(p(E)) ,</math>
or equivalently,
<math display="block">I(E) = \log\left(\frac{1}{p(E)}\right) .</math>

Entropy measures the expected (i.e., average) amount of information conveyed by identifying the outcome of a random trial.<ref name="mackay2003">{{cite book|last=MacKay|first=David J.C.|author-link=David J. C. MacKay|url=http://www.inference.phy.cam.ac.uk/mackay/itila/book.html|title=Information Theory, Inference, and Learning Algorithms|publisher=Cambridge University Press|year=2003|isbn=0-521-64298-1|access-date=9 June 2014|archive-date=17 February 2016|archive-url=https://web.archive.org/web/20160217105359/http://www.inference.phy.cam.ac.uk/mackay/itila/book.html|url-status=live}}</ref>{{rp|p=67}} This implies that rolling a die has higher entropy than tossing a coin because each outcome of a die roll has smaller probability (<math>p=1/6</math>) than each outcome of a coin toss (<math>p=1/2</math>).

Consider a coin with probability {{math|''p''}} of landing on heads and probability {{math|1 − ''p''}} of landing on tails. The maximum surprise is when {{math|1=''p'' = 1/2}}, for which neither outcome is expected over the other. In this case a coin flip has an entropy of one [[bit]] (similarly, one [[Ternary numeral system|trit]] with equiprobable values contains <math>\log_2 3</math> (about 1.58496) bits of information because it can have one of three values). The minimum surprise is when {{math|1=''p'' = 0}} (impossibility) or {{math|1=''p'' = 1}} (certainty) and the entropy is zero bits.
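These quantities are straightforward to compute numerically. The following is a minimal Python sketch (illustrative only, not drawn from the cited sources; the function names are chosen here) that evaluates the surprisal defined above and the entropy, in bits, of a fair coin, a fair die, and a trit with equiprobable values:

<syntaxhighlight lang="python">
# Illustrative sketch (not from the cited sources): surprisal and entropy
# computed with base-2 logarithms, so the results are in bits.
import math

def surprisal(p: float) -> float:
    """Information content I(E) = log2(1/p(E)) of an event with probability p."""
    return math.log2(1 / p)

def entropy(probs) -> float:
    """Expected (probability-weighted average) surprisal over the outcomes."""
    return sum(p * surprisal(p) for p in probs if p > 0)

print(surprisal(1.0))          # 0.0    -- a certain event carries no surprise
print(entropy([0.5, 0.5]))     # 1.0    -- a fair coin flip: one bit
print(entropy([1/6] * 6))      # ~2.585 -- a fair die roll: higher entropy
print(entropy([1/3] * 3))      # ~1.585 -- a trit: log2(3) bits
print(entropy([0.0, 1.0]))     # 0.0    -- certainty: zero bits
</syntaxhighlight>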
When the entropy is zero (sometimes referred to as unity<ref group=Note name=Note02/>), there is no uncertainty at all – no freedom of choice – no [[Information content|information]].<ref>{{Cite book |last=Shannon |first=Claude Elwood |title=The mathematical theory of communication |last2=Weaver |first2=Warren |date=1998 |publisher=Univ. of Illinois Press |isbn=978-0-252-72548-7 |location=Urbana |pages=15 |language=English}}</ref> Other values of ''p'' give entropies between zero and one bits.

=== Example ===
Information theory is useful for calculating the smallest amount of information required to convey a message, as in [[data compression]]. For example, consider the transmission of sequences comprising the 4 characters 'A', 'B', 'C', and 'D' over a binary channel. If all 4 letters are equally likely (25%), one cannot do better than using two bits to encode each letter: 'A' might code as '00', 'B' as '01', 'C' as '10', and 'D' as '11'. However, if the probabilities of each letter are unequal, say 'A' occurs with 70% probability, 'B' with 26%, and 'C' and 'D' with 2% each, one could assign variable-length codes. In this case, 'A' would be coded as '0', 'B' as '10', 'C' as '110', and 'D' as '111'. With this representation, 70% of the time only one bit needs to be sent, 26% of the time two bits, and only 4% of the time three bits. On average, fewer than 2 bits are required since the entropy is lower (owing to the high prevalence of 'A' followed by 'B' – together 96% of characters). The entropy, computed as the sum of probability-weighted log probabilities, measures and captures this effect.

English text, treated as a string of characters, has fairly low entropy; i.e., it is fairly predictable. We can be fairly certain that, for example, 'e' will be far more common than 'z', that the combination 'qu' will be much more common than any other combination with a 'q' in it, and that the combination 'th' will be more common than 'z', 'q', or 'qu'. After the first few letters one can often guess the rest of the word. English text has between 0.6 and 1.3 bits of entropy per character of the message.<ref name="Schneier, B page 234">Schneier, B: ''Applied Cryptography'', Second edition, John Wiley and Sons.</ref>{{rp|p=234}}
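Returning to the four-character example above, its figures can be checked directly. The following Python sketch (illustrative only, not drawn from the cited sources) computes the entropy of the skewed distribution and the average length of the variable-length code given above:

<syntaxhighlight lang="python">
# Illustrative sketch: checking the four-character compression example.
import math

probs = {'A': 0.70, 'B': 0.26, 'C': 0.02, 'D': 0.02}   # letter probabilities
code  = {'A': '0', 'B': '10', 'C': '110', 'D': '111'}  # variable-length code

entropy = -sum(p * math.log2(p) for p in probs.values())   # ~1.091 bits/character
avg_len = sum(probs[c] * len(code[c]) for c in probs)      # 1.34 bits/character

print(f"entropy of the distribution: {entropy:.3f} bits per character")
print(f"average code length:         {avg_len:.2f} bits per character")
</syntaxhighlight>

Both figures are below the 2 bits per character of the fixed-length code, and the average code length of 1.34 bits remains slightly above the entropy of about 1.09 bits, in line with entropy being the smallest average number of bits per character that any such code can achieve.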