===Data compression===
{{Main|Shannon's source coding theorem|Data compression}}

Shannon's definition of entropy, when applied to an information source, can determine the minimum channel capacity required to reliably transmit the source as encoded binary digits. Shannon's entropy measures the information contained in a message, as opposed to the portion of the message that is determined (or predictable). Examples of the latter include redundancy in language structure or statistical properties relating to the occurrence frequencies of letter or word pairs, triplets, etc. The minimum channel capacity can be realized in theory by using the [[typical set]] or in practice using [[Huffman coding|Huffman]], [[LZW|Lempel–Ziv]] or [[arithmetic coding]]. (See also [[Kolmogorov complexity]].) In practice, compression algorithms deliberately include some judicious redundancy in the form of [[checksum]]s to protect against errors. The [[entropy rate]] of a data source is the average number of bits per symbol needed to encode it. Shannon's experiments with human predictors show an information rate between 0.6 and 1.3 bits per character in English;<ref>{{cite web |url=http://marknelson.us/2006/08/24/the-hutter-prize/ |title=The Hutter Prize |access-date=2008-11-27 |date=24 August 2006 |author=Mark Nelson |archive-date=1 March 2018 |archive-url=https://web.archive.org/web/20180301161215/http://marknelson.us/2006/08/24/the-hutter-prize/ |url-status=dead }}</ref> the [[PPM compression algorithm]] can compress English text to about 1.5 bits per character.

If a [[Data compression|compression]] scheme is lossless – one in which the entire original message can always be recovered by decompression – then a compressed message carries the same quantity of information as the original but in fewer characters. It therefore has more information (higher entropy) per character, and correspondingly less [[redundancy (information theory)|redundancy]]. [[Shannon's source coding theorem]] states that a lossless compression scheme cannot compress messages, on average, to have ''more'' than one bit of information per bit of message, but that any value ''less'' than one bit of information per bit of message can be attained by employing a suitable coding scheme. The entropy of a message per bit multiplied by the length of that message is a measure of how much total information the message contains. Shannon's theorem also implies that no lossless compression scheme can shorten ''all'' messages: if some messages come out shorter, at least one must come out longer, by the [[pigeonhole principle]]. In practice this is generally not a problem, because one is usually interested in compressing only certain types of messages, such as a document in English rather than gibberish text, or digital photographs rather than noise, and it is unimportant if a compression algorithm makes some unlikely or uninteresting sequences larger.
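The bound in the source coding theorem can be illustrated with a short Python sketch (a minimal illustration, not drawn from the cited sources; the sample string and helper names are arbitrary). It estimates the per-character entropy of a sample string from its own letter frequencies and compares it with the average codeword length of a Huffman code built from the same frequencies, which lies within one bit of that entropy.

<syntaxhighlight lang="python">
import heapq
from collections import Counter
from math import log2


def empirical_entropy(text):
    """Per-character Shannon entropy of a sample string, in bits."""
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def huffman_code_lengths(text):
    """Length in bits of the codeword a Huffman code (built from the
    sample's own symbol frequencies) assigns to each symbol."""
    counts = Counter(text)
    if len(counts) == 1:
        # Degenerate case: a single distinct symbol still gets a 1-bit code.
        return {next(iter(counts)): 1}
    # Heap entries: (subtree weight, tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


sample = "shannon entropy bounds how far lossless codes can compress a message"
counts = Counter(sample)
H = empirical_entropy(sample)
lengths = huffman_code_lengths(sample)
avg = sum(counts[s] * lengths[s] for s in counts) / len(sample)

# For a Huffman code built from these frequencies, H <= avg < H + 1.
print(f"empirical entropy     : {H:.3f} bits/char")
print(f"Huffman average length: {avg:.3f} bits/char")
</syntaxhighlight>

Entropy computed this way reflects only single-character frequencies; estimates such as Shannon's 0.6–1.3 bits per character for English are lower because they also exploit dependencies between characters, as the PPM algorithm does.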
A 2011 study in ''[[Science (journal)|Science]]'' estimates the world's technological capacity to store and communicate optimally compressed information, normalized to the most effective compression algorithms available in 2007, thereby estimating the entropy of the technologically available sources.<ref name="HilbertLopez2011">[http://www.sciencemag.org/content/332/6025/60 "The World's Technological Capacity to Store, Communicate, and Compute Information"] {{Webarchive|url=https://web.archive.org/web/20130727161911/http://www.sciencemag.org/content/332/6025/60 |date=27 July 2013 }}, Martin Hilbert and Priscila López (2011), ''[[Science (journal)|Science]]'', 332(6025); free access to the article through here: martinhilbert.net/WorldInfoCapacity.html</ref>{{rp|pp=60–65}}

{| class="wikitable"
|+ All figures in entropically compressed [[exabytes]]
|-
! Type of information !! 1986 !! 2007
|-
| Storage || 2.6 || 295
|-
| Broadcast || 432 || 1900
|-
| Telecommunications || 0.281 || 65
|}

The authors estimate humankind's technological capacity to store information (fully entropically compressed) in 1986 and again in 2007. They break the information into three categories: information stored on a medium, information received through one-way [[broadcast]] networks, and information exchanged through two-way [[telecommunications network]]s.<ref name="HilbertLopez2011"/>