Entropy (information theory)
==Entropy for continuous random variables==
===Differential entropy===
{{Main|Differential entropy}}

The Shannon entropy is restricted to random variables taking discrete values. The corresponding formula for a continuous random variable with [[probability density function]] {{math|''f''(''x'')}} with finite or infinite support <math>\mathbb X</math> on the real line is defined by analogy, using the above form of the entropy as an expectation:<ref name=cover1991/>{{rp|p=224}}
<math display="block">\Eta(X) = \mathbb{E}[-\log f(X)] = -\int_\mathbb X f(x) \log f(x)\, \mathrm{d}x.</math>

This is the differential entropy (or continuous entropy). A precursor of the continuous entropy {{math|''h''[''f'']}} is the expression for the functional {{math|''Η''}} in the [[H-theorem]] of Boltzmann.

Although the analogy between both functions is suggestive, the following question must be asked: is the differential entropy a valid extension of the Shannon discrete entropy? Differential entropy lacks a number of properties that the Shannon discrete entropy has (it can even be negative), and corrections have been suggested, notably [[limiting density of discrete points]].

To answer this question, a connection must be established between the two functions, so as to obtain a generally finite measure as the [[bin size]] goes to zero. In the discrete case, the bin size is the (implicit) width of each of the {{math|''n''}} (finite or infinite) bins whose probabilities are denoted by {{math|''p''<sub>''n''</sub>}}. When generalizing to the continuous domain, the width must be made explicit.

To do this, start with a continuous function {{math|''f''}} discretized into bins of size <math>\Delta</math>.
<!-- Figure: Discretizing the function $ f$ into bins of width $ \Delta$ \includegraphics[width=\textwidth]{function-with-bins.eps} --><!-- The original article this figure came from is at http://planetmath.org/shannonsentropy but it is broken there too -->
By the mean-value theorem there exists a value {{math|''x''<sub>''i''</sub>}} in each bin such that
<math display="block">f(x_i) \Delta = \int_{i\Delta}^{(i+1)\Delta} f(x)\, dx ,</math>
and the integral of the function {{math|''f''}} can be approximated (in the Riemann sense) by
<math display="block">\int_{-\infty}^{\infty} f(x)\, dx = \lim_{\Delta \to 0} \sum_{i = -\infty}^{\infty} f(x_i) \Delta ,</math>
where this limit and "bin size goes to zero" are equivalent.

We will denote
<math display="block">\Eta^{\Delta} := - \sum_{i=-\infty}^{\infty} f(x_i) \Delta \log \left( f(x_i) \Delta \right)</math>
and expanding the logarithm, we have
<math display="block">\Eta^{\Delta} = - \sum_{i=-\infty}^{\infty} f(x_i) \Delta \log (f(x_i)) -\sum_{i=-\infty}^{\infty} f(x_i) \Delta \log (\Delta).</math>

As {{math|Δ → 0}}, we have
<math display="block">\begin{align} \sum_{i=-\infty}^{\infty} f(x_i) \Delta &\to \int_{-\infty}^{\infty} f(x)\, dx = 1 \\ \sum_{i=-\infty}^{\infty} f(x_i) \Delta \log (f(x_i)) &\to \int_{-\infty}^{\infty} f(x) \log f(x)\, dx. \end{align}</math>

Note that {{math|log(Δ) → −∞}} as {{math|Δ → 0}}, so the second sum diverges and a special definition of the differential or continuous entropy is required:
<math display="block">h[f] = \lim_{\Delta \to 0} \left(\Eta^{\Delta} + \log \Delta\right) = -\int_{-\infty}^{\infty} f(x) \log f(x)\,dx,</math>
which is, as said before, referred to as the differential entropy. This means that the differential entropy ''is not'' a limit of the Shannon entropy for {{math|''n'' → ∞}}. Rather, it differs from the limit of the Shannon entropy by an infinite offset (see also the article on [[information dimension]]).
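The limit above can be checked numerically. The following is a minimal sketch, not part of the cited derivation: it assumes a zero-mean Gaussian density, bins the real line into cells of width Δ, and compares the Shannon entropy of the bin probabilities (plus log Δ) with the known closed form ½ log(2πeσ²) for the Gaussian differential entropy. The function names and the truncation parameter <code>span</code> are illustrative choices, and all logarithms are natural (nats).

```python
import math


def gaussian_cdf(x, sigma=1.0):
    """Cumulative distribution function of a zero-mean Gaussian."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def discretized_entropy(delta, sigma=1.0, span=12.0):
    """Shannon entropy (nats) of a Gaussian binned into cells of width delta.

    Each bin probability is the exact integral of the density over the bin,
    which is the quantity f(x_i) * delta in the mean-value argument.
    Bins beyond +/- span * sigma are ignored (assumed negligible).
    """
    n_bins = int(math.ceil(span * sigma / delta))
    total = 0.0
    for i in range(-n_bins, n_bins):
        p = gaussian_cdf((i + 1) * delta, sigma) - gaussian_cdf(i * delta, sigma)
        if p > 0.0:
            total -= p * math.log(p)
    return total


def gaussian_differential_entropy(sigma=1.0):
    """Closed-form differential entropy of a Gaussian, in nats."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)


if __name__ == "__main__":
    h = gaussian_differential_entropy()
    for delta in (1.0, 0.1, 0.01):
        H = discretized_entropy(delta)
        # H grows without bound as delta shrinks; H + log(delta) approaches h.
        print(f"delta={delta:<5} H={H:.6f} H+log(delta)={H + math.log(delta):.6f} h={h:.6f}")
```

In this sketch the discrete entropy keeps growing as the bins shrink, while the corrected quantity settles near the closed-form value, which is the "infinite offset" described above.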
===Limiting density of discrete points===
{{Main|Limiting density of discrete points}}

It turns out as a result that, unlike the Shannon entropy, the differential entropy is ''not'' in general a good measure of uncertainty or information. For example, the differential entropy can be negative; it is also not invariant under continuous co-ordinate transformations. This problem may be illustrated by a change of units when {{math|''x''}} is a dimensioned variable. {{math|''f''(''x'')}} will then have the units of {{math|1/''x''}}. The argument of the logarithm must be dimensionless, otherwise it is improper, so the differential entropy as given above is improper. If {{math|''Δ''}} is some "standard" value of {{math|''x''}} (i.e. "bin size") and therefore has the same units, then a modified differential entropy may be written in proper form as:
<math display="block">\Eta=-\int_{-\infty}^\infty f(x) \log(f(x)\,\Delta)\,dx ,</math>
and the result will be the same for any choice of units for {{math|''x''}}. In fact, the limit of discrete entropy as <math> N \rightarrow \infty </math> would also include a term of <math> \log(N)</math>, which would in general be infinite. This is expected: continuous variables would typically have infinite entropy when discretized. The [[limiting density of discrete points]] is really a measure of how much easier a distribution is to describe than a distribution that is uniform over its quantization scheme.

===Relative entropy===
{{main|Generalized relative entropy}}

Another useful measure of entropy that works equally well in the discrete and the continuous case is the '''relative entropy''' of a distribution. It is defined as the [[Kullback–Leibler divergence]] from the distribution to a reference measure {{math|''m''}} as follows. Assume that a probability distribution {{math|''p''}} is [[absolutely continuous]] with respect to a measure {{math|''m''}}, i.e.
is of the form {{math|''p''(''dx'') {{=}} ''f''(''x'')''m''(''dx'')}} for some non-negative {{math|''m''}}-integrable function {{math|''f''}} with {{math|''m''}}-integral 1. Then the relative entropy can be defined as
<math display="block">D_{\mathrm{KL}}(p \| m ) = \int \log (f(x)) p(dx) = \int f(x)\log (f(x)) m(dx) .</math>

In this form the relative entropy generalizes (up to change in sign) both the discrete entropy, where the measure {{math|''m''}} is the [[counting measure]], and the differential entropy, where the measure {{math|''m''}} is the [[Lebesgue measure]]. If the measure {{math|''m''}} is itself a probability distribution, the relative entropy is non-negative, and zero if and only if {{math|''p'' {{=}} ''m''}} as measures. It is defined for any measure space, and is hence coordinate independent and invariant under co-ordinate reparameterizations, provided the transformation of the measure {{math|''m''}} is properly taken into account. The relative entropy, and (implicitly) entropy and differential entropy, do depend on the "reference" measure {{math|''m''}}.
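The statements above can be illustrated with a short sketch. It is an illustration under stated assumptions, not a general implementation: the discrete part evaluates the sum of {{math|''p''<sub>''i''</sub> log(''p''<sub>''i''</sub>/''m''<sub>''i''</sub>)}} for a made-up three-point distribution, once against the counting measure and once against a uniform probability distribution; the continuous part uses the standard closed forms for the differential entropy of a Gaussian and for the Kullback–Leibler divergence between two Gaussians to show the effect of a change of units. All function names and numerical values are hypothetical, and logarithms are natural.

```python
import math


def relative_entropy(p, m):
    """Sum of p_i * log(p_i / m_i) for discrete p and reference weights m."""
    return sum(pi * math.log(pi / mi) for pi, mi in zip(p, m) if pi > 0.0)


def shannon_entropy(p):
    """Discrete Shannon entropy in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


def gaussian_h(sigma):
    """Closed-form differential entropy of a Gaussian with std. dev. sigma."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)


def gaussian_kl(mu_p, sigma_p, mu_m, sigma_m):
    """Closed-form D_KL(N(mu_p, sigma_p^2) || N(mu_m, sigma_m^2))."""
    return (math.log(sigma_m / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_m) ** 2) / (2.0 * sigma_m ** 2)
            - 0.5)


if __name__ == "__main__":
    p = [0.5, 0.25, 0.25]            # illustrative distribution
    counting = [1.0, 1.0, 1.0]       # counting measure: weight 1 per point
    uniform = [1.0 / 3.0] * 3        # reference that is itself a probability

    # Against the counting measure the result is minus the Shannon entropy.
    print(relative_entropy(p, counting), -shannon_entropy(p))
    # Against a probability distribution the result is non-negative.
    print(relative_entropy(p, uniform))

    # Change of units x -> c * x (for example metres to centimetres).
    c = 100.0
    print(gaussian_h(c * 2.0) - gaussian_h(2.0), math.log(c))  # shifts by log c
    print(gaussian_kl(1.0, 2.0, 0.0, 3.0),
          gaussian_kl(c * 1.0, c * 2.0, c * 0.0, c * 3.0))      # unchanged
```

Rescaling both the distribution and the reference measure leaves the relative entropy unchanged, whereas the differential entropy shifts by the logarithm of the scale factor, since its implicit reference (Lebesgue measure) is not rescaled with it.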