Sufficient statistic
==Examples==

===Bernoulli distribution===
If ''X''<sub>1</sub>, ..., ''X''<sub>''n''</sub> are independent [[Bernoulli trial|Bernoulli-distributed]] random variables with expected value ''p'', then the sum ''T''(''X'') = ''X''<sub>1</sub> + ... + ''X''<sub>''n''</sub> is a sufficient statistic for ''p'' (here 'success' corresponds to ''X''<sub>''i''</sub> = 1 and 'failure' to ''X''<sub>''i''</sub> = 0, so ''T'' is the total number of successes).

This is seen by considering the joint probability distribution:

:<math> \Pr\{X=x\}=\Pr\{X_1=x_1,X_2=x_2,\ldots,X_n=x_n\}.</math>

Because the observations are independent, this can be written as

:<math> p^{x_1}(1-p)^{1-x_1} p^{x_2}(1-p)^{1-x_2}\cdots p^{x_n}(1-p)^{1-x_n} </math>

and, collecting powers of ''p'' and 1 − ''p'', gives

:<math> p^{\sum x_i}(1-p)^{n-\sum x_i}=p^{T(x)}(1-p)^{n-T(x)} </math>

which satisfies the factorization criterion, with ''h''(''x'') = 1 being just a constant.

Note the crucial feature: the unknown parameter ''p'' interacts with the data ''x'' only via the statistic ''T''(''x'') = Σ ''x''<sub>''i''</sub>. As a concrete application, this gives a procedure for distinguishing a [[Fair coin#Fair results from a biased coin|fair coin from a biased coin]].

===Uniform distribution===
{{see also|German tank problem}}
If ''X''<sub>1</sub>, ..., ''X''<sub>''n''</sub> are independent and [[uniform distribution (continuous)|uniformly distributed]] on the interval [0, ''θ''], then ''T''(''X'') = max(''X''<sub>1</sub>, ..., ''X''<sub>''n''</sub>) is sufficient for ''θ'': the [[sample maximum]] is a sufficient statistic for the population maximum.

To see this, consider the joint [[probability density function]] of ''X'' = (''X''<sub>1</sub>, ..., ''X''<sub>''n''</sub>).
Because the observations are independent, the pdf can be written as a product of individual densities:

:<math>\begin{align} f_{\theta}(x_1,\ldots,x_n) &= \frac{1}{\theta}\mathbf{1}_{\{0\leq x_1\leq\theta\}} \cdots \frac{1}{\theta}\mathbf{1}_{\{0\leq x_n\leq\theta\}} \\[5pt] &= \frac{1}{\theta^n} \mathbf{1}_{\{0\leq\min\{x_i\}\}}\mathbf{1}_{\{\max\{x_i\}\leq\theta\}} \end{align}</math>

where '''1'''<sub>{''...''}</sub> is the [[indicator function]]. Thus the density takes the form required by the Fisher–Neyman factorization theorem, where ''h''(''x'') = '''1'''<sub>{min{''x<sub>i</sub>''}≥0}</sub>, and the rest of the expression is a function of only ''θ'' and ''T''(''x'') = max{''x<sub>i</sub>''}.

In fact, the [[minimum-variance unbiased estimator]] (MVUE) for ''θ'' is

:<math> \frac{n+1}{n}T(X). </math>

This is the sample maximum, scaled to correct for the [[bias of an estimator|bias]], and is the MVUE by the [[Lehmann–Scheffé theorem]]. The unscaled sample maximum ''T''(''X'') is the [[maximum likelihood estimator]] for ''θ''.

===Uniform distribution (with two parameters)===
If <math>X_1,\ldots,X_n</math> are independent and [[Uniform distribution (continuous)|uniformly distributed]] on the interval <math>[\alpha, \beta]</math> (where <math>\alpha</math> and <math>\beta</math> are unknown parameters), then <math>T(X_1^n)=\left(\min_{1 \leq i \leq n}X_i,\max_{1 \leq i \leq n}X_i\right)</math> is a two-dimensional sufficient statistic for <math>(\alpha, \beta)</math>.

To see this, consider the joint [[probability density function]] of <math>X_1^n=(X_1,\ldots,X_n)</math>. Because the observations are independent, the pdf can be written as a product of individual densities, i.e.
:<math>\begin{align} f_{X_1^n}(x_1^n) &= \prod_{i=1}^n \left({1 \over \beta-\alpha}\right) \mathbf{1}_{ \{ \alpha \leq x_i \leq \beta \} } = \left({1 \over \beta-\alpha}\right)^n \mathbf{1}_{ \{ \alpha \leq x_i \leq \beta, \, \forall \, i = 1,\ldots,n\}} \\ &= \left({1 \over \beta-\alpha}\right)^n \mathbf{1}_{ \{ \alpha \, \leq \, \min_{1 \leq i \leq n}x_i \} } \mathbf{1}_{ \{ \max_{1 \leq i \leq n}x_i \, \leq \, \beta \} }. \end{align}</math>

The joint density of the sample takes the form required by the Fisher–Neyman factorization theorem, by letting

:<math>\begin{align} h(x_1^n)= 1, \quad g_{(\alpha, \beta)}(x_1^n)= \left({1 \over \beta-\alpha}\right)^n \mathbf{1}_{ \{ \alpha \, \leq \, \min_{1 \leq i \leq n}x_i \} } \mathbf{1}_{ \{ \max_{1 \leq i \leq n}x_i \, \leq \, \beta \} }. \end{align}</math>

Since <math>h(x_1^n)</math> does not depend on the parameter <math>(\alpha, \beta)</math> and <math>g_{(\alpha, \beta)}(x_1^n)</math> depends on <math>x_1^n</math> only through the function <math>T(x_1^n)= \left(\min_{1 \leq i \leq n}x_i,\max_{1 \leq i \leq n}x_i\right),</math> the Fisher–Neyman factorization theorem implies that <math>T(X_1^n) = \left(\min_{1 \leq i \leq n}X_i,\max_{1 \leq i \leq n}X_i\right)</math> is a sufficient statistic for <math>(\alpha, \beta)</math>.

===Poisson distribution===
If ''X''<sub>1</sub>, ..., ''X''<sub>''n''</sub> are independent and have a [[Poisson distribution]] with parameter ''λ'', then the sum ''T''(''X'') = ''X''<sub>1</sub> + ... + ''X''<sub>''n''</sub> is a sufficient statistic for ''λ''.

To see this, consider the joint probability distribution:

:<math> \Pr(X=x)=\Pr(X_1=x_1,X_2=x_2,\ldots,X_n=x_n). </math>

Because the observations are independent, this can be written as

:<math> {e^{-\lambda} \lambda^{x_1} \over x_1 !} \cdot {e^{-\lambda} \lambda^{x_2} \over x_2 !} \cdots {e^{-\lambda} \lambda^{x_n} \over x_n !} </math>

which may be written as

:<math> e^{-n\lambda} \lambda^{(x_1+x_2+\cdots+x_n)} \cdot {1 \over x_1 !
x_2 !\cdots x_n ! } </math>

which shows that the factorization criterion is satisfied, where ''h''(''x'') is the reciprocal of the product of the factorials. Note that the parameter ''λ'' interacts with the data only through the sum ''T''(''X'').

===Normal distribution===
If <math>X_1,\ldots,X_n</math> are independent and [[Normal distribution|normally distributed]] with expected value <math>\theta</math> (a parameter) and known finite variance <math>\sigma^2,</math> then

:<math>T(X_1^n)=\overline{x}=\frac1n\sum_{i=1}^nX_i</math>

is a sufficient statistic for <math>\theta.</math>

To see this, consider the joint [[probability density function]] of <math>X_1^n=(X_1,\dots,X_n)</math>. Because the observations are independent, the pdf can be written as a product of individual densities, i.e.

:<math>\begin{align} f_{X_1^n}(x_1^n) & = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp \left (-\frac{(x_i-\theta)^2}{2\sigma^2} \right ) \\[6pt] &= (2\pi\sigma^2)^{-\frac{n}{2}} \exp \left ( -\sum_{i=1}^n \frac{(x_i-\theta)^2}{2\sigma^2} \right ) \\[6pt] & = (2\pi\sigma^2)^{-\frac{n}{2}} \exp \left (-\sum_{i=1}^n \frac{ \left ( \left (x_i-\overline{x} \right ) - \left (\theta-\overline{x} \right ) \right )^2}{2\sigma^2} \right ) \\[6pt] & = (2\pi\sigma^2)^{-\frac{n}{2}} \exp \left( -{1\over2\sigma^2} \left(\sum_{i=1}^n(x_i-\overline{x})^2 + \sum_{i=1}^n(\theta-\overline{x})^2 -2\sum_{i=1}^n(x_i-\overline{x})(\theta-\overline{x})\right) \right) \\[6pt] &= (2\pi\sigma^2)^{-\frac{n}{2}} \exp \left( -{1\over2\sigma^2} \left (\sum_{i=1}^n(x_i-\overline{x})^2 + n(\theta-\overline{x})^2 \right ) \right ) && \text{since } \sum_{i=1}^n(x_i-\overline{x})(\theta-\overline{x})=0 \\[6pt] &= (2\pi\sigma^2)^{-\frac{n}{2}} \exp \left( -{1\over2\sigma^2} \sum_{i=1}^n (x_i-\overline{x})^2 \right ) \exp \left (-\frac{n}{2\sigma^2} (\theta-\overline{x})^2 \right ) \end{align}</math>

The joint density of the sample takes the form required by the Fisher–Neyman factorization theorem, by letting

:<math>\begin{align}
h(x_1^n) &= (2\pi\sigma^2)^{-\frac{n}{2}} \exp \left( -{1\over2\sigma^2} \sum_{i=1}^n (x_i-\overline{x})^2 \right ) \\[6pt] g_\theta(x_1^n) &= \exp \left (-\frac{n}{2\sigma^2} (\theta-\overline{x})^2 \right ) \end{align}</math>

Since <math>h(x_1^n)</math> does not depend on the parameter <math>\theta</math> and <math>g_{\theta}(x_1^n)</math> depends on <math>x_1^n</math> only through the function

:<math>T(X_1^n)=\overline{x}=\frac1n\sum_{i=1}^nX_i,</math>

the Fisher–Neyman factorization theorem implies that <math>T(X_1^n)</math> is a sufficient statistic for <math>\theta</math>.

If <math> \sigma^2 </math> is also unknown, then, writing <math>s^2 = \frac{1}{n-1} \sum_{i=1}^n \left(x_i - \overline{x} \right)^2 </math> for the sample variance, the above likelihood can be rewritten as

:<math>\begin{align} f_{X_1^n}(x_1^n)= (2\pi\sigma^2)^{-n/2} \exp \left( -\frac{n-1}{2\sigma^2}s^2 \right) \exp \left (-\frac{n}{2\sigma^2} (\theta-\overline{x})^2 \right ) . \end{align}</math>

The Fisher–Neyman factorization theorem still holds and implies that <math>(\overline{x},s^2)</math> is a joint sufficient statistic for <math> ( \theta , \sigma^2) </math>.

===Exponential distribution===
If <math>X_1,\dots,X_n</math> are independent and [[Exponential distribution|exponentially distributed]] with expected value ''θ'' (an unknown real-valued positive parameter), then <math>T(X_1^n)=\sum_{i=1}^nX_i</math> is a sufficient statistic for ''θ''.

To see this, consider the joint [[probability density function]] of <math>X_1^n=(X_1,\dots,X_n)</math>. Because the observations are independent, the pdf can be written as a product of individual densities, i.e.

:<math>\begin{align} f_{X_1^n}(x_1^n) &= \prod_{i=1}^n {1 \over \theta} \, e^{ {-1 \over \theta}x_i } = {1 \over \theta^n}\, e^{ {-1 \over \theta} \sum_{i=1}^nx_i }.
\end{align}</math>

The joint density of the sample takes the form required by the Fisher–Neyman factorization theorem, by letting

:<math>\begin{align} h(x_1^n)= 1, \quad g_{\theta}(x_1^n)= {1 \over \theta^n}\, e^{ {-1 \over \theta} \sum_{i=1}^nx_i }. \end{align}</math>

Since <math>h(x_1^n)</math> does not depend on the parameter <math>\theta</math> and <math>g_{\theta}(x_1^n)</math> depends on <math>x_1^n</math> only through the function <math>T(X_1^n)=\sum_{i=1}^nX_i,</math> the Fisher–Neyman factorization theorem implies that <math>T(X_1^n)=\sum_{i=1}^nX_i</math> is a sufficient statistic for <math>\theta</math>.

===Gamma distribution===
If <math>X_1,\dots,X_n</math> are independent and [[Gamma distribution|gamma-distributed]] as <math>\Gamma(\alpha, \beta)</math>, where <math>\alpha</math> and <math>\beta</math> are unknown parameters, then <math>T(X_1^n) = \left( \prod_{i=1}^n{X_i} , \sum_{i=1}^n X_i \right)</math> is a two-dimensional sufficient statistic for <math>(\alpha, \beta)</math>.

To see this, consider the joint [[probability density function]] of <math>X_1^n=(X_1,\dots,X_n)</math>. Because the observations are independent, the pdf can be written as a product of individual densities, i.e.

:<math>\begin{align} f_{X_1^n}(x_1^n) &= \prod_{i=1}^n \left({1 \over \Gamma(\alpha) \beta^\alpha}\right) x_i^{\alpha -1} e^{(-1/\beta)x_i} \\[5pt] &= \left({1 \over \Gamma(\alpha) \beta^\alpha}\right)^n \left(\prod_{i=1}^n x_i\right)^{\alpha-1} e^{{-1 \over \beta} \sum_{i=1}^n x_i}. \end{align}</math>

The joint density of the sample takes the form required by the Fisher–Neyman factorization theorem, by letting

:<math>\begin{align} h(x_1^n)= 1, \quad g_{(\alpha, \beta)}(x_1^n)= \left({1 \over \Gamma(\alpha) \beta^{\alpha}}\right)^n \left(\prod_{i=1}^n x_i\right)^{\alpha-1} e^{{-1 \over \beta} \sum_{i=1}^n x_i}.
\end{align}</math>

Since <math>h(x_1^n)</math> does not depend on the parameter <math>(\alpha, \beta)</math> and <math>g_{(\alpha, \beta)}(x_1^n)</math> depends on <math>x_1^n</math> only through the function <math>T(x_1^n)= \left( \prod_{i=1}^n x_i, \sum_{i=1}^n x_i \right),</math> the Fisher–Neyman factorization theorem implies that <math>T(X_1^n)= \left( \prod_{i=1}^n X_i, \sum_{i=1}^n X_i \right)</math> is a sufficient statistic for <math>(\alpha, \beta).</math>
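The factorization arguments above can also be checked numerically. The following sketch (a minimal illustration in Python, assuming NumPy and using arbitrary example data) verifies two of the claims: that the Bernoulli likelihood depends on the data only through the sum ''T''(''x''), and that the bias-corrected sample maximum ((''n''+1)/''n'')''T''(''X'') is an unbiased estimator of the uniform parameter ''θ''.

```python
import numpy as np

def bernoulli_likelihood(x, p):
    """Joint likelihood of i.i.d. Bernoulli(p) observations x."""
    x = np.asarray(x)
    return np.prod(p ** x * (1 - p) ** (1 - x))

# Two samples with the same sufficient statistic T(x) = sum(x) = 3
# but different arrangements of successes and failures.
x = [1, 0, 1, 1, 0]
y = [0, 1, 1, 0, 1]

# The likelihood ratio equals 1 for every p: the unknown parameter
# interacts with the data only through T(x), as the factorization shows.
ratios = [bernoulli_likelihood(x, p) / bernoulli_likelihood(y, p)
          for p in (0.2, 0.5, 0.8)]
assert all(abs(r - 1) < 1e-12 for r in ratios)

# Monte Carlo check that (n+1)/n * max(X_i) is unbiased for theta
# when X_1, ..., X_n are i.i.d. Uniform(0, theta).
rng = np.random.default_rng(0)
theta, n = 4.0, 10
samples = rng.uniform(0.0, theta, size=(100_000, n))
mvue = (n + 1) / n * samples.max(axis=1)
assert abs(mvue.mean() - theta) < 0.05
```

The first check exercises only the defining property of sufficiency (the likelihood ratio of two samples with equal ''T'' is free of the parameter); it does not by itself prove minimality or the MVUE property, which rely on the Lehmann–Scheffé argument given above.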