=== Univariate case ===
Saw ''et al.'' extended Chebyshev's inequality to cases where the population mean and variance are not known and may not exist, but the sample mean and sample standard deviation from ''N'' samples are to be employed to bound the expected value of a new drawing from the same distribution.<ref name=":1">{{cite journal |title = Chebyshev Inequality with Estimated Mean and Variance |last1 = Saw |first1 = John G. |last2 = Yang |first2 = Mark C. K. |last3 = Mo |first3 = Tse Chin |journal = [[The American Statistician]] |issn = 0003-1305 |volume = 38 |issue = 2 |year = 1984 |pages = 130–2 |doi = 10.2307/2683249 |jstor = 2683249 }}</ref> The following simpler version of this inequality is given by Kabán.<ref name="Kabán2011">{{cite journal |last = Kabán |first = Ata |title = Non-parametric detection of meaningless distances in high dimensional data |journal = [[Statistics and Computing]] |volume = 22 |issue = 2 |pages = 375–85 |year = 2012 |doi = 10.1007/s11222-011-9229-0 |s2cid = 6018114 }}</ref>

: <math>\Pr( | X - m | \ge ks ) \le \frac 1 {N + 1} \left\lfloor \frac {N+1} N \left(\frac{N - 1}{k^2} + 1 \right) \right\rfloor</math>

where ''X'' is a random variable which we have sampled ''N'' times, ''m'' is the sample mean, ''k'' is a constant and ''s'' is the sample standard deviation. This inequality holds even when the population moments do not exist, and when the sample is only [[Exchangeable random variables|weakly exchangeably]] distributed; this criterion is met for randomised sampling.

A table of values for the Saw–Yang–Mo inequality for finite sample sizes (''N'' < 100) has been determined by Konijn.<ref name=Konijn1987>{{cite journal |last=Konijn |first=Hendrik S. |title=Distribution-Free and Other Prediction Intervals |journal=[[The American Statistician]] |date=February 1987 |volume=41 |issue=1 |pages=11–15 |jstor=2684311 |doi=10.2307/2684311 }}</ref> The table allows the calculation of various confidence intervals for the mean, based on multiples, ''C'', of the standard error of the mean as calculated from the sample. For example, Konijn shows that for ''N'' = 59, the 95 percent confidence interval for the mean ''m'' is {{nowrap|(''m'' − ''Cs'', ''m'' + ''Cs'')}} where {{nowrap|1=''C'' = 4.447 × 1.006 = 4.47}} (this is 2.28 times larger than the value found on the assumption of normality, showing the loss of precision that results from ignorance of the precise nature of the distribution).

An equivalent inequality can be derived in terms of the sample mean instead,<ref name="Kabán2011" />

: <math>\Pr( | X - m | \ge km ) \le \frac{N - 1} N \frac 1 {k^2} \frac{s^2}{m^2} + \frac 1 N.</math>
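These finite-sample bounds lend themselves to direct numerical evaluation. The following sketch is illustrative only and is not taken from the cited sources; the function names are arbitrary. It evaluates Kabán's bound and searches for the smallest multiplier ''k'' whose bound drops to a given level; figures tabulated from the exact Saw–Yang–Mo inequality, such as Konijn's, may differ slightly from this simplified bound.

<syntaxhighlight lang="python">
import math

def kaban_bound(N, k):
    """Evaluate Kabán's finite-sample bound on Pr(|X - m| >= k*s)."""
    return math.floor((N + 1) / N * ((N - 1) / k ** 2 + 1)) / (N + 1)

def smallest_k(N, alpha, step=1e-4):
    """Smallest multiplier k (to within `step`) whose bound is at most alpha."""
    k = step
    while kaban_bound(N, k) > alpha:
        k += step
    return k

print(kaban_bound(100, 3))   # about 0.119: bound on the fraction beyond 3 sample SDs
print(smallest_k(59, 0.05))  # about 4.447, close to the multiplier quoted from Konijn
</syntaxhighlight>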
For fixed ''N'' and large ''m'' the Saw–Yang–Mo inequality is approximately<ref name=Beasley2004>{{cite journal |last1=Beasley |first1=T. Mark |last2=Page |first2=Grier P. |last3=Brand |first3=Jaap P. L. |last4=Gadbury |first4=Gary L. |last5=Mountz |first5=John D. |last6=Allison |first6=David B. |author-link6=David B. Allison |title=Chebyshev's inequality for nonparametric testing with small ''N'' and α in microarray research |journal=Journal of the Royal Statistical Society |issn=1467-9876 |date=January 2004 |volume=53 |series=C (Applied Statistics) |issue=1 |pages=95–108 |doi=10.1111/j.1467-9876.2004.00428.x |s2cid=122678278 |doi-access=free }}</ref>

: <math> \Pr( | X - m | \ge ks ) \le \frac 1 {N + 1}. </math>

Beasley ''et al.'' have suggested a modification of this inequality<ref name=Beasley2004 />

: <math> \Pr( | X - m | \ge ks ) \le \frac 1 {k^2( N + 1 )}. </math>

In empirical testing this modification is conservative but appears to have low statistical power. Its theoretical basis currently remains unexplored.

====Dependence on sample size====
The bounds these inequalities give on a finite sample are less tight than those the Chebyshev inequality gives for a distribution. To illustrate this, let the sample size ''N'' = 100 and let ''k'' = 3. Chebyshev's inequality states that at most approximately 11.11% of the distribution will lie at least three standard deviations away from the mean. Kabán's version of the inequality for a finite sample states that at most approximately 12.05% of the sample lies outside these limits.

The dependence of the confidence intervals on sample size is further illustrated below.

For ''N'' = 10, the 95% confidence interval is approximately ±13.5789 standard deviations.

For ''N'' = 100, the 95% confidence interval is approximately ±4.9595 standard deviations; the 99% confidence interval is approximately ±140.0 standard deviations.

For ''N'' = 500, the 95% confidence interval is approximately ±4.5574 standard deviations; the 99% confidence interval is approximately ±11.1620 standard deviations.

For ''N'' = 1000, the 95% and 99% confidence intervals are approximately ±4.5141 and ±10.5330 standard deviations respectively.

The Chebyshev inequality for the distribution gives 95% and 99% confidence intervals of approximately ±4.472 standard deviations and ±10 standard deviations respectively.

====Samuelson's inequality====
{{main|Samuelson's inequality}}

Although Chebyshev's inequality is the best possible bound for an arbitrary distribution, this is not necessarily true for finite samples. [[Samuelson's inequality]] states that all values of a sample must lie within {{radic|''N'' − 1}} sample standard deviations of the mean. By comparison, Chebyshev's inequality states that all but a 1/''N'' fraction of the sample will lie within {{radic|''N''}} standard deviations of the mean; since there are ''N'' samples, at most one sample can lie outside {{radic|''N''}} standard deviations of the mean, which is a weaker statement than Samuelson's inequality. However, the benefit of Chebyshev's inequality is that it can be applied more generally to get confidence bounds for ranges of standard deviations that do not depend on the number of samples.
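Samuelson's bound can be checked empirically. The sketch below is illustrative only; the sample and the function name are arbitrary, and the uncorrected (denominator ''N'') form of the standard deviation is assumed, as in the usual statement of Samuelson's inequality.

<syntaxhighlight lang="python">
import math
import random

def within_samuelson_bound(xs):
    """Check that every value lies within sqrt(n - 1) uncorrected SDs of the mean."""
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / n)  # SD with denominator n
    radius = math.sqrt(n - 1) * s
    return all(abs(x - m) <= radius + 1e-12 for x in xs)

random.seed(1)
sample = [random.expovariate(1.0) for _ in range(100)]  # a strongly skewed sample
print(within_samuelson_bound(sample))  # True for any sample, however it is drawn
</syntaxhighlight>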
====Semivariances====
An alternative method of obtaining sharper bounds is through the use of [[Variance#Semivariance|semivariance]]s (partial variances). The upper (''σ''<sub>+</sub><sup>2</sup>) and lower (''σ''<sub>−</sub><sup>2</sup>) semivariances are defined as

: <math> \sigma_+^2 = \frac { \sum_{x>m} (x - m)^2 } { n - 1 } ,</math>

: <math> \sigma_-^2 = \frac { \sum_{x<m} (m - x)^2 } { n - 1 }, </math>

where ''m'' is the arithmetic mean of the sample and ''n'' is the number of elements in the sample.

The variance of the sample is the sum of the two semivariances:

: <math> \sigma^2 = \sigma_+^2 + \sigma_-^2. </math>

In terms of the lower semivariance Chebyshev's inequality can be written<ref name=Berck1982>{{cite journal |author-link1=Peter Berck |last1=Berck |first1=Peter |last2=Hihn |first2=Jairus M. |title=Using the Semivariance to Estimate Safety-First Rules |journal=American Journal of Agricultural Economics |date=May 1982 |volume=64 |issue=2 |pages=298–300 |doi=10.2307/1241139 |issn=0002-9092 |jstor=1241139 }}</ref>

: <math> \Pr(x \le m - a \sigma_-) \le \frac { 1 } { a^2 }.</math>

Putting

: <math> a = \frac{ k \sigma } { \sigma_- }, </math>

Chebyshev's inequality can now be written

: <math> \Pr(x \le m - k \sigma) \le \frac { 1 } { k^2 } \frac { \sigma_-^2 } { \sigma^2 }.</math>

A similar result can also be derived for the upper semivariance. If we put

: <math> \sigma_u^2 = \max(\sigma_-^2, \sigma_+^2) , </math>

Chebyshev's inequality can be written

: <math> \Pr(x \le m - k \sigma) \le \frac 1 {k^2} \frac { \sigma_u^2 } { \sigma^2 } ,</math>

with the same bound holding for the upper tail <math>\Pr(x \ge m + k \sigma)</math>. Because ''σ''<sub>u</sub><sup>2</sup> ≤ ''σ''<sup>2</sup>, use of the semivariance sharpens the original inequality.

If the distribution is known to be symmetric, then

: <math> \sigma_+^2 = \sigma_-^2 = \frac{ 1 } { 2 } \sigma^2 </math>

and

: <math> \Pr(x \le m - k \sigma) \le \frac 1 {2k^2} .</math>

This result agrees with that derived using standardised variables.

;Note: The inequality with the lower semivariance has been found to be of use in estimating downside risk in finance and agriculture.<ref name="Berck1982"/><ref name=Nantell1979>{{cite journal |last1=Nantell |first1=Timothy J. |last2=Price |first2=Barbara |title=An Analytical Comparison of Variance and Semivariance Capital Market Theories |journal=[[The Journal of Financial and Quantitative Analysis]] |date=June 1979 |volume=14 |issue=2 |pages=221–42 |doi=10.2307/2330500 |jstor=2330500 |s2cid=154652959 }}</ref><ref name=Neave2008>{{cite journal |title = Distinguishing upside potential from downside risk |last1 = Neave |first1 = Edwin H. |last2 = Ross |first2 = Michael N. |last3 = Yang |first3 = Jun |journal = [[Management Research News]] |issn = 0140-9174 |year = 2009 |volume = 32 |issue = 1 |pages = 26–36 |doi = 10.1108/01409170910922005 }}</ref>
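As a rough illustration of this sharpening, the sketch below (illustrative only; the sample values are arbitrary) computes the two semivariances of a hypothetical right-skewed sample and compares the lower-tail bound with the plain 1/''k''<sup>2</sup> bound.

<syntaxhighlight lang="python">
def semivariances(xs):
    """Upper and lower semivariances about the sample mean (denominator n - 1)."""
    n = len(xs)
    m = sum(xs) / n
    upper = sum((x - m) ** 2 for x in xs if x > m) / (n - 1)
    lower = sum((m - x) ** 2 for x in xs if x < m) / (n - 1)
    return upper, lower

# A hypothetical right-skewed sample: most of the variability lies above the mean.
xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 1.0, 2.0, 4.0]
upper, lower = semivariances(xs)
variance = upper + lower            # the two semivariances sum to the sample variance
k = 2.0
print(lower / (k ** 2 * variance))  # lower-tail bound, about 0.05 here
print(1 / k ** 2)                   # plain Chebyshev bound, 0.25
</syntaxhighlight>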