==Medians for samples==
{{hatnote|This section discusses the theory of estimating a population median from a sample. To calculate the median of a sample "by hand," see {{slink||Finite data set of numbers}} above.}}

==={{anchor|Ninther}} {{anchor|Remedian}} Efficient computation of the sample median===
Even though [[sorting algorithm|comparison-sorting]] ''n'' items requires {{math|[[Big O notation|Ω]](''n'' log ''n'')}} operations, [[selection algorithm]]s can compute the [[order statistic|{{mvar|k}}th-smallest of {{mvar|n}} items]] with only {{math|Θ(''n'')}} operations. This includes the median, which is the {{math|{{sfrac|''n''|2}}}}th order statistic (or for an even number of samples, the [[arithmetic mean]] of the two middle order statistics).<ref>{{cite book | isbn=0-201-00029-6 | author=Alfred V. Aho and John E. Hopcroft and Jeffrey D. Ullman | title=The Design and Analysis of Computer Algorithms | location=Reading/MA | publisher=Addison-Wesley | year=1974 | url-access=registration | url=https://archive.org/details/designanalysisof00ahoarich }} Here: Section 3.6 "Order Statistics", p.97-99, in particular Algorithm 3.6 and Theorem 3.9.</ref>

Selection algorithms still have the downside of requiring {{math|Ω(''n'')}} memory, that is, they need to have the full sample (or a linear-sized portion of it) in memory. Because this, as well as the linear time requirement, can be prohibitive, several estimation procedures for the median have been developed. A simple one is the median-of-three rule, which estimates the median as the median of a three-element subsample; this is commonly used as a subroutine in the [[quicksort]] sorting algorithm, which uses an estimate of its input's median. A more [[robust estimator]] is [[John Tukey|Tukey]]'s ''ninther'', which is the median-of-three rule applied with limited recursion:<ref>{{cite journal |first1=Jon L. |last1=Bentley |first2=M. Douglas |last2=McIlroy |title=Engineering a sort function |journal=Software: Practice and Experience |volume=23 |issue=11 |pages=1249–1265 |year=1993 |url=http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.14.8162 |doi=10.1002/spe.4380231105 |s2cid=8822797 }}</ref> if {{mvar|A}} is the sample laid out as an [[array (data structure)|array]], and

{{block indent | em = 1.5 | text = {{math|1=med3(''A'') = med(''A''[1], ''A''[{{sfrac|''n''|2}}], ''A''[''n''])}},}}

then

{{block indent | em = 1.5 | text = {{math|1=ninther(''A'') = med3(med3(''A''[1 ... {{sfrac|1|3}}''n'']), med3(''A''[{{sfrac|1|3}}''n'' ... {{sfrac|2|3}}''n'']), med3(''A''[{{sfrac|2|3}}''n'' ... ''n'']))}}}}

The ''remedian'' is an estimator for the median that requires linear time but sub-linear memory, operating in a single pass over the sample.<ref>{{cite journal |last1=Rousseeuw |first1=Peter J. |last2=Bassett |first2=Gilbert W. Jr. |title=The remedian: a robust averaging method for large data sets |journal=J. Amer. Statist. Assoc. |volume=85 |issue=409 |year=1990 |url=http://wis.kuleuven.be/stat/robust/papers/publications-1990/rousseeuwbassett-remedian-jasa-1990.pdf |pages=97–104 |doi=10.1080/01621459.1990.10475311}}</ref>

===Sampling distribution===
The distributions of both the sample mean and the sample median were determined by [[Pierre-Simon Laplace|Laplace]].<ref name=Stigler1973>{{cite journal | last = Stigler | first = Stephen | author-link = Stephen Stigler | date = December 1973 | title = Studies in the History of Probability and Statistics. XXXII: Laplace, Fisher and the Discovery of the Concept of Sufficiency | journal = Biometrika | volume = 60 | issue = 3 | pages = 439–445 | doi = 10.1093/biomet/60.3.439 | mr = 0326872 | jstor = 2334992 }}</ref> The distribution of the sample median from a population with a density function <math>f(x)</math> is asymptotically normal with mean <math>\mu</math> and variance<ref name="Rider1960">{{cite journal |last=Rider |first=Paul R. |year=1960 |title=Variance of the median of small samples from several special populations |journal=[[Journal of the American Statistical Association|J. Amer. Statist. Assoc.]] |volume=55 |issue=289 |pages=148–150 |doi=10.1080/01621459.1960.10482056 }}</ref>

<math display="block"> \frac{ 1 }{ 4n f( m )^2 }</math>

where <math>m</math> is the median of <math>f(x)</math> and <math>n</math> is the sample size:

<math display="block">\text{Sample median} \sim \mathcal{N}{\left(\mu{=}m, \, \sigma^2{=}\frac{1}{ 4n f(m)^2}\right)}</math>

A modern proof follows below. Laplace's result is now understood as a special case of [[Quantile#Estimating quantiles from a sample|the asymptotic distribution of arbitrary quantiles]].

For normal samples, the density is <math>f(m) = 1 / \sqrt{2\pi\sigma^2}</math>, thus for large samples the variance of the median equals <math>({\pi}/{2}) \cdot(\sigma^2/n).</math><ref name="Williams 2001 165"/> (See also section [[#Efficiency]] below.)

==== Derivation of the asymptotic distribution ====
{{unsourced section|date=November 2023}}
We take the sample size to be an odd number <math> N = 2n + 1 </math> and assume the variable is continuous; the formula for the case of discrete variables is given below in {{slink||Empirical local density}}. The sample can be summarized as "below median", "at median", and "above median", which corresponds to a trinomial distribution with probabilities <math> F(v) </math>, <math> f(v) </math> and <math> 1 - F(v) </math>. For a continuous variable, the probability of multiple sample values being exactly equal to the median is 0, so one can calculate the density at the point <math> v </math> directly from the trinomial distribution:

<math display="block"> \Pr[\operatorname{med}=v] \, dv = \frac{(2n+1)!}{n!n!} F(v)^n (1 - F(v))^n f(v)\, dv.</math>

Now we introduce the beta function.
For integer arguments <math> \alpha </math> and <math> \beta </math>, this can be expressed as <math> \Beta(\alpha,\beta) = \frac{(\alpha - 1)! (\beta - 1)!}{(\alpha + \beta - 1)!} </math>. Also, recall that <math> f(v)\,dv = dF(v) </math>. Using these relationships and setting both <math> \alpha </math> and <math> \beta </math> equal to <math>n+1</math> allows the last expression to be written as

<math display="block"> \frac{F(v)^n(1 - F(v))^n}{\Beta(n+1,n+1)} \, dF(v) </math>

Hence the density function of the median is a symmetric beta distribution [[pushforward measure|pushed forward]] by <math>F</math>. Its mean, as we would expect, is 0.5 and its variance is <math> 1/(4(N+2)) </math>. By the [[chain rule]], the corresponding variance of the sample median is

<math display="block">\frac{ 1 }{ 4(N + 2) f(m)^2 }.</math>

The additional 2 is negligible [[limit (mathematics)|in the limit]].

=====Empirical local density=====
In practice, the functions <math> f </math> and <math> F </math> above are often not known or assumed. However, they can be estimated from an observed frequency distribution. In this section, we give an example. Consider the following table, representing a sample of 3,800 (discrete-valued) observations:

{| class="wikitable" style="margin-left:auto;margin-right:auto;text-align:center;"
! {{mvar|v}} !! 0 !! 0.5 !! 1 !! 1.5 !! 2 !! 2.5 !! 3 !! 3.5 !! 4 !! 4.5 !! 5
|-
! {{math|''f''(''v'')}}
| 0.000 || 0.008 || 0.010 || 0.013 || 0.083 || 0.108 || 0.328 || 0.220 || 0.202 || 0.023 || 0.005
|-
! {{math|''F''(''v'')}}
| 0.000 || 0.008 || 0.018 || 0.031 || 0.114 || 0.222 || 0.550 || 0.770 || 0.972 || 0.995 || 1.000
|}

Because the observations are discrete-valued, constructing the exact distribution of the median is not an immediate translation of the above expression for <math> \Pr(\operatorname{med} = v) </math>; one may (and typically does) have multiple instances of the median in one's sample.
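One way to handle these ties is to sum over the number of observations strictly below and strictly above each candidate value, as in the formula that follows. A minimal Python sketch of that computation, using the frequency table above and reading "strictly below <math>v</math>" as the cumulative probability of the preceding value (an interpretive assumption), reproduces the expected median for samples of size 3:

```python
from math import factorial

# Frequencies f(v) from the table above; F accumulates them into the CDF.
vs = [0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5]
f = [0.000, 0.008, 0.010, 0.013, 0.083, 0.108, 0.328, 0.220, 0.202, 0.023, 0.005]
F = []
total = 0.0
for p in f:
    total += p
    F.append(total)

def median_pmf(N):
    """Exact distribution of the sample median for odd N = 2n + 1, summing
    over i points strictly below v and k points strictly above v."""
    n = N // 2
    pmf = []
    for j in range(len(vs)):
        below = F[j] - f[j]   # Pr(X < v), i.e. F of the preceding value
        above = 1.0 - F[j]    # Pr(X > v)
        p = 0.0
        for i in range(n + 1):
            for k in range(n + 1):
                coef = factorial(N) / (factorial(i) * factorial(N - i - k) * factorial(k))
                p += coef * below**i * above**k * f[j]**(N - i - k)
        pmf.append(p)
    return pmf

pmf3 = median_pmf(3)
expected = sum(v * p for v, p in zip(vs, pmf3))
print(round(expected, 3))  # 3.198, matching the comparison table below
```

Because every sample of odd size has exactly one median value, the computed probabilities sum to 1, which serves as a sanity check on the counting.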
So we must sum over all these possibilities:

<math display="block"> \Pr(\operatorname{med} = v) = \sum_{i=0}^n \sum_{k=0}^n \frac{N!}{i!(N-i-k)!k!} F(v-1)^i(1 - F(v))^k f(v)^{N-i-k} </math>

Here, ''i'' is the number of points strictly less than the median and ''k'' the number strictly greater.

Using these preliminaries, it is possible to investigate the effect of sample size on the standard errors of the mean and median. The observed mean is 3.16, the observed raw median is 3 and the observed interpolated median is 3.174. The following table gives some comparison statistics.

{| class="wikitable" style="margin-left:auto;margin-right:auto;text-align:center;"
! {{Diagonal split header|Statistic|Sample size}} !! 3 !! 9 !! 15 !! 21
|-
! Expected value of median
| 3.198 || 3.191 || 3.174 || 3.161
|-
! Standard error of median (above formula)
| 0.482 || 0.305 || 0.257 || 0.239
|-
! Standard error of median (asymptotic approximation)
| 0.879 || 0.508 || 0.393 || 0.332
|-
! Standard error of mean
| 0.421 || 0.243 || 0.188 || 0.159
|}

The expected value of the median falls slightly as sample size increases while, as would be expected, the standard errors of both the median and the mean are proportionate to the inverse square root of the sample size. The asymptotic approximation errs on the side of caution by overestimating the standard error.

=== Estimation of variance from sample data ===
The value of <math>(2 f(x))^{-2}</math>, the asymptotic variance of <math>n^{1/2} (\nu - m)</math> where <math>\nu</math> is the sample median, has been studied by several authors. The standard "delete one" [[Resampling (statistics)#Jackknife|jackknife]] method produces [[consistent estimator|inconsistent]] results.<ref name=Efron1982>{{cite book |last=Efron |first=B. |year=1982 |title=The Jackknife, the Bootstrap and other Resampling Plans |publisher=SIAM |location=Philadelphia |isbn=0898711797 }}</ref> An alternative, the "delete k" method, in which <math>k</math> grows with the sample size, has been shown to be asymptotically consistent.<ref name=Shao1989>{{cite journal |last1=Shao |first1=J. |last2=Wu |first2=C. F. |year=1989 |title=A General Theory for Jackknife Variance Estimation |journal=[[Annals of Statistics|Ann. Stat.]] |volume=17 |issue=3 |pages=1176–1197 |jstor=2241717 |doi=10.1214/aos/1176347263 |doi-access=free }}</ref> This method may be computationally expensive for large data sets. A bootstrap estimate is known to be consistent,<ref name=Efron1979>{{cite journal |last=Efron |first=B. |year=1979 |title=Bootstrap Methods: Another Look at the Jackknife |journal=[[Annals of Statistics|Ann. Stat.]] |volume=7 |issue=1 |pages=1–26 |jstor=2958830 |doi=10.1214/aos/1176344552 |doi-access=free }}</ref> but converges very slowly ([[computational complexity theory|order]] of <math>n^{-\frac{1}{4}}</math>).<ref name=Hall1988>{{cite journal |last1=Hall |first1=P. |last2=Martin |first2=M. A. |s2cid=119701556 |year=1988 |title=Exact Convergence Rate of Bootstrap Quantile Variance Estimator |journal=Probab Theory Related Fields |volume=80 |issue=2 |pages=261–268 |doi=10.1007/BF00356105 |doi-access=free }}</ref> Other methods have been proposed, but their behavior may differ between large and small samples.<ref name=Jimenez-Gamero2004>{{cite journal |last1=Jiménez-Gamero |first1=M. D. |last2=Munoz-García |first2=J. |last3=Pino-Mejías |first3=R. |year=2004 |title=Reduced bootstrap for the median |journal=Statistica Sinica |volume=14 |issue=4 |pages=1179–1198 |url=http://www3.stat.sinica.edu.tw/statistica/password.asp?vol=14&num=4&art=11 }}</ref>

===Efficiency{{anchor|Efficiency}}===
The [[Efficiency (statistics)|efficiency]] of the sample median, measured as the ratio of the variance of the mean to the variance of the median, depends on the sample size and on the underlying population distribution. For a sample of size <math>N = 2n + 1</math> from the [[normal distribution]], the efficiency for large <math>N</math> is

<math display="block"> \frac{2}{\pi} \frac{N+2}{N} </math>

The efficiency tends to <math> \frac{2}{\pi} </math> as <math>N</math> tends to infinity. In other words, the relative variance of the median will be <math>\pi/2 \approx 1.57</math>, or 57% greater than the variance of the mean, and the relative [[standard error]] of the median will be <math>(\pi/2)^\frac{1}{2} \approx 1.25</math>, or 25% greater than the [[standard error of the mean]], <math>\sigma/\sqrt{n}</math> (see also section [[#Sampling distribution]] above).<ref>{{Cite book |url=https://books.google.com/books?id=8bMj8m-4RDQC&q=standard%20error%20of%20the%20median&pg=PA104 |title=Data Analysis and Graphics Using R: An Example-Based Approach |isbn=9781139486675 |last1=Maindonald |first1=John |last2=John Braun |first2=W. |date=2010-05-06 |publisher=Cambridge University Press }}</ref>

===Other estimators===
For univariate distributions that are ''symmetric'' about one median, the [[Hodges–Lehmann estimator]] is a [[robust statistics|robust]] and highly [[Efficiency (statistics)|efficient estimator]] of the population median.<ref name="HM">{{cite book |last1=Hettmansperger |first1=Thomas P. |last2=McKean |first2=Joseph W. |title=Robust nonparametric statistical methods |series=Kendall's Library of Statistics |volume=5 |publisher=Edward Arnold |location=London |year=1998 |isbn=0-340-54937-8 |mr=1604954 }}</ref>

If data is represented by a [[statistical model]] specifying a particular family of [[probability distribution]]s, then estimates of the median can be obtained by fitting that family of probability distributions to the data and calculating the theoretical median of the fitted distribution. [[Pareto interpolation]] is an application of this when the population is assumed to have a [[Pareto distribution]].
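As an illustration of the Hodges–Lehmann estimator mentioned above: it is the median of all pairwise (Walsh) averages <math>(x_i + x_j)/2</math> for <math>i \le j</math>. A minimal sketch (the function name is illustrative, not a library API):

```python
from itertools import combinations_with_replacement
from statistics import median

def hodges_lehmann(sample):
    """Median of all Walsh averages (x_i + x_j) / 2 over pairs with i <= j."""
    walsh = [(x + y) / 2 for x, y in combinations_with_replacement(sample, 2)]
    return median(walsh)

print(hodges_lehmann([1, 5, 2, 4, 3]))  # 3.0, agreeing with the sample median here
```

For data symmetric about the median, the estimate tracks the sample median but is less sensitive to a few outlying values; note that the naive enumeration above costs quadratic time and memory in the sample size.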