==Algorithm==

The calculations of ANOVA can be characterized as computing a number of means and variances, dividing two variances and comparing the ratio to a handbook value to determine statistical significance. Calculating a treatment effect is then trivial: "the effect of any treatment is estimated by taking the difference between the mean of the observations which receive the treatment and the general mean".<ref>Cochran & Cox (1992, p 49)</ref>

[[File:Example of ANOVA table.jpg|380x366px|right|text-middle]]

===Partitioning of the sum of squares===
{{main|Partition of sums of squares}}
[[File:Example ANOVA Table.png|thumb|324x324px|One-factor ANOVA table showing example output data]]
{{see also|Lack-of-fit sum of squares}}

ANOVA uses traditional standardized terminology. The definitional equation of sample variance is <math display="inline">s^2 = \frac{1}{n-1} \sum_i (y_i-\bar{y})^2</math>, where the divisor is called the degrees of freedom (DF), the summation is called the sum of squares (SS), the result is called the mean square (MS) and the squared terms are deviations from the sample mean.

ANOVA estimates three sample variances: a total variance based on all the observation deviations from the grand mean, an error variance based on all the observation deviations from their appropriate treatment means, and a treatment variance. The treatment variance is based on the deviations of treatment means from the grand mean, the result being multiplied by the number of observations in each treatment to account for the difference between the variance of observations and the variance of means.

The fundamental technique is a partitioning of the total [[sum of squares (statistics)|sum of squares]] ''SS'' into components related to the effects used in the model. For example, in a simplified ANOVA with one type of treatment at different levels:

<math display="block">SS_\text{Total} = SS_\text{Error} + SS_\text{Treatments}</math>

The number of [[Degrees of freedom (statistics)|degrees of freedom]] ''DF'' can be partitioned in a similar way: one of these components (that for error) specifies a [[chi-squared distribution]] which describes the associated sum of squares, while the same is true for "treatments" if there is no treatment effect.

<math display="block">DF_\text{Total} = DF_\text{Error} + DF_\text{Treatments}</math>
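As a minimal numerical sketch of this partition in Python with NumPy (the group data below are made up for illustration, not taken from this article):

<syntaxhighlight lang="python">
import numpy as np

# Hypothetical one-way layout: three treatment groups (levels) of equal size.
groups = [
    np.array([6.0, 8.0, 4.0, 5.0, 3.0, 4.0]),
    np.array([8.0, 12.0, 9.0, 11.0, 6.0, 8.0]),
    np.array([13.0, 9.0, 11.0, 8.0, 7.0, 12.0]),
]

all_obs = np.concatenate(groups)
grand_mean = all_obs.mean()

# Total SS: squared deviations of every observation from the grand mean.
ss_total = ((all_obs - grand_mean) ** 2).sum()

# Error SS: squared deviations of observations from their own treatment mean.
ss_error = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Treatment SS: squared deviations of treatment means from the grand mean,
# multiplied by the number of observations in each treatment.
ss_treatments = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

n_total, n_groups = len(all_obs), len(groups)
df_total, df_treatments, df_error = n_total - 1, n_groups - 1, n_total - n_groups

print(np.isclose(ss_total, ss_error + ss_treatments))  # True: SS partition holds
print(df_total == df_error + df_treatments)            # True: DF partition holds
</syntaxhighlight>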
===The ''F''-test===
{{Main|F-test}}
[[File:F-Distribution Table.png|thumb|338x338px|To check for statistical significance of a one-way ANOVA, we consult the F-probability table using the degrees of freedom at the {{Math|0.05}} alpha level. After computing the F-statistic, we compare it with the critical value found at the intersection of the numerator and denominator degrees of freedom. If the F-statistic is greater in magnitude than the critical value, there is statistical significance at the {{Math|0.05}} alpha level.]]

The [[F-test|''F''-test]] is used for comparing the factors of the total deviation. For example, in one-way, or single-factor ANOVA, statistical significance is tested by comparing the F test statistic

<math display="block">F = \frac{\text{variance between treatments}}{\text{variance within treatments}}</math>
<math display="block">F = \frac{MS_\text{Treatments}}{MS_\text{Error}} = \frac{SS_\text{Treatments} / (I-1)}{SS_\text{Error} / (n_T-I)}</math>

where ''MS'' is mean square, <math>I</math> is the number of treatments and <math>n_T</math> is the total number of cases, to the [[F-distribution|''F''-distribution]] with <math>I - 1</math> being the numerator degrees of freedom and <math>n_T - I</math> the denominator degrees of freedom. Using the ''F''-distribution is a natural candidate because the test statistic is the ratio of two scaled sums of squares, each of which follows a scaled [[chi-squared distribution]].

The expected value of F is <math>1 + {n \sigma^2_\text{Treatment}} / {\sigma^2_\text{Error}}</math> (where <math>n</math> is the treatment sample size), which is 1 for no treatment effect. As values of F increase above 1, the evidence is increasingly inconsistent with the null hypothesis. Two apparent experimental methods of increasing F are increasing the sample size and reducing the error variance by tight experimental controls.

There are two methods of concluding the ANOVA hypothesis test, both of which produce the same result (both appear in the sketch at the end of this subsection):
* The textbook method is to compare the observed value of F with the critical value of F determined from tables. The critical value of F is a function of the degrees of freedom of the numerator and the denominator and the significance level (''α''). If F ≥ F<sub>Critical</sub>, the null hypothesis is rejected.
* The computer method calculates the probability (p-value) of a value of F greater than or equal to the observed value. The null hypothesis is rejected if this probability is less than or equal to the significance level (''α'').

The ANOVA ''F''-test is known to be nearly optimal in the sense of minimizing false negative errors for a fixed rate of false positive errors (i.e. maximizing power for a fixed significance level). For example, to test the hypothesis that various medical treatments have exactly the same effect, the [[F-test|''F''-test]]'s ''p''-values closely approximate the [[permutation test]]'s [[p-value]]s: the approximation is particularly close when the design is balanced.<ref name="HinkelmannKempthorne" /><ref>Hinkelmann and Kempthorne (2008, Volume 1, Section 6.7: Completely randomized design; CRD with unequal numbers of replications)</ref> Such [[permutation test]]s characterize [[uniformly most powerful test|tests with maximum power]] against all [[alternative hypothesis|alternative hypotheses]], as observed by [[Paul R. Rosenbaum|Rosenbaum]].<ref group="nb">Rosenbaum (2002, page 40) cites Section 5.7 (Permutation Tests), Theorem 2.3 (actually Theorem 3, page 184) of [[Erich Leo Lehmann|Lehmann]]'s ''Testing Statistical Hypotheses'' (1959).</ref>

The ANOVA ''F''-test (of the null hypothesis that all treatments have exactly the same effect) is recommended as a practical test, because of its robustness against many alternative distributions.<ref>Moore and McCabe (2003, page 763)</ref><ref group="nb">The ''F''-test for the comparison of variances has a mixed reputation. It is not recommended as a hypothesis test to determine whether two ''different'' samples have the same variance. It is recommended for ANOVA, where two estimates of the variance of the ''same'' sample are compared. While the ''F''-test is not generally robust against departures from normality, it has been found to be robust in the special case of ANOVA. Citations from Moore & McCabe (2003): "Analysis of variance uses F statistics, but these are not the same as the F statistic for comparing two population standard deviations." (page 554) "The F test and other procedures for inference about variances are so lacking in robustness as to be of little use in practice." (page 556) "[The ANOVA ''F''-test] is relatively insensitive to moderate nonnormality and unequal variances, especially when the sample sizes are similar." (page 763) ANOVA assumes homoscedasticity, but it is robust. The statistical test for homoscedasticity (the ''F''-test) is not robust. Moore & McCabe recommend a rule of thumb.</ref>
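A sketch of both decision methods in Python with SciPy, reusing the made-up three-treatment data from the earlier sketch; <code>stats.f_oneway</code> serves only as a cross-check:

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

# Hypothetical one-way layout with I = 3 treatments (illustrative data).
groups = [
    np.array([6.0, 8.0, 4.0, 5.0, 3.0, 4.0]),
    np.array([8.0, 12.0, 9.0, 11.0, 6.0, 8.0]),
    np.array([13.0, 9.0, 11.0, 8.0, 7.0, 12.0]),
]
all_obs = np.concatenate(groups)
grand_mean = all_obs.mean()

n_T, I = len(all_obs), len(groups)
ss_treatments = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_error = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F = MS_Treatments / MS_Error
ms_treatments = ss_treatments / (I - 1)
ms_error = ss_error / (n_T - I)
F = ms_treatments / ms_error

alpha = 0.05
# Textbook method: compare F with the critical value of the F-distribution.
f_critical = stats.f.ppf(1 - alpha, I - 1, n_T - I)
reject_by_table = F >= f_critical

# Computer method: p-value = P(F' >= F) under the null hypothesis.
p_value = stats.f.sf(F, I - 1, n_T - I)
reject_by_p = p_value <= alpha

print(F, f_critical, p_value, reject_by_table, reject_by_p)
# Cross-check against SciPy's built-in one-way ANOVA.
print(stats.f_oneway(*groups))
</syntaxhighlight>

Both methods reach the same conclusion, as the two boolean flags confirm.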
===Extended algorithm===

ANOVA consists of separable parts: partitioning sources of variance and hypothesis testing can be used individually. ANOVA is used to support other statistical tools. Regression is first used to fit more complex models to data, then ANOVA is used to compare models with the objective of selecting simple(r) models that adequately describe the data. "Such models could be fit without any reference to ANOVA, but ANOVA tools could then be used to make some sense of the fitted models, and to test hypotheses about batches of coefficients."<ref name="Gelman">Gelman (2008)</ref> "[W]e think of the analysis of variance as a way of understanding and structuring multilevel models—not as an alternative to regression but as a tool for summarizing complex high-dimensional inferences ..."<ref name="Gelman" />
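A minimal sketch of this model-comparison use, assuming a hypothetical dataset with one predictor: a reduced model (intercept only) is tested against a fuller model (intercept plus slope) with an ANOVA-style F-test on the drop in residual sum of squares. All names and data here are illustrative:

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical data: a response y generated from a single predictor x.
x = np.linspace(0.0, 10.0, 30)
y = 2.0 + 0.8 * x + rng.normal(scale=1.5, size=x.size)

def rss(design, y):
    """Residual sum of squares from a least-squares fit of y on `design`."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

n = y.size
X_reduced = np.ones((n, 1))                # intercept-only model, 1 parameter
X_full = np.column_stack([np.ones(n), x])  # intercept + slope, 2 parameters

rss_reduced, p_reduced = rss(X_reduced, y), 1
rss_full, p_full = rss(X_full, y), 2

# ANOVA-style comparison of nested models: does the extra coefficient
# reduce the residual variation more than chance alone would predict?
F = ((rss_reduced - rss_full) / (p_full - p_reduced)) / (rss_full / (n - p_full))
p_value = stats.f.sf(F, p_full - p_reduced, n - p_full)
print(F, p_value)  # a small p-value favors keeping the fuller model
</syntaxhighlight>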