==Estimating heritability==
Since only ''P'' can be observed or measured directly, heritability must be estimated from the similarities observed in subjects varying in their level of genetic or environmental similarity. The [[Statistics|statistical]] analyses required to estimate the [[genetics|genetic]] and [[Environment (biophysical)|environmental]] components of variance depend on the sample characteristics. Briefly, better estimates are obtained using data from individuals with widely varying levels of genetic relationship, such as [[twins]], siblings, parents and offspring, rather than from more distantly related (and therefore less similar) subjects. The [[standard error (statistics)|standard error]] of heritability estimates decreases with larger sample sizes.

In non-human populations it is often possible to collect information in a controlled way. For example, among farm animals it is easy to arrange for a bull to produce offspring from a large number of cows and to control environments. Such [[experimental control]] is generally not possible when gathering human data, which must rely on naturally occurring relationships and environments.

In classical quantitative genetics, there were two schools of thought regarding estimation of heritability.

One [[School (discipline)|school of thought]] was developed by [[Sewall Wright]] at [[University of Chicago|The University of Chicago]], and further popularized by [[C. C. Li]] ([[University of Chicago]]) and [[Jay Laurence Lush|J. L. Lush]] ([[Iowa State University]]). It is based on the analysis of correlations and, by extension, regression. [[path analysis (statistics)|Path analysis]] was developed by Wright as a way of estimating heritability.

The second was originally developed by [[Ronald Fisher|R. A. Fisher]] and expanded at [[University of Edinburgh|The University of Edinburgh]], [[Iowa State University]], and [[North Carolina State University]], as well as other schools. It is based on the [[analysis of variance]] of breeding studies, using the intraclass correlation of relatives. Various methods of estimating components of variance (and, hence, heritability) from [[Analysis of variance|ANOVA]] are used in these analyses.

Today, heritability can be estimated from general pedigrees using [[Best linear unbiased prediction|linear mixed models]] and from [[Genome-wide complex trait analysis|genomic relatedness]] estimated from genetic markers.

Studies of human heritability often use adoption study designs, frequently with [[Twin|identical twins]] who have been separated early in life and raised in different environments. Such individuals have identical genotypes and can be used to separate the effects of genotype and environment. Limitations of this design are the common prenatal environment and the relatively low numbers of twins reared apart. A second and more common design is the [[twin study]], in which the similarity of identical and fraternal twins is used to estimate heritability. These studies can be limited by the fact that identical twins are [[Twin#Genetic and epigenetic similarity|not completely genetically identical]], potentially resulting in an underestimation of heritability.

In [[Observational study|observational studies]], or because of evocative effects (where a genome evokes environments by its effect on them), G and E may covary: [[gene environment correlation]]. Depending on the methods used to estimate heritability, correlations between genetic factors and shared or non-shared environments may or may not be confounded with heritability.<ref>{{cite journal | vauthors = Cattell RB | title = The multiple abstract variance analysis equations and solutions: for nature-nurture research on continuous variables | journal = Psychological Review | volume = 67 | issue = 6 | pages = 353–72 | date = November 1960 | pmid = 13691636 | doi = 10.1037/h0043487 }}</ref>

===Regression/correlation methods of estimation===
The first school of estimation uses regression and correlation to estimate heritability.

====Comparison of close relatives====
In the comparison of relatives, we find that in general,

:<math>h^2 = \frac{b}{r} = \frac{t}{r}</math>

where ''r'' can be thought of as the [[Coefficient of relationship|coefficient of relatedness]], ''b'' is the coefficient of regression and ''t'' is the coefficient of correlation.

=====Parent-offspring regression=====
[[Image:Galton experiment.png|200px|right|thumbnail|Figure 2. [[Francis Galton]]'s (1889) data showing the relationship between offspring height (928 individuals) as a function of mean parent height (205 sets of parents).]]

Heritability may be estimated by comparing parent and offspring traits (as in Fig. 2). The slope of the line (0.57) approximates the heritability of the trait when offspring values are regressed against the average trait in the parents. If only one parent's value is used then heritability is twice the slope. (This is the source of the term "[[regression analysis|regression]]," since the offspring values always tend to [[regression toward the mean|regress to the mean]] value for the population, ''i.e.'', the slope is always less than one). This regression effect also underlies the [[DeFries–Fulker regression|DeFries–Fulker method]] for analyzing twins selected for one member being affected.<ref>{{cite journal | vauthors = DeFries JC, Fulker DW | title = Multiple regression analysis of twin data | journal = Behavior Genetics | volume = 15 | issue = 5 | pages = 467–73 | date = September 1985 | pmid = 4074272 | doi = 10.1007/BF01066239 | s2cid = 1172312 }}</ref>
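
The regression estimate can be illustrated with a minimal Python sketch. The heights below are invented for illustration (they are not Galton's data), and the function names are hypothetical; the slope is obtained by ordinary least squares.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def h2_from_midparent(midparent, offspring):
    """Heritability estimate: slope of offspring on the parental average."""
    return ols_slope(midparent, offspring)


def h2_from_single_parent(parent, offspring):
    """Heritability estimate: twice the slope of offspring on one parent."""
    return 2.0 * ols_slope(parent, offspring)


# Invented example values (heights in inches), not real measurements.
midparent = [64.0, 66.0, 68.0, 70.0, 72.0]
offspring = [65.8, 66.9, 68.1, 69.0, 70.2]

h2 = h2_from_midparent(midparent, offspring)
print(f"estimated heritability: {h2:.3f}")
```

With real data the slope would be accompanied by a standard error, which this sketch omits.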

=====Sibling comparison=====
A basic approach to heritability can be taken using full-sib designs: comparing similarity between siblings who share both a biological mother and a father.<ref>{{cite book | title = Introduction to Quantitative Genetics | first1 = Douglas S. | last1 = Falconer | first2 = Trudy F. C. | last2 = Mackay | name-list-style = vanc | date = December 1995 | publisher = [[Longman]] | edition = 4th | isbn = 978-0582243026 | url = https://archive.org/details/introductiontoqu00falc }}</ref> When there is only additive gene action, this sibling phenotypic correlation is an index of ''familiality'': the sum of half the additive genetic variance plus the full effect of the common environment. It thus places an upper limit on additive heritability of twice the full-sib phenotypic correlation. Half-sib designs compare phenotypic traits of siblings that share one parent with other sibling groups.

=====Twin studies=====
{{main|Twin study}}

[[Image:Twin-concordances.jpg|300px|thumbnail|Figure 3. Twin concordances for seven psychological traits (sample size shown inside bars), with DZ being fraternal and MZ being identical twins.]]
Heritability for traits in humans is most frequently estimated by comparing resemblances between twins. "The advantage of twin studies, is that the total variance can be split up into genetic, shared or common environmental, and unique environmental components, enabling an accurate estimation of heritability".<ref name="pmid18157630">{{cite journal | vauthors = Gielen M, Lindsey PJ, Derom C, Smeets HJ, Souren NY, Paulussen AD, Derom R, Nijhuis JG | title = Modeling genetic and environmental factors to increase heritability and ease the identification of candidate genes for birth weight: a twin study | journal = Behavior Genetics | volume = 38 | issue = 1 | pages = 44–54 | date = January 2008 | pmid = 18157630 | pmc = 2226023 | doi = 10.1007/s10519-007-9170-3 }}</ref> Fraternal or dizygotic (DZ) twins on average share half their genes (assuming there is no [[assortative mating]] for the trait), and so identical or monozygotic (MZ) twins on average are twice as genetically similar as DZ twins. A crude estimate of heritability, then, is approximately twice the difference in [[correlation]] between MZ and DZ twins, i.e. [[Falconer's formula]] ''H''<sup>2</sup>=2(r(MZ)-r(DZ)).

The effect of shared environment, ''c''<sup>2</sup>, contributes to similarity between siblings due to the commonality of the environment they are raised in. Shared environment is approximated by the DZ correlation minus half the heritability (one half being the degree to which DZ twins share the same genes): ''c''<sup>2</sup>=r(DZ)-1/2''h''<sup>2</sup>. Unique environmental variance, ''e''<sup>2</sup>, reflects the degree to which identical twins raised together are dissimilar: ''e''<sup>2</sup>=1-r(MZ).
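
The three twin-based quantities can be computed together, as in the following Python sketch. The function name and the input correlations are hypothetical, chosen only to illustrate the arithmetic.

```python
def falconer_ace(r_mz, r_dz):
    """Crude twin-based estimates from MZ and DZ correlations.

    Returns (heritability, shared environment, unique environment).
    """
    h2 = 2.0 * (r_mz - r_dz)   # Falconer's formula
    c2 = r_dz - 0.5 * h2       # shared environment: r(DZ) - h2/2
    e2 = 1.0 - r_mz            # unique environment: 1 - r(MZ)
    return h2, c2, e2


# Hypothetical correlations, for illustration only.
h2, c2, e2 = falconer_ace(r_mz=0.80, r_dz=0.50)
print(f"h2={h2:.2f}  c2={c2:.2f}  e2={e2:.2f}")
```

By construction the three components sum to one. The crude estimates can fall outside the range 0 to 1 (for example, when r(MZ) is more than twice r(DZ) the shared-environment estimate is negative), which is usually read as a sign that the simple model does not fit.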

===Analysis of variance methods of estimation===
The second set of methods of estimation of heritability involves ANOVA and estimation of variance components.

====Basic model====
We use the basic discussion of Kempthorne.<ref name = "Kempthorne_1957" /> Considering only the most basic of genetic models, we can look at the quantitative contribution of a single locus with genotype '''G<sub>i</sub>''' as

:<math>y_i = \mu + g_i + e</math>

where <math>g_i</math> is the effect of genotype '''G<sub>i</sub>''' and <math>e</math> is the environmental effect.

Consider an experiment with a group of sires and their progeny from random dams. Since the progeny get half of their genes from the father and half from their (random) mother, the progeny equation is

:<math>z_i = \mu + \frac{1}{2}g_i + e</math>

=====Intraclass correlations=====
Consider the experiment above. We have two groups of progeny we can compare. The first compares the various progeny of an individual sire (called ''within sire group''). The variance will include terms for genetic variance (since they did not all get the same genotype) and environmental variance. This is thought of as an ''error'' term.

The second group of progeny are comparisons of means of half sibs with each other (called ''among sire group''). In addition to the [[Errors and residuals in statistics|error term]] as in the within sire groups, we have an additional term due to the differences among different means of half sibs. The intraclass correlation is determined by the covariance between half sibs,
:<math>\mathrm{cov}(z,z') = \mathrm{cov}(\mu + \frac{1}{2}g + e, \mu + \frac{1}{2}g + e') = \frac{1}{4}V_g</math> ,
since environmental effects are independent of each other.

=====The ANOVA=====
In an experiment with <math>n</math> sires and <math>r</math> progeny per sire, we can calculate the following ANOVA, using <math>V_g</math> as the genetic variance and <math>V_e</math> as the environmental variance:

{| class="wikitable"
|+ Table 1: ANOVA for Sire experiment
! Source
! d.f.
! Mean Square
! Expected Mean Square
|-
|Between sire groups
|<math>n-1</math>
|<math>S</math>
|<math>\frac{3}{4}V_g + V_e + r({\frac{1}{4}V_g})</math>
|-
|Within sire groups
|<math>n(r-1)</math>
|<math>W</math>
|<math>\frac{3}{4}V_g + V_e</math>
|}

The <math>\frac{1}{4}V_g</math> term is the [[intraclass correlation]] between half sibs. We can easily calculate <math>H^2 = \frac{V_g}{V_g+V_e} = \frac{4(S-W)}{S+(r-1)W}</math>. The expected mean square is calculated from the relationship of the individuals (progeny within a sire are all half-sibs, for example), and an understanding of intraclass correlations.
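
The calculation in Table 1 can be sketched in Python for a balanced design (the same number of progeny for every sire). The function name and the progeny values are hypothetical and serve only to show how <math>S</math>, <math>W</math> and the heritability estimate are obtained.

```python
def sire_anova_heritability(groups):
    """One-way ANOVA for a balanced sire design.

    groups: list of lists, one inner list of progeny values per sire.
    Returns (S, W, H2) with H2 = 4(S - W) / (S + (r - 1) W).
    """
    n = len(groups)        # number of sires
    r = len(groups[0])     # progeny per sire
    if any(len(g) != r for g in groups):
        raise ValueError("this sketch assumes a balanced design")

    grand_mean = sum(sum(g) for g in groups) / (n * r)
    sire_means = [sum(g) / r for g in groups]

    ss_between = r * sum((m - grand_mean) ** 2 for m in sire_means)
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, sire_means) for v in g)

    S = ss_between / (n - 1)         # mean square between sire groups
    W = ss_within / (n * (r - 1))    # mean square within sire groups
    H2 = 4.0 * (S - W) / (S + (r - 1) * W)
    return S, W, H2


# Invented progeny records for four sires with three progeny each.
progeny = [
    [6.5, 8.5, 10.5],
    [7.5, 9.5, 11.5],
    [8.5, 10.5, 12.5],
    [9.5, 11.5, 13.5],
]
S, W, H2 = sire_anova_heritability(progeny)
print(f"S={S:.2f}  W={W:.2f}  H2={H2:.3f}")
```

With small experiments the estimate can exceed one or be negative, because <math>S - W</math> is itself subject to sampling error.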

The use of ANOVA to calculate heritability often fails to account for the presence of [[Gene–environment interaction|gene–environment interactions]], because ANOVA has a much lower [[statistical power]] for testing for interaction effects than for direct effects.<ref>{{Cite journal |last=Wahlsten |first=Douglas |date=March 1990 |title=Insensitivity of the analysis of variance to heredity-environment interaction |journal=Behavioral and Brain Sciences |language=en |volume=13 |issue=1 |pages=109–120 |doi=10.1017/S0140525X00077797 |s2cid=143217984 |issn=1469-1825 |url=http://libres.uncg.edu/ir/uncg/f/D_Wahlsten_Insensitivity_1990.pdf |access-date=2020-09-06 |archive-date=2020-10-05 |archive-url=https://web.archive.org/web/20201005031444/http://libres.uncg.edu/ir/uncg/f/D_Wahlsten_Insensitivity_1990.pdf |url-status=live }}</ref>

====Model with additive and dominance terms====

For a model with additive and dominance terms, but not others, the equation for a single locus is

:<math>y_{ij} = \mu + \alpha_i + \alpha_j + d_{ij} + e, </math>

where

<math>\alpha_i</math> is the additive effect of the i<sup>th</sup> allele, <math>\alpha_j</math> is the additive effect of the j<sup>th</sup> allele, <math>d_{ij}</math> is the dominance deviation for the ij<sup>th</sup> genotype, and <math>e</math> is the environment.

Experiments can be run with a similar setup to the one given in Table 1. Using different relationship groups, we can evaluate different intraclass correlations. Using <math>V_a</math> as the additive genetic variance and <math>V_d</math> as the dominance deviation variance, intraclass correlations become [[linear function]]s of these parameters. In general,

:Intraclass correlation<math> = r V_a + \theta V_d,</math>

where <math>r</math> and <math>\theta</math> are found as

<math>r = </math>P[ [[alleles]] drawn at random from the relationship pair are [[identity by descent|identical by descent]]], and

<math>\theta = </math>P[ [[genotypes]] drawn at random from the relationship pair are [[identity by descent|identical by descent]]].

Some common relationships and their coefficients are given in Table 2.

{| class="wikitable"
|+ Table 2: Coefficients for calculating variance components
! Relationship
! <math>r</math>
! <math>\theta</math>
|-
|Identical Twins
|<math>1</math>
|<math>1</math>
|-
|Parent-Offspring
|<math>\frac{1}{2}</math>
|<math>0</math>
|-
|Half Siblings
|<math>\frac{1}{4}</math>
|<math>0</math>
|-
|Full Siblings
|<math>\frac{1}{2}</math>
|<math>\frac{1}{4}</math>
|-
|First Cousins
|<math>\frac{1}{8}</math>
|<math>0</math>
|-
|Double First Cousins
|<math>\frac{1}{4}</math>
|<math>\frac{1}{16}</math>
|}
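
As an illustration of how the coefficients in Table 2 are used, the following Python sketch evaluates the linear relation <math>r V_a + \theta V_d</math> for a given relationship and, conversely, solves for <math>V_a</math> and <math>V_d</math> from the values observed for two different relationships. The function names and the numerical inputs are hypothetical.

```python
from fractions import Fraction as F

# (r, theta) for each relationship, copied from Table 2.
COEFFICIENTS = {
    "identical twins": (F(1), F(1)),
    "parent-offspring": (F(1, 2), F(0)),
    "half siblings": (F(1, 4), F(0)),
    "full siblings": (F(1, 2), F(1, 4)),
    "first cousins": (F(1, 8), F(0)),
    "double first cousins": (F(1, 4), F(1, 16)),
}


def expected_value(relationship, v_a, v_d):
    """Evaluate r*V_a + theta*V_d for one relationship."""
    r, theta = COEFFICIENTS[relationship]
    return float(r) * v_a + float(theta) * v_d


def solve_va_vd(rel1, obs1, rel2, obs2):
    """Solve the 2x2 linear system for (V_a, V_d) by Cramer's rule."""
    r1, t1 = COEFFICIENTS[rel1]
    r2, t2 = COEFFICIENTS[rel2]
    det = r1 * t2 - r2 * t1
    if det == 0:
        raise ValueError("these two relationships cannot separate V_a from V_d")
    v_a = (obs1 * t2 - obs2 * t1) / det
    v_d = (r1 * obs2 - r2 * obs1) / det
    return float(v_a), float(v_d)


# Hypothetical observed values for half sibs and full sibs.
v_a, v_d = solve_va_vd("half siblings", 10.0, "full siblings", 22.0)
print(f"V_a={v_a}  V_d={v_d}")
```

Pairs of relationships that both have <math>\theta = 0</math> (for example parent-offspring and half siblings) give a singular system, which is why at least one relationship with a non-zero dominance coefficient is needed to estimate <math>V_d</math>.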

===Linear mixed models===
A wide variety of approaches using linear mixed models have been reported in the literature. With these methods, phenotypic variance is partitioned into genetic, environmental and experimental design variances to estimate heritability. Environmental variance can be explicitly modeled by studying individuals across a broad range of environments, although inferring genetic variance from phenotypic and environmental variance may underestimate heritability because it is difficult to capture the full range of environmental influences affecting a trait. Other methods for calculating heritability use data from [[genome-wide association studies]] to estimate the influence of genetic factors on a trait, which is reflected by the rate and influence of putatively associated genetic loci (usually [[single-nucleotide polymorphisms]]) on the trait. This can also lead to underestimation of heritability. The discrepancy is referred to as "missing heritability" and reflects the challenge of accurately modeling both genetic and environmental variance in heritability models.<ref name="pmid27382152">{{cite journal | vauthors = Heckerman D, Gurdasani D, Kadie C, Pomilla C, Carstensen T, Martin H, Ekoru K, Nsubuga RN, Ssenyomo G, Kamali A, Kaleebu P, Widmer C, Sandhu MS | title = Linear mixed model for heritability estimation that explicitly addresses environmental variation | journal = Proceedings of the National Academy of Sciences of the United States of America | volume = 113 | issue = 27 | pages = 7377–82 | date = July 2016 | pmid = 27382152 | pmc = 4941438 | doi = 10.1073/pnas.1510497113 | bibcode = 2016PNAS..113.7377H | doi-access = free }}</ref>

When a large, complex pedigree or another aforementioned type of data is available, heritability and other quantitative genetic parameters can be estimated by [[restricted maximum likelihood]] (REML) or [[Bayesian statistics|Bayesian methods]]. The [[raw data]] will usually have three or more data points for each individual: a code for the sire, a code for the dam and one or several trait values. Different trait values may be for different traits or for different time points of measurement.

The currently popular methodology relies on high degrees of certainty over the identities of the sire and dam; it is not common to treat the sire identity probabilistically. This is not usually a problem, since the methodology is rarely applied to wild populations (although it has been used for several wild ungulate and bird populations), and sires are invariably known with a very high degree of certainty in breeding programmes. There are also algorithms that account for uncertain paternity.

The pedigrees can be viewed using programs such as Pedigree Viewer [http://www-personal.une.edu.au/~bkinghor/pedigree.htm], and analyzed with programs such as [[ASReml]], VCE [https://web.archive.org/web/20070312053650/http://vce.tzv.fal.de/index.pl], WOMBAT [https://web.archive.org/web/20061128123632/http://agbu.une.edu.au/~kmeyer/wombat.html], MCMCglmm within the R environment [https://www.rdocumentation.org/packages/MCMCglmm/versions/2.29/topics/MCMCglmm-package] or the [[BLUPF90]] family of programs [http://nce.ads.uga.edu/~ignacy/programs.html].

Pedigree models are helpful for untangling confounds such as [[reverse causality]], [[maternal effects]] such as the [[prenatal environment]], and confounding of [[genetic dominance]], shared environment, and maternal gene effects.<ref>{{cite journal | vauthors = Hill WG, Goddard ME, Visscher PM | title = Data and theory point to mainly additive genetic variance for complex traits | journal = PLOS Genetics | volume = 4 | issue = 2 | pages = e1000008 | date = February 2008 | pmid = 18454194 | pmc = 2265475 | doi = 10.1371/journal.pgen.1000008 | veditors = MacKay TF, Goddard ME | doi-access = free }} {{open access}}</ref><ref name=VisscherHillWray>{{cite journal | vauthors = Visscher PM, Hill WG, Wray NR | title = Heritability in the genomics era--concepts and misconceptions | journal = Nature Reviews. Genetics | volume = 9 | issue = 4 | pages = 255–66 | date = April 2008 | pmid = 18319743 | doi = 10.1038/nrg2322 | s2cid = 690431 | url = http://genepi.qimr.edu.au/contents/p/staff/visscher_hill_wray_nrg2.pdf | author-link2 = William G. Hill | access-date = 2015-08-28 | archive-date = 2016-03-24 | archive-url = https://web.archive.org/web/20160324154308/https://genepi.qimr.edu.au/contents/p/staff/visscher_hill_wray_nrg2.pdf | url-status = live }}</ref>

===Genomic heritability===

When genome-wide genotype data and phenotypes from large population samples are available, one can estimate the relationships between individuals based on their genotypes and use a linear mixed model to estimate the variance explained by the genetic markers. This gives a genomic heritability estimate based on the variance captured by common genetic variants.<ref name=":0">{{cite journal | vauthors = Yang J, Zeng J, Goddard ME, Wray NR, Visscher PM | title = Concepts, estimation and interpretation of SNP-based heritability | journal = Nature Genetics | volume = 49 | issue = 9 | pages = 1304–1310 | date = August 2017 | pmid = 28854176 | doi = 10.1038/ng.3941 | s2cid = 8790524 | url = https://espace.library.uq.edu.au/view/UQ:685190/UQ685190_OA.pdf | access-date = 2020-09-06 | archive-date = 2020-10-05 | archive-url = https://web.archive.org/web/20201005031447/https://espace.library.uq.edu.au/data/UQ_685190/UQ685190_OA.pdf?Expires=1601867773&Key-Pair-Id=APKAJKNBJ4MJBJNC6NLQ&Signature=bW3fUGrxlgOvhlH0eESG9ivvBsCaH1jHWK7Qkg55xINU3~0QKDLn9KDU13--6EX1u6-xHsgowMB-V0MH7CYpUPkh1FRd7Bm9dIM7F-oZMNCp8cV10k6X19v4QQ-l6EmUFGVyUcyE8DT8ewHq6XK4T4a0Oxkn8A-HsaUstg2dFif7IFBsAhp3-dNH0xM6WDhdbx5a73zCn352VWGvT7wwU~lo6mj4N~N7SMYNGeuzuhH1hUV8IJm2fXRfiCAF1zZSM4fY32OANJ9Ro6IJN9qNxOEKvUvQRVTBe4BBy4No4WlAe6UBQgYLHm7KR4YxHF3sfYRmkn96cbaEWUFJ5YGIiQ__ | url-status = live }}</ref> There are multiple methods that make different adjustments for allele frequency and [[linkage disequilibrium]]. 
In particular, the method called High-Definition Likelihood (HDL) can estimate genomic heritability using only GWAS summary statistics,<ref name=":1">{{cite journal | vauthors = Ning Z, Pawitan Y, Shen X | title = High-definition likelihood inference of genetic correlations across human complex traits | journal = Nature Genetics | volume = 52 | issue = 8 | pages = 859–864 | date = June 2020 | pmid = 32601477 | doi = 10.1038/s41588-020-0653-y | hdl = 10616/47311 | s2cid = 220260262 | url = https://www.pure.ed.ac.uk/ws/files/144684519/High_definition_likelihood_inference_of_genetic_correlations_across_human_complex_traits.pdf | hdl-access = free | access-date = 2021-02-08 | archive-date = 2021-04-15 | archive-url = https://web.archive.org/web/20210415093348/https://www.pure.ed.ac.uk/ws/files/144684519/High_definition_likelihood_inference_of_genetic_correlations_across_human_complex_traits.pdf | url-status = live }}</ref> making it easier to incorporate the large sample sizes available in various GWAS meta-analyses.
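
The first step described at the start of this section, estimating relationships between individuals from their genotypes, can be sketched in Python. The sketch assumes one common standardisation, in which each marker is centred by twice its sample allele frequency and scaled by its expected variance under [[Hardy–Weinberg principle|Hardy–Weinberg]] proportions; this choice, the function name and the toy genotypes are assumptions made for illustration and are not prescribed by the text above. Fitting the linear mixed model to the resulting matrix is not shown.

```python
def genomic_relationship_matrix(genotypes):
    """Marker-based relationship matrix.

    genotypes[j][i] is the allele count (0, 1 or 2) of individual j
    at marker i. Entry (j, k) of the result averages, over markers,
    (x_j - 2p)(x_k - 2p) / (2p(1 - p)), with p the sample frequency.
    """
    n = len(genotypes)
    m = len(genotypes[0])

    # Sample frequency of the counted allele at each marker.
    freqs = [sum(row[i] for row in genotypes) / (2.0 * n) for i in range(m)]
    if any(p <= 0.0 or p >= 1.0 for p in freqs):
        raise ValueError("monomorphic markers must be removed first")

    grm = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(j, n):
            total = 0.0
            for i, p in enumerate(freqs):
                total += ((genotypes[j][i] - 2.0 * p)
                          * (genotypes[k][i] - 2.0 * p)
                          / (2.0 * p * (1.0 - p)))
            grm[j][k] = grm[k][j] = total / m   # matrix is symmetric
    return grm


# Toy genotypes: four individuals, two markers (invented values).
toy = [[0, 2], [1, 1], [1, 1], [2, 0]]
grm = genomic_relationship_matrix(toy)
for row in grm:
    print(row)
```

Real analyses use many thousands of markers and individuals, for which an array library and dedicated software are more appropriate than nested Python loops.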