===Modified Thompson Tau test===
{{see also|Studentized residual#Distribution}}
The modified Thompson Tau test is a method used to determine if an outlier exists in a data set.<ref>{{Cite web |last=Wheeler |first=Donald J. |date=11 January 2021 |title=Some Outlier Tests: Part 2 |url=https://www.qualitydigest.com/inside/statistics-column/some-outlier-tests-part-2-011121.html |access-date=2025-02-09 |website=Quality Digest |language=en}}</ref> The strength of this method lies in the fact that it takes into account a data set's standard deviation and average, and provides a statistically determined rejection zone, thus giving an objective method for deciding whether a data point is an outlier.{{Citation needed|reason=Although intuitively appealing, this method appears to be unpublished (it is ''not'' described in Thompson (1985)), so one should use it with caution.|date=October 2016}}<ref>Thompson, R. (1985). "[https://www.jstor.org/stable/2345543?seq=1#page_scan_tab_contents A Note on Restricted Maximum Likelihood Estimation with an Alternative Outlier Model]". Journal of the Royal Statistical Society. Series B (Methodological), Vol. 47, No. 1, pp. 53–55.</ref>

How it works: first, the data set's average is determined. Next, the absolute deviation between each data point and the average is determined. Thirdly, a rejection region is determined using the formula:

:<math>\text{Rejection Region} = \frac{t_{\alpha/2}\left( n-1 \right)}{\sqrt{n}\sqrt{n-2+t_{\alpha/2}^2}},</math>

where <math>t_{\alpha/2}</math> is the critical value from the Student {{mvar|t}} distribution with ''n'' − 2 degrees of freedom and ''n'' is the sample size.

To determine if a value is an outlier, calculate <math>\delta = \left|X - \bar{X}\right| / s</math>, where ''s'' is the sample standard deviation. If ''δ'' > Rejection Region, the data point is an outlier; if ''δ'' ≤ Rejection Region, it is not.

The modified Thompson Tau test finds one outlier at a time (the largest value of ''δ'' is removed if it is an outlier): if a data point is found to be an outlier, it is removed from the data set and the test is applied again with a new average and rejection region. This process is continued until no outliers remain in the data set, as in the sketch below.
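The following is a minimal sketch of this iterative procedure in Python, assuming SciPy is available for the Student {{mvar|t}} critical value; the function name, variable names, and example data are illustrative only and are not part of any published implementation of the test.

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

def modified_thompson_tau(data, alpha=0.05):
    """Iteratively remove suspected outliers using the modified Thompson Tau test (sketch)."""
    values = list(data)
    outliers = []
    while len(values) > 2:                      # need n - 2 > 0 degrees of freedom
        n = len(values)
        mean = np.mean(values)
        s = np.std(values, ddof=1)              # sample standard deviation
        if s == 0:                              # all remaining values identical
            break
        t = stats.t.ppf(1 - alpha / 2, n - 2)   # critical value with n - 2 d.o.f.
        tau = (t * (n - 1)) / (np.sqrt(n) * np.sqrt(n - 2 + t**2))   # rejection region
        deltas = np.abs(np.array(values) - mean) / s                 # delta for each point
        i = int(np.argmax(deltas))              # test only the most extreme point
        if deltas[i] > tau:
            outliers.append(values.pop(i))      # remove it and repeat with a new mean and tau
        else:
            break                               # no remaining outliers
    return values, outliers

# Illustrative usage with arbitrary example values:
cleaned, removed = modified_thompson_tau([48.9, 49.2, 49.2, 49.3, 49.3, 49.8, 50.1, 55.0])
</syntaxhighlight>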
Some work has also examined outliers for nominal (or categorical) data. In the context of a set of examples (or instances) in a data set, instance hardness measures the probability that an instance will be misclassified (<math>1-p(y|x)</math>, where {{mvar|y}} is the assigned class label and {{mvar|x}} represents the input attribute values for an instance in the training set {{mvar|t}}).<ref>Smith, M.R.; Martinez, T.; Giraud-Carrier, C. (2014). "[https://link.springer.com/article/10.1007%2Fs10994-013-5422-z An Instance Level Analysis of Data Complexity]". Machine Learning, 95(2): 225–256.</ref> Ideally, instance hardness would be calculated by summing over the set of all possible hypotheses {{mvar|H}}:

:<math>\begin{align}IH(\langle x, y\rangle) &= \sum_H (1 - p(y, x, h))p(h|t)\\ &= \sum_H p(h|t) - p(y, x, h)p(h|t)\\ &= 1- \sum_H p(y, x, h)p(h|t).\end{align}</math>

Practically, this formulation is infeasible, as {{mvar|H}} is potentially infinite and <math>p(h|t)</math> is unknown for many algorithms. Thus, instance hardness can be approximated using a diverse subset <math>L \subset H</math>:

:<math>IH_L (\langle x,y\rangle) = 1 - \frac{1}{|L|} \sum_{j=1}^{|L|} p(y|x, g_j(t, \alpha)),</math>

where <math>g_j(t, \alpha)</math> is the hypothesis induced by learning algorithm <math>g_j</math> trained on training set {{mvar|t}} with hyperparameters <math>\alpha</math>. Instance hardness provides a continuous value for determining whether an instance is an outlier.
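The following is a minimal sketch of this ensemble approximation in Python, assuming a small set of scikit-learn classifiers stands in for the diverse subset ''L''; the choice of learners and the use of training-set probabilities are illustrative assumptions rather than a prescribed procedure.

<syntaxhighlight lang="python">
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

def instance_hardness(X, y, learners=None):
    """Approximate IH_L: 1 minus the mean probability the learners assign to each true label (sketch)."""
    if learners is None:   # illustrative stand-ins for the diverse subset L
        learners = [LogisticRegression(max_iter=1000), GaussianNB(), DecisionTreeClassifier()]
    probs = []
    for clf in learners:
        clf.fit(X, y)                            # g_j(t, alpha): learner induced from training set t
        p = clf.predict_proba(X)                 # class probabilities p(. | x, g_j(t, alpha))
        class_index = {c: i for i, c in enumerate(clf.classes_)}
        probs.append([row[class_index[label]] for row, label in zip(p, y)])
    return 1.0 - np.mean(probs, axis=0)          # IH_L for each instance

# Example (hypothetical arrays): hardness = instance_hardness(X_train, y_train)
# Values near 1 flag instances that the learners consistently misclassify.
</syntaxhighlight>

In practice, cross-validated or held-out probability estimates may be preferred over training-set probabilities, so that flexible learners do not trivially assign probability 1 to their own training instances.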