==Regularization==
{{Main|Regularized least squares}}

===Tikhonov regularization===
{{Main|Tikhonov regularization}}
In some contexts, a [[Regularization (machine learning)|regularized]] version of the least squares solution may be preferable. [[Tikhonov regularization]] (or [[ridge regression]]) adds to the least squares formulation the constraint that <math>\left\|\beta\right\|_2^2</math>, the squared [[L2-norm|<math>\ell_2</math>-norm]] of the parameter vector, is not greater than a given value, leading to a constrained minimization problem. This is equivalent to the unconstrained minimization problem in which the objective function is the residual sum of squares plus a penalty term <math>\alpha \left\|\beta\right\|_2^2</math>, where <math>\alpha</math> is a tuning parameter (this is the [[Lagrange multipliers|Lagrangian]] form of the constrained minimization problem).<ref>{{cite arXiv |last=van Wieringen |first=Wessel N. |year=2021 |title=Lecture notes on ridge regression |class=stat.ME |eprint=1509.09169 }}</ref> In a [[Bayesian statistics|Bayesian]] context, this is equivalent to placing a zero-mean normally distributed [[prior distribution|prior]] on the parameter vector.

===Lasso method===
An alternative [[Regularization (machine learning)|regularized]] version of least squares is [[Lasso (statistics)|Lasso]] (least absolute shrinkage and selection operator), which uses the constraint that <math>\|\beta\|_1</math>, the [[L1-norm|L<sub>1</sub>-norm]] of the parameter vector, is no greater than a given value.<ref name=tibsh>{{cite journal |last=Tibshirani |first=R. |author-link = Rob Tibshirani | year=1996 |title=Regression shrinkage and selection via the lasso |journal=Journal of the Royal Statistical Society, Series B |volume=58|issue=1 |pages=267–288 |doi=10.1111/j.2517-6161.1996.tb02080.x |jstor=2346178}}</ref><ref name="ElementsStatLearn">{{cite book |url=http://www-stat.stanford.edu/~tibs/ElemStatLearn/ |title=The Elements of Statistical Learning |last1=Hastie |first1=Trevor |last2=Tibshirani |first2=Robert |last3=Friedman |first3=Jerome H. |author-link1=Trevor Hastie |author-link3=Jerome H. Friedman |edition=second |date=2009 |publisher=Springer-Verlag |isbn=978-0-387-84858-7 |url-status=dead |archive-url=https://web.archive.org/web/20091110212529/http://www-stat.stanford.edu/~tibs/ElemStatLearn/ |archive-date=2009-11-10 }}</ref><ref>{{cite book|last1=Bühlmann|first1=Peter|last2=van de Geer|first2=Sara|author2-link= Sara van de Geer |title=Statistics for High-Dimensional Data: Methods, Theory and Applications|date=2011|publisher=Springer|isbn=9783642201929}}</ref> (As above, one can show using Lagrange multipliers that this is equivalent to an unconstrained minimization of the least-squares objective with the penalty <math>\alpha\|\beta\|_1</math> added.) In a [[Bayesian statistics|Bayesian]] context, this is equivalent to placing a zero-mean [[Laplace distribution|Laplace]] [[prior distribution]] on the parameter vector.<ref>{{cite journal|last1=Park|first1=Trevor|last2=Casella|first2=George|author2-link=George Casella| title=The Bayesian Lasso|journal=Journal of the American Statistical Association|date=2008|volume=103|issue=482|pages=681–686|doi=10.1198/016214508000000337|s2cid=11797924}}</ref> The optimization problem may be solved using [[quadratic programming]] or more general [[convex optimization]] methods, as well as by specific algorithms such as the [[least angle regression]] algorithm.
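For illustration, both penalized objectives can be minimized directly. The following sketch is a minimal example on synthetic data, assuming only NumPy (the data and variable names are purely illustrative): it computes the ridge estimate from its closed form and approximates the Lasso estimate with a simple proximal-gradient (soft-thresholding) iteration, one instance of the convex optimization methods mentioned above.

<syntaxhighlight lang="python">
import numpy as np

# Synthetic data (illustrative only): n observations, p predictors,
# with a sparse "true" parameter vector.
rng = np.random.default_rng(0)
n, p = 50, 10
X = rng.standard_normal((n, p))
beta_true = np.array([3.0, -2.0, 1.5] + [0.0] * (p - 3))
y = X @ beta_true + 0.3 * rng.standard_normal(n)

alpha = 10.0  # tuning parameter of the penalty term

# Tikhonov regularization (ridge): the penalized objective
#   ||y - X b||_2^2 + alpha * ||b||_2^2
# has the closed-form minimizer (X^T X + alpha I)^{-1} X^T y.
beta_ridge = np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

# Lasso: ||y - X b||_2^2 + alpha * ||b||_1 has no closed form; a simple
# convex-optimization approach is proximal gradient descent (ISTA),
# alternating a gradient step with componentwise soft-thresholding.
def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

lipschitz = 2.0 * np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the gradient
step = 1.0 / lipschitz
beta_lasso = np.zeros(p)
for _ in range(5000):
    grad = 2.0 * X.T @ (X @ beta_lasso - y)
    beta_lasso = soft_threshold(beta_lasso - step * grad, step * alpha)

print("ridge:", np.round(beta_ridge, 2))  # coefficients shrunk, none exactly zero
print("lasso:", np.round(beta_lasso, 2))  # typically shows exact zeros (sparsity)
</syntaxhighlight>

The step size is taken as the reciprocal of the Lipschitz constant of the gradient of the smooth part of the objective, which guarantees convergence of the iteration.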
One key difference between Lasso and ridge regression is that, as the penalty is increased, ridge regression shrinks all parameters while leaving them non-zero, whereas Lasso drives more and more of the parameters to zero. This is an advantage of Lasso over ridge regression: driving a parameter to zero deselects the corresponding feature from the regression. Thus, Lasso automatically selects the more relevant features and discards the others, whereas ridge regression never fully discards any feature. Some [[feature selection]] techniques have been developed based on the Lasso, including Bolasso, which bootstraps samples,<ref name=Bolasso>{{cite book|last1=Bach|first1=Francis R|title=Proceedings of the 25th international conference on Machine learning - ICML '08 |chapter=Bolasso |date=2008|pages=33–40|doi=10.1145/1390156.1390161|chapter-url=http://dl.acm.org/citation.cfm?id=1390161|isbn=9781605582054|bibcode=2008arXiv0804.1302B|arxiv=0804.1302|s2cid=609778}}</ref> and FeaLect, which analyzes the regression coefficients corresponding to different values of <math>\alpha</math> in order to score all the features.<ref name=FeaLect>{{cite journal|last1=Zare|first1=Habil|title=Scoring relevancy of features based on combinatorial analysis of Lasso with application to lymphoma diagnosis|journal=BMC Genomics|date=2013|volume=14|issue=Suppl 1 |pages=S14|doi=10.1186/1471-2164-14-S1-S14|pmid=23369194|pmc=3549810 |doi-access=free }}</ref> The ''L''<sup>1</sup>-regularized formulation is useful in some contexts because of its tendency to prefer solutions with more parameters equal to zero, that is, solutions that depend on fewer variables.<ref name=tibsh/> For this reason, the Lasso and its variants are fundamental to the field of [[compressed sensing]]. An extension of this approach is [[elastic net regularization]].
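This contrast can be seen numerically. The sketch below is again purely illustrative: it assumes scikit-learn's <code>Ridge</code> and <code>Lasso</code> estimators (whose penalty scaling differs by constant factors from the formulas above) and synthetic data, fits both models over a range of penalty strengths, and counts how many coefficients are exactly zero.

<syntaxhighlight lang="python">
import numpy as np
from sklearn.linear_model import Lasso, Ridge  # assumes scikit-learn is available

# Synthetic data with a sparse true parameter vector (illustrative only).
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 8))
beta_true = np.array([4.0, 0.0, -3.0, 0.0, 2.0, 0.0, 0.0, 0.0])
y = X @ beta_true + rng.standard_normal(100)

# As the penalty grows, ridge shrinks every coefficient but keeps it non-zero,
# whereas the Lasso drives more and more coefficients exactly to zero,
# deselecting the corresponding features.
for alpha in (0.01, 0.1, 1.0, 10.0):
    ridge = Ridge(alpha=alpha).fit(X, y)
    lasso = Lasso(alpha=alpha, max_iter=50_000).fit(X, y)
    print(f"alpha={alpha:>5}: "
          f"ridge zero coefficients={int(np.sum(ridge.coef_ == 0.0))}, "
          f"lasso zero coefficients={int(np.sum(lasso.coef_ == 0.0))}")
</syntaxhighlight>

With a small penalty both models retain all features; as the penalty increases, the Lasso sets more and more coefficients exactly to zero while the ridge coefficients merely shrink.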