==Solving the least squares problem==
The [[Maxima and minima|minimum]] of the sum of squares is found by setting the [[gradient]] to zero. Since the model contains ''m'' parameters, there are ''m'' gradient equations:
<math display="block">\frac{\partial S}{\partial \beta_j}=2\sum_i r_i\frac{\partial r_i}{\partial \beta_j} = 0,\ j=1,\ldots,m,</math>
and since <math>r_i=y_i-f(x_i,\boldsymbol \beta)</math>, the gradient equations become
<math display="block">-2\sum_i r_i\frac{\partial f(x_i,\boldsymbol \beta)}{\partial \beta_j}=0,\ j=1,\ldots,m.</math>

The gradient equations apply to all least squares problems. Each particular problem requires particular expressions for the model and its [[Partial derivative|partial derivatives]].<ref name=":1">{{Cite book|title=Quantifying measurement: the tyranny of numbers|last=Williams, Jeffrey H. (Jeffrey Huw), 1956-|others=Morgan & Claypool Publishers, Institute of Physics (Great Britain)|isbn=978-1-68174-433-9|location=San Rafael [California] (40 Oak Drive, San Rafael, CA, 94903, USA)|oclc=962422324|date=November 2016}}</ref>

===Linear least squares===
{{main|Linear least squares (mathematics)|l1=Linear least squares}}
A regression model is a linear one when the model comprises a [[linear combination]] of the parameters, i.e.,
<math display="block"> f(x, \boldsymbol \beta) = \sum_{j = 1}^m \beta_j \phi_j(x),</math>
where each <math>\phi_j</math> is a function of <math>x</math>.<ref name=":1" />

Letting <math>X_{ij}= \phi_j(x_{i})</math> and putting the independent and dependent variables in matrices <math>X</math> and <math>Y</math>, respectively, we can compute the least squares estimate in the following way, where <math>D</math> is the set of all data.<ref name=":1" /><ref name=":2">{{Cite book|last1=Rencher|first1=Alvin C.|url=https://books.google.com/books?id=0g-PAuKub3QC&pg=PA19|title=Methods of Multivariate Analysis|last2=Christensen|first2=William F.|date=2012-08-15|publisher=John Wiley & Sons|isbn=978-1-118-39167-9|pages=155|language=en}}</ref>
<math display="block">L(D, \boldsymbol{\beta}) = \left\|Y - X\boldsymbol{\beta} \right\|^2 = (Y - X\boldsymbol{\beta})^\mathsf{T} (Y - X\boldsymbol{\beta}) = Y^\mathsf{T}Y- 2Y^\mathsf{T}X\boldsymbol{\beta} + \boldsymbol{\beta}^\mathsf{T}X^\mathsf{T}X\boldsymbol{\beta}</math>

The gradient of the loss is:
<math display="block">\frac{\partial L(D, \boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = \frac{\partial \left(Y^\mathsf{T}Y- 2Y^\mathsf{T}X\boldsymbol{\beta} + \boldsymbol{\beta}^\mathsf{T}X^\mathsf{T}X\boldsymbol{\beta}\right)}{\partial \boldsymbol{\beta}} = -2X^\mathsf{T}Y + 2X^\mathsf{T}X\boldsymbol{\beta}</math>

Setting the gradient of the loss to zero and solving for <math>\boldsymbol{\beta}</math>, we get:<ref name=":2" /><ref name=":1" />
<math display="block">-2X^\mathsf{T}Y + 2X^\mathsf{T}X\boldsymbol{\beta} = 0 \Rightarrow X^\mathsf{T}Y = X^\mathsf{T}X\boldsymbol{\beta}</math>
<math display="block">\boldsymbol{\hat{\beta}} = \left(X^\mathsf{T}X\right)^{-1} X^\mathsf{T}Y</math>
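The closed-form estimator <math>\boldsymbol{\hat{\beta}} = \left(X^\mathsf{T}X\right)^{-1} X^\mathsf{T}Y</math> can be illustrated with a short numerical sketch. The example below is illustrative only: the data values and the straight-line basis <math>\phi_1(x)=1,\ \phi_2(x)=x</math> are hypothetical choices, and NumPy is used for the linear algebra.

<syntaxhighlight lang="python">
import numpy as np

# Hypothetical sample data (x_i, y_i) roughly following a straight line.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

# Design matrix X with X_ij = phi_j(x_i); here phi_1(x) = 1, phi_2(x) = x.
X = np.column_stack([np.ones_like(x), x])

# Normal equations (X^T X) beta = X^T y, solved directly for beta-hat.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Equivalent, numerically more stable route via a least-squares solver.
beta_lstsq, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
</syntaxhighlight>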
===Non-linear least squares===
{{main|Non-linear least squares}}
There is, in some cases, a [[closed-form solution]] to a non-linear least squares problem – but in general there is not. In the case of no closed-form solution, numerical algorithms are used to find the value of the parameters <math>\beta</math> that minimizes the objective. Most algorithms involve choosing initial values for the parameters. Then, the parameters are refined iteratively, that is, the values are obtained by successive approximation:
<math display="block">{\beta_j}^{k+1} = {\beta_j}^k+\Delta \beta_j,</math>
where a superscript ''k'' is an iteration number, and the vector of increments <math>\Delta \beta_j</math> is called the shift vector. In some commonly used algorithms, at each iteration the model may be linearized by approximation to a first-order [[Taylor series]] expansion about <math>\boldsymbol \beta^k</math>:
<math display="block">\begin{align} f(x_i,\boldsymbol \beta) &= f^k(x_i,\boldsymbol \beta) +\sum_j \frac{\partial f(x_i,\boldsymbol \beta)}{\partial \beta_j} \left(\beta_j-{\beta_j}^k \right) \\[1ex] &= f^k(x_i,\boldsymbol \beta) +\sum_j J_{ij} \,\Delta\beta_j. \end{align}</math>
The [[Jacobian matrix and determinant|Jacobian]] '''J''' is a function of constants, the independent variable ''and'' the parameters, so it changes from one iteration to the next. The residuals are given by
<math display="block">r_i = y_i - f^k(x_i, \boldsymbol \beta)- \sum_{j=1}^{m} J_{ij}\,\Delta\beta_j = \Delta y_i- \sum_{j=1}^m J_{ij}\,\Delta\beta_j,</math>
where <math>\Delta y_i = y_i - f^k(x_i, \boldsymbol \beta)</math>. To minimize the sum of squares of <math>r_i</math>, the gradient equation is set to zero and solved for <math>\Delta \beta_j</math>:
<math display="block">-2\sum_{i=1}^n J_{ij} \left( \Delta y_i-\sum_{k=1}^m J_{ik} \, \Delta \beta_k \right) = 0,</math>
which, on rearrangement, become ''m'' simultaneous linear equations, the '''normal equations''':
<math display="block">\sum_{i=1}^{n}\sum_{k=1}^m J_{ij} J_{ik} \, \Delta \beta_k=\sum_{i=1}^n J_{ij} \, \Delta y_i \qquad (j=1,\ldots,m).</math>
The normal equations are written in matrix notation as
<math display="block">\left(\mathbf{J}^\mathsf{T} \mathbf{J}\right) \Delta \boldsymbol \beta = \mathbf{J}^\mathsf{T}\Delta \mathbf{y}.</math>
<!-- or <math display="block">\mathbf{\left(J^TWJ\right) \, \Delta \boldsymbol \beta=J^TW \, \Delta y}</math> if weights are used. -->
These are the defining equations of the [[Gauss–Newton algorithm]].
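A minimal sketch of the Gauss–Newton iteration above is given below. It assumes a particular nonlinear model, <math>f(x,\boldsymbol\beta)=\beta_1 e^{\beta_2 x}</math>, together with hypothetical data and starting values; the Jacobian entries <math>J_{ij}=\partial f(x_i,\boldsymbol\beta)/\partial\beta_j</math> are coded by hand.

<syntaxhighlight lang="python">
import numpy as np

def model(x, beta):
    # Assumed example model f(x, beta) = beta_1 * exp(beta_2 * x).
    return beta[0] * np.exp(beta[1] * x)

def jacobian(x, beta):
    # J_ij = d f(x_i, beta) / d beta_j, evaluated at the current beta^k.
    return np.column_stack([np.exp(beta[1] * x),
                            beta[0] * x * np.exp(beta[1] * x)])

# Hypothetical data and starting values beta^0.
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.array([1.0, 1.6, 2.8, 4.4, 7.3])
beta = np.array([1.0, 0.5])

for _ in range(50):
    J = jacobian(x, beta)
    dy = y - model(x, beta)                     # Delta y_i
    dbeta = np.linalg.solve(J.T @ J, J.T @ dy)  # normal equations for the shift vector
    beta = beta + dbeta                         # beta^{k+1} = beta^k + Delta beta
    if np.linalg.norm(dbeta) < 1e-10:           # convergence criterion
        break
</syntaxhighlight>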
===Differences between linear and nonlinear least squares===
* The model function, ''f'', in LLSQ (linear least squares) is a linear combination of parameters of the form <math>f = X_{i1}\beta_1 + X_{i2}\beta_2 +\cdots</math> The model may represent a straight line, a parabola or any other linear combination of functions. In NLLSQ (nonlinear least squares) the parameters appear nonlinearly, for example as <math>\beta^2</math> or <math>e^{\beta x}</math>. If the derivatives <math>\partial f / \partial \beta_j</math> are either constant or depend only on the values of the independent variable, the model is linear in the parameters. Otherwise, the model is nonlinear (illustrated symbolically after this list).
* NLLSQ requires initial values for the parameters in order to find a solution; LLSQ does not.
* Solution algorithms for NLLSQ often require that the Jacobian can be calculated, as in LLSQ. Analytical expressions for the partial derivatives can be complicated. If analytical expressions are impossible to obtain, either the partial derivatives must be calculated by numerical approximation or an estimate must be made of the Jacobian, often via [[finite differences]].
* Non-convergence (failure of the algorithm to find a minimum) is a common phenomenon in NLLSQ, whereas the LLSQ objective is globally convex, so non-convergence is not an issue.
* Solving NLLSQ is usually an iterative process which has to be terminated when a convergence criterion is satisfied. LLSQ solutions can be computed using direct methods, although problems with large numbers of parameters are typically solved with iterative methods, such as the [[Gauss–Seidel]] method.
* In LLSQ the solution is unique, but in NLLSQ there may be multiple minima in the sum of squares.
* Under the condition that the errors are uncorrelated with the predictor variables, LLSQ yields unbiased estimates, but even under that condition NLLSQ estimates are generally biased.

These differences must be considered whenever the solution to a nonlinear least squares problem is being sought.<ref name=":1" />
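Whether a given model is linear in its parameters can be checked from the partial derivatives <math>\partial f/\partial\beta_j</math>, as noted in the list above. The following sketch applies SymPy to two hypothetical models, one linear and one nonlinear in the parameters.

<syntaxhighlight lang="python">
import sympy as sp

x, b1, b2 = sp.symbols('x beta1 beta2')

# Linear in the parameters: each derivative depends only on x.
f_linear = b1 + b2 * x**2
print([sp.diff(f_linear, b) for b in (b1, b2)])      # [1, x**2]

# Nonlinear in the parameters: the derivatives still contain parameters.
f_nonlinear = b1 * sp.exp(b2 * x)
print([sp.diff(f_nonlinear, b) for b in (b1, b2)])   # [exp(beta2*x), beta1*x*exp(beta2*x)]
</syntaxhighlight>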