== Details ==
The Kalman filter is a [[infinite impulse response|recursive]] estimator. This means that only the estimated state from the previous time step and the current measurement are needed to compute the estimate for the current state. In contrast to batch estimation techniques, no history of observations or estimates is required. In what follows, the notation <math>\hat{\mathbf{x}}_{n\mid m}</math> represents the estimate of <math>\mathbf{x}</math> at time ''n'' given observations up to and including time {{nowrap|''m'' ≤ ''n''}}.

The state of the filter is represented by two variables:
* <math>\hat{\mathbf{x}}_{k\mid k}</math>, the ''[[a posteriori]]'' state estimate mean at time ''k'' given observations up to and including time ''k'';
* <math>\mathbf{P}_{k\mid k}</math>, the ''a posteriori'' estimate covariance matrix (a measure of the estimated [[accuracy and precision|accuracy]] of the state estimate).

The algorithm structure of the Kalman filter resembles that of the [[alpha beta filter]]. The Kalman filter can be written as a single equation; however, it is most often conceptualized as two distinct phases: "Predict" and "Update". The predict phase uses the state estimate from the previous timestep to produce an estimate of the state at the current timestep. This predicted state estimate is also known as the ''a priori'' state estimate because, although it is an estimate of the state at the current timestep, it does not include observation information from the current timestep. In the update phase, the [[Innovation (signal processing)|innovation]] (the pre-fit residual), i.e. the difference between the current ''a priori'' prediction and the current observation information, is multiplied by the optimal Kalman gain and combined with the previous state estimate to refine the state estimate. This improved estimate based on the current observation is termed the ''a posteriori'' state estimate.
Typically, the two phases alternate, with the prediction advancing the state until the next scheduled observation, and the update incorporating the observation. However, this is not necessary; if an observation is unavailable for some reason, the update may be skipped and multiple prediction procedures performed. Likewise, if multiple independent observations are available at the same time, multiple update procedures may be performed (typically with different observation matrices '''H'''<sub>''k''</sub>).<ref>{{cite journal|last1=Kelly|first1=Alonzo|title=A 3D state space formulation of a navigation Kalman filter for autonomous vehicles|journal=DTIC Document|date=1994|page=13|url=http://apps.dtic.mil/dtic/tr/fulltext/u2/a282853.pdf|archive-url=https://web.archive.org/web/20141230004557/http://www.dtic.mil/dtic/tr/fulltext/u2/a282853.pdf|url-status=live|archive-date=December 30, 2014}} [http://www.frc.ri.cmu.edu/~alonzo/pubs/reports/kalman_V2.pdf 2006 Corrected Version] {{Webarchive|url=https://web.archive.org/web/20170110204109/http://www.frc.ri.cmu.edu/~alonzo/pubs/reports/kalman_V2.pdf |date=2017-01-10 }}</ref><ref>{{cite web|last1=Reid|first1=Ian|last2=Term|first2=Hilary|title=Estimation II|url=http://www.robots.ox.ac.uk/~ian/Teaching/Estimation/LectureNotes2.pdf|website=www.robots.ox.ac.uk|publisher=Oxford University|access-date=6 August 2014}}</ref>

=== Predict ===
{|
|-
| style="width:180pt;" | Predicted (''a priori'') state estimate
| <math>\hat{\mathbf{x}}_{k\mid k-1} = \mathbf{F}_k \hat{\mathbf{x}}_{k-1\mid k-1} + \mathbf{B}_k \mathbf{u}_{k}</math>
|-
| Predicted (''a priori'') estimate covariance
| <math>\mathbf{P}_{k\mid k-1} = \mathbf{F}_k \mathbf{P}_{k-1 \mid k-1} \mathbf{F}_k^\textsf{T} + \mathbf{Q}_{k}</math>
|}

=== Update ===
{|
|-
| style="width:180pt;" | [[Innovation (signal processing)|Innovation]] or measurement pre-fit residual
| <math>\tilde{\mathbf{y}}_k = \mathbf{z}_k - \mathbf{H}_k\hat{\mathbf{x}}_{k\mid k-1}</math>
|-
| Innovation (or pre-fit residual) covariance
| <math>\mathbf{S}_k = \mathbf{H}_k \mathbf{P}_{k\mid k-1} \mathbf{H}_k^\textsf{T} + \mathbf{R}_k</math>
|-
| ''Optimal'' Kalman gain
| <math>\mathbf{K}_k = \mathbf{P}_{k\mid k-1}\mathbf{H}_k^\textsf{T} \mathbf{S}_k^{-1}</math>
|-
| Updated (''a posteriori'') state estimate
| <math>\hat{\mathbf{x}}_{k\mid k} = \hat{\mathbf{x}}_{k\mid k-1} + \mathbf{K}_k\tilde{\mathbf{y}}_k</math>
|-
| Updated (''a posteriori'') estimate covariance
| <math>\mathbf{P}_{k\mid k} = \left(\mathbf{I} - \mathbf{K}_k \mathbf{H}_k\right) \mathbf{P}_{k\mid k-1}</math>
|-
| Measurement post-fit [[residuals (statistics)|residual]]
| <math>\tilde{\mathbf{y}}_{k\mid k} = \mathbf{z}_k - \mathbf{H}_k\hat{\mathbf{x}}_{k\mid k}</math>
|}

The formula for the updated (''a posteriori'') estimate covariance above is valid only for the optimal gain '''K'''<sub>''k''</sub> that minimizes the residual error, in which form it is most widely used in applications. Proof of the formulae is found in the ''[[#Derivations|derivations]]'' section, where the formula valid for any '''K'''<sub>''k''</sub> is also shown.

A more intuitive way to express the updated state estimate (<math>\hat{\mathbf{x}}_{k\mid k}</math>) is:
:<math>\hat{\mathbf{x}}_{k\mid k} = (\mathbf{I} - \mathbf{K}_k \mathbf{H}_k) \hat{\mathbf{x}}_{k\mid k-1} + \mathbf{K}_k \mathbf{z}_k</math>
This expression resembles a linear interpolation, <math>x = (1-t)a + tb</math> for <math>t</math> in [0, 1]. In our case:
* <math>t</math> is the matrix <math>\mathbf{K}_k \mathbf{H}_k</math>, which ranges from <math>\mathbf{0}</math> (high error in the sensor) to <math>\mathbf{I}</math> or a projection (low error);
* <math>a</math> is the internal state <math>\hat{\mathbf{x}}_{k\mid k-1}</math> estimated from the model;
* <math>b</math> is the internal state <math>\mathbf{H}_k^{-1} \mathbf{z}_k</math> estimated from the measurement, assuming <math>\mathbf{H}_k</math> is invertible.
This expression also resembles the [[alpha beta filter]] update step.
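The predict and update equations above can be sketched in a few lines of NumPy. The constant-velocity model and the particular values of '''F''', '''H''', '''Q''', and '''R''' below are illustrative assumptions chosen for the demo, not part of the filter itself.

```python
import numpy as np

def kf_predict(x, P, F, Q, B=None, u=None):
    """A priori state estimate and covariance."""
    x = F @ x if B is None else F @ x + B @ u
    P = F @ P @ F.T + Q
    return x, P

def kf_update(x, P, z, H, R):
    """A posteriori state estimate and covariance (optimal-gain form)."""
    y = z - H @ x                       # innovation (pre-fit residual)
    S = H @ P @ H.T + R                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # optimal Kalman gain
    x = x + K @ y
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

# Illustrative constant-velocity model: state = [position, velocity].
dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition
H = np.array([[1.0, 0.0]])              # only position is observed
Q = 0.01 * np.eye(2)                    # assumed process noise covariance
R = np.array([[0.5]])                   # assumed measurement noise covariance

x, P = np.zeros(2), np.eye(2)
for z in [1.1, 1.9, 3.2, 4.0]:          # noisy position measurements
    x, P = kf_predict(x, P, F, Q)
    x, P = kf_update(x, P, np.array([z]), H, R)
```

Note that the filter never stores past measurements: each iteration consumes only the previous pair (x, P) and the current observation, which is the recursive structure described above.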
=== Invariants ===
If the model is accurate, and the values for <math>\hat{\mathbf{x}}_{0\mid 0}</math> and <math>\mathbf{P}_{0\mid 0}</math> accurately reflect the distribution of the initial state values, then the following invariants are preserved:
:<math>\begin{align} \operatorname{E}[\mathbf{x}_k - \hat{\mathbf{x}}_{k\mid k}] &= \operatorname{E}[\mathbf{x}_k - \hat{\mathbf{x}}_{k\mid k-1}] = 0 \\ \operatorname{E}[\tilde{\mathbf{y}}_k] &= 0 \end{align}</math>
where <math>\operatorname{E}[\xi]</math> is the [[expected value]] of <math>\xi</math>. That is, all estimates have a mean error of zero. Also:
:<math>\begin{align} \mathbf{P}_{k\mid k} &= \operatorname{cov}\left(\mathbf{x}_k - \hat{\mathbf{x}}_{k\mid k}\right) \\ \mathbf{P}_{k\mid k-1} &= \operatorname{cov}\left(\mathbf{x}_k - \hat{\mathbf{x}}_{k\mid k-1}\right) \\ \mathbf{S}_k &= \operatorname{cov}\left(\tilde{\mathbf{y}}_k\right) \end{align}</math>
so the covariance matrices accurately reflect the covariance of the estimates.

=== Estimation of the noise covariances Q<sub>''k''</sub> and R<sub>''k''</sub> ===
Practical implementation of a Kalman filter is often difficult because good estimates of the noise covariance matrices '''Q'''<sub>''k''</sub> and '''R'''<sub>''k''</sub> are hard to obtain. Extensive research has been done to estimate these covariances from data.
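The invariants above, and the practical importance of getting '''Q'''<sub>''k''</sub> and '''R'''<sub>''k''</sub> right, can be illustrated numerically. The scalar random-walk system below and its noise values are assumptions chosen for the demo, not a method from the literature: with the correct '''R''' the innovations come out approximately zero-mean, while a badly overestimated '''R''' visibly degrades the estimation error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative scalar system: a random walk observed directly.
F, H, Q, R = 1.0, 1.0, 0.1, 0.5

def run_filter(R_assumed, n=5000):
    """Run the filter with an assumed measurement noise covariance;
    return the state-estimation errors and the innovations."""
    x_true, x_hat, P = 0.0, 0.0, 1.0
    errors, innovations = [], []
    for _ in range(n):
        x_true = F * x_true + rng.normal(0.0, np.sqrt(Q))
        z = H * x_true + rng.normal(0.0, np.sqrt(R))
        x_hat, P = F * x_hat, F * P * F + Q            # predict
        y = z - H * x_hat                              # innovation
        S = H * P * H + R_assumed
        K = P * H / S                                  # Kalman gain
        x_hat, P = x_hat + K * y, (1 - K * H) * P      # update
        errors.append(x_true - x_hat)
        innovations.append(y)
    return np.array(errors), np.array(innovations)

err_good, innov = run_filter(R_assumed=R)       # correctly specified R
err_bad, _ = run_filter(R_assumed=100 * R)      # badly overestimated R

print(np.mean(innov))                           # approximately zero
print(np.sqrt(np.mean(err_good ** 2)))          # smaller error ...
print(np.sqrt(np.mean(err_bad ** 2)))           # ... than the mistuned filter
```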
One practical method of doing this is the ''autocovariance least-squares (ALS)'' technique, which uses the time-lagged [[autocovariance]]s of routine operating data to estimate the covariances.<ref>{{cite thesis |url=http://jbrwww.che.wisc.edu/theses/rajamani.pdf |last=Rajamani |first=Murali |type=PhD Thesis |title=Data-based Techniques to Improve State Estimation in Model Predictive Control |location=University of Wisconsin–Madison |date=October 2007 |access-date=2011-04-04 |archive-url=https://web.archive.org/web/20160304194938/http://jbrwww.che.wisc.edu/theses/rajamani.pdf |archive-date=2016-03-04 }}</ref><ref>{{cite journal |last1=Rajamani |first1=Murali R. |last2=Rawlings |first2=James B. |title=Estimation of the disturbance structure from data using semidefinite programming and optimal weighting |journal=Automatica |volume=45 |issue=1 |pages=142–148 |year=2009 |doi=10.1016/j.automatica.2008.05.032 |s2cid=5699674 }}</ref> [[GNU Octave]] and [[Matlab]] code for calculating the noise covariance matrices with the ALS technique is available online under the [[GNU General Public License]].<ref>{{cite web |url=https://sites.engineering.ucsb.edu/~jbraw/software/als/index.html |title=Autocovariance Least-Squares Toolbox |publisher=Jbrwww.che.wisc.edu |access-date=2021-08-18 }}</ref> The Field Kalman Filter (FKF), a Bayesian algorithm that allows simultaneous estimation of the state, parameters, and noise covariance, has also been proposed.<ref>{{cite conference |url= https://www.researchgate.net/publication/312029167|title= Field Kalman Filter and its approximation|last1= Bania|first1= P.|last2= Baranowski|first2=J.|publisher=IEEE |date=12 December 2016|pages= 2875–2880|location= Las Vegas, NV, USA|conference= IEEE 55th Conference on Decision and Control (CDC)}}</ref> The FKF algorithm has a recursive formulation, good observed convergence, and relatively low complexity, suggesting that it may be a worthwhile alternative to the autocovariance least-squares methods.

Another approach is the ''Optimized Kalman Filter'' (''OKF''), which treats the covariance matrices not as representatives of the noise, but rather as parameters tuned to achieve the most accurate state estimation.<ref name=":0">{{Cite journal |last1=Greenberg |first1=Ido |last2=Yannay |first2=Netanel |last3=Mannor |first3=Shie |date=2023-12-15 |title=Optimization or Architecture: How to Hack Kalman Filtering |url=https://proceedings.neurips.cc/paper_files/paper/2023/hash/9dfcc83c01e94d02c751c47517855c9f-Abstract-Conference.html |journal=Advances in Neural Information Processing Systems |language=en |volume=36 |pages=50482–50505|arxiv=2310.00675 }}</ref> These two views coincide under the KF assumptions, but often contradict each other in real systems. Thus, OKF's state estimation is more robust to modeling inaccuracies.

=== Optimality and performance ===
The Kalman filter provides optimal state estimation in cases where a) the model matches the real system perfectly, b) the entering noise is "white" (uncorrelated), and c) the covariances of the noise are known exactly. Correlated noise can also be treated using Kalman filters.<ref>{{Cite book|last1=Bar-Shalom|first1=Yaakov|title=Estimation with Applications to Tracking and Navigation|last2=Li|first2=X.-Rong|last3=Kirubarajan|first3=Thiagalingam|date=2001|publisher=John Wiley & Sons, Inc.|isbn=0-471-41655-X|location=New York, USA|pages=319 ff|doi=10.1002/0471221279}}</ref> Several methods for noise covariance estimation have been proposed during the past decades, including ALS, mentioned in the section above. More generally, if the model assumptions do not match the real system perfectly, then optimal state estimation is not necessarily obtained by setting '''Q'''<sub>''k''</sub> and '''R'''<sub>''k''</sub> to the covariances of the noise.
Instead, in that case, the parameters '''Q'''<sub>''k''</sub> and '''R'''<sub>''k''</sub> may be set to explicitly optimize the state estimation,<ref name=":0" /> e.g., using standard [[supervised learning]].

After the covariances are set, it is useful to evaluate the performance of the filter, i.e., whether it is possible to improve the state estimation quality. If the Kalman filter works optimally, the innovation sequence (the output prediction error) is white noise, so the whiteness of the [[Innovation (signal processing)|innovations]] is a measure of filter performance. Several different methods can be used for this purpose.<ref>Three optimality tests with numerical examples are described in {{cite book|doi=10.3182/20120711-3-BE-2027.00011|chapter=Optimality Tests and Adaptive Kalman Filter|title=16th IFAC Symposium on System Identification|series=IFAC Proceedings Volumes|volume=45|issue=16|pages=1523–1528|year=2012|last1=Peter|first1=Matisko|isbn=978-3-902823-06-9}}</ref> If the noise terms are distributed in a non-Gaussian manner, methods for assessing the performance of the filter estimate, based on probability inequalities or large-sample theory, are known in the literature.<ref>{{cite journal|doi=10.1016/0005-1098(95)00069-9|title=The Kantorovich inequality for error analysis of the Kalman filter with unknown noise distributions|journal=Automatica|volume=31|issue=10|pages=1513–1517|year=1995|last1=Spall|first1=James C.}}</ref><ref>{{cite journal|doi=10.1109/TAC.2003.821415|title=Use of the Kalman Filter for Inference in State-Space Models with Unknown Noise Distributions|journal=IEEE Transactions on Automatic Control|volume=49|pages=87–90|year=2004|last1=Maryak|first1=J.L.|last2=Spall|first2=J.C.|last3=Heydon|first3=B.D.|s2cid=21143516}}</ref>
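A simple whiteness check can be sketched as follows; the scalar random-walk model and the mistuning factor are illustrative assumptions, not one of the formal tests cited above. For a correctly specified filter, the sample autocorrelation of the innovation sequence at nonzero lags should be close to zero, while a mistuned filter produces visibly correlated innovations.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative scalar system: a random walk observed directly.
F, H, Q, R = 1.0, 1.0, 0.1, 0.5

def innovation_lag1_autocorr(R_assumed, n=5000):
    """Lag-1 sample autocorrelation of the innovation sequence of a
    filter run with an assumed measurement noise covariance."""
    x_true, x_hat, P = 0.0, 0.0, 1.0
    ys = []
    for _ in range(n):
        x_true = F * x_true + rng.normal(0.0, np.sqrt(Q))
        z = H * x_true + rng.normal(0.0, np.sqrt(R))
        x_hat, P = F * x_hat, F * P * F + Q          # predict
        y = z - H * x_hat                            # innovation
        ys.append(y)
        S = H * P * H + R_assumed
        K = P * H / S                                # Kalman gain
        x_hat, P = x_hat + K * y, (1 - K * H) * P    # update
    ys = np.array(ys) - np.mean(ys)
    return float(np.mean(ys[1:] * ys[:-1]) / np.var(ys))

well_tuned = innovation_lag1_autocorr(R_assumed=R)      # near zero: white
mistuned = innovation_lag1_autocorr(R_assumed=100 * R)  # clearly positive
```

The mistuned filter underweights the measurements, so its prediction errors persist from step to step and leak into consecutive innovations, which is exactly the non-whiteness this diagnostic detects.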