== Nonlinear filters ==

The basic Kalman filter is limited to a linear assumption. More complex systems, however, can be [[nonlinear filter|nonlinear]]. The nonlinearity can be associated with the process model, with the observation model, or with both. The most common variants of Kalman filters for non-linear systems are the extended Kalman filter and the unscented Kalman filter. Which filter is more suitable depends on the non-linearity indices of the process and observation models.<ref>{{Cite journal|last1=Biswas|first1=Sanat K.|last2=Qiao|first2=Li|last3=Dempster|first3=Andrew G.|date=2020-12-01|title=A quantified approach of predicting suitability of using the Unscented Kalman Filter in a non-linear application|url=http://www.sciencedirect.com/science/article/pii/S0005109820304398|journal=Automatica|language=en|volume=122|page=109241|doi=10.1016/j.automatica.2020.109241|s2cid=225028760|issn=0005-1098}}</ref>

=== Extended Kalman filter ===
{{Main|Extended Kalman filter}}

In the extended Kalman filter (EKF), the state transition and observation models need not be linear functions of the state but may instead be nonlinear functions, provided those functions are [[Differentiable function|differentiable]]:

:<math>\begin{align} \mathbf{x}_k &= f(\mathbf{x}_{k-1}, \mathbf{u}_k) + \mathbf{w}_k \\ \mathbf{z}_k &= h(\mathbf{x}_k) + \mathbf{v}_k \end{align}</math>

The function ''f'' can be used to compute the predicted state from the previous estimate, and similarly the function ''h'' can be used to compute the predicted measurement from the predicted state. However, ''f'' and ''h'' cannot be applied to the covariance directly. Instead, a matrix of partial derivatives (the [[Jacobian matrix|Jacobian]]) is computed. At each timestep the Jacobian is evaluated at the current predicted state. These matrices can then be used in the standard Kalman filter equations. This process essentially linearizes the nonlinear function around the current estimate.
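The following sketch illustrates one EKF predict/update cycle. It assumes the user supplies the nonlinear functions and routines returning their Jacobians; the names (<code>ekf_step</code>, <code>F_jac</code>, <code>H_jac</code>) are illustrative, not from any particular library:

<syntaxhighlight lang="python">
import numpy as np

def ekf_step(x, P, u, z, f, h, F_jac, H_jac, Q, R):
    """One predict/update cycle of an extended Kalman filter.

    f, h are the nonlinear transition and observation functions;
    F_jac, H_jac return their Jacobians evaluated at a given state."""
    # Predict: propagate the state through f and the covariance
    # through the Jacobian of f (linearization around the estimate).
    x_pred = f(x, u)
    F = F_jac(x, u)
    P_pred = F @ P @ F.T + Q

    # Update: linearize h around the predicted state.
    H = H_jac(x_pred)
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
</syntaxhighlight>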
=== Unscented Kalman filter ===

When the state transition and observation models – that is, the predict and update functions <math>f</math> and <math>h</math> – are highly nonlinear, the extended Kalman filter can give particularly poor performance.<ref name="JU2004">{{cite journal | author1 = Julier, Simon J. | author2 = Uhlmann, Jeffrey K. | year = 2004 | title = Unscented filtering and nonlinear estimation | journal = Proceedings of the IEEE | volume = 92 | issue = 3 | pages = 401–422 | url = https://ieeexplore.ieee.org/document/1271397 | doi=10.1109/JPROC.2003.823141 | s2cid = 9614092 }}</ref><ref name="JU97">{{cite book | author1 = Julier, Simon J. | author2 = Uhlmann, Jeffrey K. | year = 1997 | title = Signal Processing, Sensor Fusion, and Target Recognition VI | volume = 3 | pages = 182–193 | chapter-url = http://www.cs.unc.edu/~welch/kalman/media/pdf/Julier1997_SPIE_KF.pdf | access-date = 2008-05-03 | bibcode = 1997SPIE.3068..182J | doi=10.1117/12.280797 | series=Proceedings of SPIE | citeseerx=10.1.1.5.2891 | chapter = New extension of the Kalman filter to nonlinear systems | s2cid = 7937456 | editor1-last = Kadar | editor1-first = Ivan }}</ref> This is because the covariance is propagated through linearization of the underlying nonlinear model. The unscented Kalman filter (UKF)<ref name="JU2004" /> uses a deterministic sampling technique known as the [[unscented transform|unscented transformation (UT)]] to pick a minimal set of sample points (called sigma points) around the mean. The sigma points are then propagated through the nonlinear functions, from which a new mean and covariance estimate are formed. The resulting filter depends on how the transformed statistics of the UT are calculated and which set of sigma points is used. It is always possible to construct new UKFs in a consistent way.<ref>{{Cite journal|last1=Menegaz|first1=H. M. T.|last2=Ishihara|first2=J. Y.|last3=Borges|first3=G. A.|last4=Vargas|first4=A. N.|date=October 2015|title=A Systematization of the Unscented Kalman Filter Theory|journal=IEEE Transactions on Automatic Control|volume=60|issue=10|pages=2583–2598|doi=10.1109/tac.2015.2404511|issn=0018-9286|hdl=20.500.11824/251|s2cid=12606055|hdl-access=free}}</ref> For certain systems, the resulting UKF more accurately estimates the true mean and covariance.<ref name="GH2012">{{cite journal | author = Gustafsson, Fredrik | author2 = Hendeby, Gustaf | year = 2012 | title = Some Relations Between Extended and Unscented Kalman Filters | journal = IEEE Transactions on Signal Processing | volume = 60 | issue = 2 | pages = 545–555 | doi = 10.1109/tsp.2011.2172431 | bibcode= 2012ITSP...60..545G | s2cid = 17876531 | url = http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-75272 }}</ref> This can be verified with [[Monte Carlo sampling]] or [[Taylor series]] expansion of the posterior statistics. In addition, this technique removes the requirement to explicitly calculate Jacobians, which for complex functions can be a difficult task in itself (i.e., requiring complicated derivatives if done analytically or being computationally costly if done numerically), if not impossible (if those functions are not differentiable).

==== Sigma points ====

For a [[Random variable|random]] vector <math>\mathbf{x}=(x_1, \dots, x_L)</math>, sigma points are any set of vectors

:<math> \{\mathbf{s}_0,\dots, \mathbf{s}_N \}=\bigl\{\begin{pmatrix} s_{0,1}& s_{0,2}&\ldots& s_{0,L} \end{pmatrix}, \dots, \begin{pmatrix} s_{N,1}& s_{N,2}&\ldots& s_{N,L} \end{pmatrix}\bigr\}</math>

attributed with
* first-order weights <math>W_0^a, \dots, W_N^a</math> that fulfill
# <math> \sum_{j=0}^N W_j^a=1 </math>
# for all <math>i=1, \dots, L</math>: <math> E[x_i]=\sum_{j=0}^N W_j^a s_{j,i} </math>
* second-order weights <math>W_0^c, \dots, W_N^c</math> that fulfill
# <math> \sum_{j=0}^N W_j^c=1 </math>
# for all pairs <math> (i,l) \in \{1,\dots, L\}^2: E[x_ix_l]=\sum_{j=0}^N W_j^c s_{j,i}s_{j,l} </math>.

A simple choice of sigma points and weights for <math>\mathbf{x}_{k-1\mid k-1}</math> in the UKF algorithm is

:<math>\begin{align} \mathbf{s}_0&=\hat \mathbf{x}_{k-1\mid k-1}\\ -1&<W_0^a=W_0^c<1\\ \mathbf{s}_j&=\hat \mathbf{x}_{k-1\mid k-1} + \sqrt{\frac{L}{1-W_0}} \mathbf{A}_j, \quad j=1, \dots, L\\ \mathbf{s}_{L+j}&=\hat \mathbf{x}_{k-1\mid k-1} - \sqrt{\frac{L}{1-W_0}} \mathbf{A}_j, \quad j=1, \dots, L\\ W_j^a&=W_j^c=\frac{1-W_0}{2L}, \quad j=1, \dots, 2L \end{align} </math>

where <math>\hat \mathbf{x}_{k-1\mid k-1}</math> is the mean estimate of <math>\mathbf{x}_{k-1\mid k-1}</math>. The vector <math>\mathbf{A}_j</math> is the ''j''th column of <math>\mathbf{A}</math>, where <math>\mathbf{P}_{k-1\mid k-1}=\mathbf{AA}^\textsf{T}</math>. Typically, <math>\mathbf{A}</math> is obtained via [[Cholesky decomposition]] of <math>\mathbf{P}_{k-1\mid k-1}</math>. With some care the filter equations can be expressed in such a way that <math>\mathbf{A}</math> is evaluated directly without intermediate calculations of <math>\mathbf{P}_{k-1\mid k-1}</math>. This is referred to as the ''square-root unscented Kalman filter''.<ref>{{cite book |last1=Van der Merwe |first1=R. |last2=Wan |first2=E.A. |title=2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221) |chapter=The square-root unscented Kalman filter for state and parameter-estimation |date=2001 |volume=6 |pages=3461–3464 |doi=10.1109/ICASSP.2001.940586|isbn=0-7803-7041-4 |s2cid=7290857 }}</ref>
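A minimal sketch of generating this simple set of sigma points and weights follows; the function name and the default value of <math>W_0</math> are illustrative choices (any <math>W_0</math> in the open interval <math>(-1,1)</math> is admissible):

<syntaxhighlight lang="python">
import numpy as np

def simple_sigma_points(x_mean, P, W0=1.0 / 3.0):
    """Generate 2L+1 sigma points and weights for the simple
    parameterization above; W0 (the weight of the mean point)
    may be chosen freely in (-1, 1)."""
    L = len(x_mean)
    A = np.linalg.cholesky(P)            # P = A A^T
    scale = np.sqrt(L / (1.0 - W0))
    pts = [x_mean]
    for j in range(L):
        pts.append(x_mean + scale * A[:, j])
    for j in range(L):
        pts.append(x_mean - scale * A[:, j])
    W = np.full(2 * L + 1, (1.0 - W0) / (2 * L))
    W[0] = W0
    return np.array(pts), W              # here W^a = W^c = W
</syntaxhighlight>

With these weights the points reproduce the mean and covariance exactly: the symmetric pairs cancel in the weighted mean, and the weighted outer products sum to <math>\mathbf{AA}^\textsf{T} = \mathbf{P}</math>.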
The weight of the mean value, <math>W_0</math>, can be chosen arbitrarily.

Another popular parameterization (which generalizes the above) is

:<math>\begin{align} \mathbf{s}_0&=\hat \mathbf{x}_{k-1\mid k-1}\\ W_0^a&= \frac{\alpha^2\kappa-L}{\alpha^2\kappa}\\ W_0^c&= W_0^a + 1-\alpha^2+\beta \\ \mathbf{s}_j&=\hat \mathbf{x}_{k-1\mid k-1} + \alpha\sqrt{\kappa} \mathbf{A}_j, \quad j=1, \dots, L\\ \mathbf{s}_{L+j}&=\hat \mathbf{x}_{k-1\mid k-1} - \alpha\sqrt{\kappa} \mathbf{A}_j, \quad j=1, \dots, L\\ W_j^a&=W_j^c=\frac{1}{2\alpha^2\kappa}, \quad j=1, \dots, 2L. \end{align} </math>

<math>\alpha</math> and <math>\kappa</math> control the spread of the sigma points, while <math>\beta</math> is related to the distribution of <math>x</math>. Note that this is an overparameterization in the sense that any one of <math>\alpha</math>, <math>\beta</math> and <math>\kappa</math> can be chosen arbitrarily. Appropriate values depend on the problem at hand, but a typical recommendation is <math>\alpha = 1</math>, <math>\beta = 0</math>, and <math>\kappa \approx 3L/2</math>.{{cn|date=January 2025}} If the true distribution of <math>x</math> is Gaussian, <math>\beta = 2</math> is optimal.<ref>{{Cite book |doi=10.1109/ASSPCC.2000.882463 |chapter-url=http://www.lara.unb.br/~gaborges/disciplinas/efe/papers/wan2000.pdf |chapter=The unscented Kalman filter for nonlinear estimation |title=Proceedings of the IEEE 2000 Adaptive Systems for Signal Processing, Communications, and Control Symposium (Cat. No.00EX373) |page=153 |year=2000 |last1=Wan |first1=E.A. |last2=Van Der Merwe |first2=R. |isbn=978-0-7803-5800-3 |citeseerx=10.1.1.361.9373 |s2cid=13992571 |access-date=2010-01-31 |archive-date=2012-03-03 |archive-url=https://web.archive.org/web/20120303020429/http://www.lara.unb.br/~gaborges/disciplinas/efe/papers/wan2000.pdf |url-status=dead }}</ref>

==== Predict ====

As with the EKF, the UKF prediction can be used independently of the UKF update, in combination with a linear (or indeed EKF) update, or vice versa.

Given estimates of the mean and covariance, <math> \hat\mathbf{x}_{k-1\mid k-1}</math> and <math>\mathbf{P}_{k-1\mid k-1}</math>, one obtains <math> N = 2L+1 </math> sigma points as described in the section above. The sigma points are propagated through the transition function ''f'':

:<math>\mathbf{x}_{j} = f\left(\mathbf{s}_{j}\right), \quad j = 0, \dots, 2L. </math>

The propagated sigma points are weighted to produce the predicted mean and covariance:

:<math>\begin{align} \hat{\mathbf{x}}_{k \mid k-1} &= \sum_{j=0}^{2L} W_j^a \mathbf{x}_j \\ \mathbf{P}_{k \mid k-1} &= \sum_{j=0}^{2L} W_j^c \left(\mathbf{x}_j - \hat{\mathbf{x}}_{k \mid k-1}\right)\left(\mathbf{x}_j - \hat{\mathbf{x}}_{k \mid k-1}\right)^\textsf{T}+\mathbf{Q}_k \end{align}</math>

where <math>W_j^a</math> are the first-order weights of the original sigma points, and <math>W_j^c</math> are the second-order weights. The matrix <math> \mathbf{Q}_k </math> is the covariance of the transition noise, <math>\mathbf{w}_k</math>.
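In code, the prediction step reduces to a pair of weighted sums over the propagated points. A minimal sketch, assuming sigma points and weight arrays generated as in the previous section (array shapes and names are illustrative):

<syntaxhighlight lang="python">
import numpy as np

def ukf_predict(sigma_points, Wa, Wc, f, Q):
    """UKF prediction: propagate the sigma points through the
    transition function f, then form the weighted mean and
    covariance. Q is the transition (process) noise covariance."""
    X = np.array([f(s) for s in sigma_points])  # propagated points x_j
    x_pred = Wa @ X                             # predicted mean
    d = X - x_pred                              # deviations from the mean
    P_pred = (Wc[:, None] * d).T @ d + Q        # predicted covariance
    return x_pred, P_pred
</syntaxhighlight>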
==== Update ====

Given prediction estimates <math>\hat{\mathbf{x}}_{k \mid k-1}</math> and <math>\mathbf{P}_{k \mid k-1}</math>, a new set of <math>N = 2L+1</math> sigma points <math>\mathbf{s}_0, \dots, \mathbf{s}_{2L}</math> with corresponding first-order weights <math> W_0^a,\dots, W_{2L}^a</math> and second-order weights <math>W_0^c,\dots, W_{2L}^c</math> is calculated.<ref>{{cite journal |last1=Sarkka |first1=Simo |title=On Unscented Kalman Filtering for State Estimation of Continuous-Time Nonlinear Systems |journal=IEEE Transactions on Automatic Control |date=September 2007 |volume=52 |issue=9 |pages=1631–1641 |doi=10.1109/TAC.2007.904453}}</ref> These sigma points are transformed through the measurement function <math>h</math>:

:<math> \mathbf{z}_j=h(\mathbf{s}_j), \quad j=0,1, \dots, 2L. </math>

Then the empirical mean and covariance of the transformed points are calculated:

:<math>\begin{align} \hat{\mathbf{z}} &= \sum_{j=0}^{2L} W_j^a \mathbf{z}_j \\[6pt] \hat{\mathbf{S}}_k &= \sum_{j=0}^{2L} W_j^c (\mathbf{z}_j-\hat{\mathbf{z}})(\mathbf{z}_j-\hat{\mathbf{z}})^\textsf{T} + \mathbf{R}_k \end{align}</math>

where <math>\mathbf{R}_k</math> is the covariance matrix of the observation noise, <math>\mathbf{v}_k</math>. The cross-covariance matrix between the sigma points and their transformed images is also needed:

:<math>\begin{align} \mathbf{C_{xz}} &= \sum_{j=0}^{2L} W_j^c (\mathbf{s}_j-\hat\mathbf{x}_{k|k-1})(\mathbf{z}_j-\hat\mathbf{z})^\textsf{T}. \end{align}</math>

The Kalman gain is

: <math>\begin{align} \mathbf{K}_k=\mathbf{C_{xz}}\hat{\mathbf{S}}_k^{-1}. \end{align}</math>

The updated mean and covariance estimates are

:<math> \begin{align} \hat\mathbf{x}_{k\mid k}&=\hat\mathbf{x}_{k\mid k-1}+\mathbf{K}_k(\mathbf{z}_k-\hat\mathbf{z})\\ \mathbf{P}_{k\mid k}&=\mathbf{P}_{k\mid k-1}-\mathbf{K}_k\hat{\mathbf{S}}_k\mathbf{K}_k^\textsf{T}. \end{align} </math>
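A matching sketch of the update step, under the same illustrative assumptions as the prediction sketch (the sigma points are those regenerated from the predicted statistics):

<syntaxhighlight lang="python">
import numpy as np

def ukf_update(x_pred, P_pred, sigma_points, Wa, Wc, h, z, R):
    """UKF measurement update: transform the sigma points through h,
    form the innovation statistics, and correct the prediction.
    R is the observation noise covariance."""
    Z = np.array([h(s) for s in sigma_points])  # transformed points z_j
    z_hat = Wa @ Z                              # predicted measurement
    dz = Z - z_hat
    dx = np.asarray(sigma_points) - x_pred
    S = (Wc[:, None] * dz).T @ dz + R           # innovation covariance
    C = (Wc[:, None] * dx).T @ dz               # cross covariance C_xz
    K = C @ np.linalg.inv(S)                    # Kalman gain
    x_new = x_pred + K @ (z - z_hat)
    P_new = P_pred - K @ S @ K.T
    return x_new, P_new
</syntaxhighlight>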
=== Discriminative Kalman filter ===

When the observation model <math>p(\mathbf{z}_k\mid\mathbf{x}_k)</math> is highly non-linear and/or non-Gaussian, it may prove advantageous to apply [[Bayes' rule]] and estimate

:<math> p(\mathbf{z}_k\mid\mathbf{x}_k) \propto \frac{p(\mathbf{x}_k\mid\mathbf{z}_k)}{p(\mathbf{x}_k)} </math>

(the omitted factor <math>p(\mathbf{z}_k)</math> does not depend on <math>\mathbf{x}_k</math>), where <math>p(\mathbf{x}_k\mid\mathbf{z}_k) \approx \mathcal{N}(g(\mathbf{z}_k),Q(\mathbf{z}_k))</math> for nonlinear functions <math>g,Q</math>. This replaces the generative specification of the standard Kalman filter with a [[discriminative model]] for the latent states given observations.

Under a [[Stationary process|stationary]] state model

:<math> \begin{align} p(\mathbf{x}_1) &= \mathcal{N}(0, \mathbf{T}), \\ p(\mathbf{x}_k\mid\mathbf{x}_{k-1}) &= \mathcal{N}(\mathbf{F}\mathbf{x}_{k-1}, \mathbf{C}), \end{align} </math>

where <math>\mathbf{T}= \mathbf{F}\mathbf{T}\mathbf{F}^\intercal + \mathbf{C}</math>, if

:<math> p(\mathbf{x}_k\mid\mathbf{z}_{1:k}) \approx \mathcal{N}(\hat{\mathbf{x}}_{k|k}, \mathbf{P}_{k|k}), </math>

then given a new observation <math>\mathbf{z}_{k+1}</math>, it follows that<ref name="Bur20">{{cite journal |last1=Burkhart |first1=Michael C. |last2=Brandman |first2=David M. |last3=Franco |first3=Brian |last4=Hochberg |first4=Leigh |last5=Harrison |first5=Matthew T. |title=The Discriminative Kalman Filter for Bayesian Filtering with Nonlinear and Nongaussian Observation Models |journal=Neural Computation |date=2020 |volume=32 |issue=5 |pages=969–1017 |doi=10.1162/neco_a_01275 |pmid=32187000 |pmc=8259355 |s2cid=212748230 |access-date=26 March 2021 |url=https://direct.mit.edu/neco/article/32/5/969/95592/The-Discriminative-Kalman-Filter-for-Bayesian}}</ref>

:<math> p(\mathbf{x}_{k+1}\mid\mathbf{z}_{1:k+1}) \approx \mathcal{N}(\hat{\mathbf{x}}_{k+1|k+1}, \mathbf{P}_{k+1|k+1}) </math>

where

:<math> \begin{align} \mathbf{M}_{k+1} &= \mathbf{F}\mathbf{P}_{k|k}\mathbf{F}^\intercal + \mathbf{C}, \\ \mathbf{P}_{k+1|k+1} &= \left(\mathbf{M}_{k+1}^{-1} + Q(\mathbf{z}_{k+1})^{-1} - \mathbf{T}^{-1}\right)^{-1}, \\ \hat{\mathbf{x}}_{k+1|k+1} &= \mathbf{P}_{k+1|k+1} \left(\mathbf{M}_{k+1}^{-1}\mathbf{F}\hat{\mathbf{x}}_{k|k} + Q(\mathbf{z}_{k+1})^{-1}g(\mathbf{z}_{k+1}) \right). \end{align} </math>

Note that this approximation requires <math> Q(\mathbf{z}_{k+1})^{-1} - \mathbf{T}^{-1} </math> to be positive-definite; in the case that it is not,

:<math> \mathbf{P}_{k+1|k+1} = \left(\mathbf{M}_{k+1}^{-1} + Q(\mathbf{z}_{k+1})^{-1}\right)^{-1} </math>

is used instead. Such an approach proves particularly useful when the dimensionality of the observations is much greater than that of the latent states<ref name="Bur19">{{cite thesis |last1=Burkhart |first1=Michael C. |title=A Discriminative Approach to Bayesian Filtering with Applications to Human Neural Decoding |date=2019 |publisher=Brown University |location=Providence, RI, USA |doi=10.26300/nhfp-xv22 }}</ref> and can be used to build filters that are particularly robust to nonstationarities in the observation model.<ref name="Bra18">{{cite journal |last1=Brandman |first1=David M. |last2=Burkhart |first2=Michael C. |last3=Kelemen |first3=Jessica |last4=Franco |first4=Brian |last5=Harrison |first5=Matthew T. |last6=Hochberg |first6=Leigh R. |title=Robust Closed-Loop Control of a Cursor in a Person with Tetraplegia using Gaussian Process Regression |journal=Neural Computation |date=2018 |volume=30 |issue=11 |pages=2986–3008 |doi=10.1162/neco_a_01129 |pmid=30216140 |pmc=6685768 |url=https://direct.mit.edu/neco/article/30/11/2986/8418/Robust-Closed-Loop-Control-of-a-Cursor-in-a-Person |access-date=26 March 2021}}</ref>
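A minimal sketch of this recursion, assuming the discriminative model supplies callables <code>g(z)</code> and <code>Q(z)</code> (for instance, from a trained regression model); the fallback branch mirrors the positive-definiteness check above:

<syntaxhighlight lang="python">
import numpy as np

def dkf_step(x, P, z, F, C, T, g, Q):
    """One step of the discriminative Kalman filter recursion.
    F, C, T parameterize the stationary state model; g(z) and Q(z)
    come from a learned discriminative model of p(x | z)."""
    M = F @ P @ F.T + C                   # propagated state covariance
    M_inv = np.linalg.inv(M)
    Q_inv = np.linalg.inv(Q(z))
    T_inv = np.linalg.inv(T)
    P_inv = M_inv + Q_inv - T_inv
    # Fall back when Q(z)^-1 - T^-1 is not positive-definite.
    if np.any(np.linalg.eigvalsh(Q_inv - T_inv) <= 0):
        P_inv = M_inv + Q_inv
    P_new = np.linalg.inv(P_inv)
    x_new = P_new @ (M_inv @ (F @ x) + Q_inv @ g(z))
    return x_new, P_new
</syntaxhighlight>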