Always telling this whenever the topic of Kalman Filters come up:
If you're learning the Kalman Filter in isolation, you're kind of learning it backwards and missing out on huge "aha" moments that the surrounding theory can unlock.
To truly understand the Kalman Filter, you need to study Least Squares (aka linear regression), then recursive Least Squares, then the Information Filter (which is a different formulation of the KF).
Then you'll realize the KF is just recursive Least Squares reformulated in a way to prioritize efficiency in the update step.
I appreciate you taking the time to help people understand higher level concepts.
From a different perspective... I have no traditional background in mathematics or physics. I do not understand the first line of the pdf you posted nor do I understand the process for obtaining the context to understand it.
But I have intellectual curiosity. So the best path forward for me understanding is a path that can maintain that curiosity while making progress on understanding. I can reread the The Six (Not So ) Easy Pieces and not understand any of it and still find value in it. I can play with Arnold's cat and, slowly, through no scientific rigor other than the curiosity of the naked ape, I can experience these concepts that have traditionally been behind gates of context I do not possess keys to.
I think the easiest way depends on your background knowledge. If you understand the linearity of the Gaussian distribution and the Bayesian posterior of Gaussians, the Kalman filter is almost trivial.
For (1D) we get the prior from the linear prediction X'1 = X0*a + b, for which mean(X'1) = mean(X0)*a + b and var(X'1) = var(X0)*a, where a and b give the assumed dynamics.
The posterior for Gaussians is the precision weighted mean of the prior and the observation likelihood: X1 = (1 - K)*X'1 + Y*K, where the weighting K = (1/var(X'1))/(1/var(X'1) + 1/var(Y)), where Y is the Gaussian observation (with zero mean for exposition).
Iterating this gives the Kalman filter. Generalizing this to multiple dimensions is straightforward given the linearity of multidimensional Gaussians.
This is how (after I understood it) it makes it really simple to me, but things like linearity of (multidimensional) Gaussians and Gaussian posterior as such probably are not.
It's bread and butter math for physics, Engineering (trad. Engineering), Geophysics, Signal processing etc.
Why would anyone have people implementing Kalman filters who found the math behind them "esoteric"?
Back in the day, in my wet behind the ears phase, my first time implementing a Kalman Filter from scratch, the application was to perform magnetic heading normalisation for on mag data from an airborne geophysical survey - 3 axis nanotesla sensor inputs on each wing and tail boom requiring a per survey calibration pattern to normalise the readings over a fixed location regardless of heading.
This was buried as part of a suite requiring calculation of the geomagnetic reference field (a big paramaterised spherical harmonic equation), upward, downward and reduce topole continuations of magnetic field equations, raw GPS post processing corrections, etc.
where "etc" goes on for a shelf full of books with a dense chunk of applied mathematics
I understood it as reestimation with a dynamic weight factor based on the perceived error factor. I know it’s more complex than that but this simplified version I needed at one point and it worked.
Something occurred to me a while back: Can we treat events that only have eyewitness testimony with a Kalman filter somehow in order to strengthen the evidential value of the observations after encoding it into vectors of some sort?
This would treat both lying and inaccuracy as "error"
I'm thinking of things like: reports of Phoenix lights or UFOs in general, ghosts, NDEs, and more prosaically, claims of rape
“Kalman filter” usually refers to “linear quadratic estimator”, which assumes a linear model in its derivation. This will impact the “predict“ step at the very least, and I think also the way the uncertainty propagates. There are nonlinear estimators as well, though they usually have less-nice guarantees (eg particle filter, extended kalman filter)
Edit: in fact, I see part three of the book in tfa is devoted to nonlinear Kalman filters. I suspect some of the crowd (myself included) just assumed we were talking about linear Kalman filters
As far as I am aware, there is no symbolic computing tool yet for probability distributions? For example, multiplying two multivariate Gaussian PDFs together and getting the covariance matrix out. Or defining all the ingredients for a Kalman filter (prediction model and observing process) and getting the necessary formulas out (as in sympy's lambdify).
The first example of tracking, is this the same thing as dead reckoning? I've always been confused on the term "tracking" since it is used a lot in common speech, but seems to mean some specific type of 'tracking'
"Tracking", here, means providing some kind of `f(time) -> space` API.
Dead reckoning is a mechanism for incorporating velocity and whatnot into a previously estimated position to estimate a new position (and is also one possible way to implement tracking, usually with compounding errors).
The Kalman filter example is better than just dead reckoning. For a simple example, imagine you're standing still but don't know exactly where. You have an API (like GPS) that can estimate your current position within some tolerance. If you're able to query that API repeatedly and the errors aren't correlated, you can pinpoint your location much more precisely.
Back to tracking with non-zero velocity, every new position estimate (e.g., from GPS) can be incorporated with all the information you've seen so far, adjusting your estimates of velocity, acceleration, and position and giving you a much more accurate current estimate but also better data for dead-reckoning estimates while you wait for your next external signal.
The technique (Kalman Filter) is pretty general. It's just merging all your noisy sources of information according to some ruleset (real-world physics being a common ruleset). You can tack on all sorts of other interesting information, like nearby wifi signals or whatever, and even very noisy signals can aggregate to give precise results.
Another application I threw it at once was estimating my true weight, glycogen reserves, ..., from a variety of noisy measurements. The sky's the limit. You just need multiple measurements and a rule for how they interact.
Dead reckoning is a form of prediction, based on past evidence that indicates location then, you are reckoning (best guessing) a current position and detrmining a direction to move forward to reach some target.
"Past evidence that indicates" is deliberate phrasing, in the majority of these examples we are looking at acquired data with noise; errors, instrument noise, missing returns, etc.
"Tracking" is multi-stage, there's a desired target to be found (or to be declared absent) in noisy data .. that's pattern search and locking, the trajectory (the track) of that target must be best guessed, and the best guess forward prediction can be used to assist the search for the target in a new position.
This is not all that can be done with a Kalman filter but it's typical of a class of common applications.
"The filter is named after Rudolf E. Kálmán (May 19, 1930 – July 2, 2016). In 1960, Kálmán published his famous paper describing a recursive solution to the discrete-data linear filtering problem."
Always telling this whenever the topic of Kalman Filters come up:
If you're learning the Kalman Filter in isolation, you're kind of learning it backwards and missing out on huge "aha" moments that the surrounding theory can unlock.
To truly understand the Kalman Filter, you need to study Least Squares (aka linear regression), then recursive Least Squares, then the Information Filter (which is a different formulation of the KF). Then you'll realize the KF is just recursive Least Squares reformulated in a way to prioritize efficiency in the update step.
This PDF gives a concise overview:
[1] http://ais.informatik.uni-freiburg.de/teaching/ws13/mapping/...
I appreciate you taking the time to help people understand higher level concepts.
From a different perspective... I have no traditional background in mathematics or physics. I do not understand the first line of the pdf you posted nor do I understand the process for obtaining the context to understand it.
But I have intellectual curiosity. So the best path forward for me understanding is a path that can maintain that curiosity while making progress on understanding. I can reread the The Six (Not So ) Easy Pieces and not understand any of it and still find value in it. I can play with Arnold's cat and, slowly, through no scientific rigor other than the curiosity of the naked ape, I can experience these concepts that have traditionally been behind gates of context I do not possess keys to.
http://gerdbreitenbach.de/arnold_cat/cat.html
I think the easiest way depends on your background knowledge. If you understand the linearity of the Gaussian distribution and the Bayesian posterior of Gaussians, the Kalman filter is almost trivial.
For (1D) we get the prior from the linear prediction X'1 = X0*a + b, for which mean(X'1) = mean(X0)*a + b and var(X'1) = var(X0)*a, where a and b give the assumed dynamics.
The posterior for Gaussians is the precision weighted mean of the prior and the observation likelihood: X1 = (1 - K)*X'1 + Y*K, where the weighting K = (1/var(X'1))/(1/var(X'1) + 1/var(Y)), where Y is the Gaussian observation (with zero mean for exposition).
Iterating this gives the Kalman filter. Generalizing this to multiple dimensions is straightforward given the linearity of multidimensional Gaussians.
This is how (after I understood it) it makes it really simple to me, but things like linearity of (multidimensional) Gaussians and Gaussian posterior as such probably are not.
You can keep telling this, but this “esoteric” math is often too much for the people actually implementing the filters.
It's bread and butter math for physics, Engineering (trad. Engineering), Geophysics, Signal processing etc.
Why would anyone have people implementing Kalman filters who found the math behind them "esoteric"?
Back in the day, in my wet behind the ears phase, my first time implementing a Kalman Filter from scratch, the application was to perform magnetic heading normalisation for on mag data from an airborne geophysical survey - 3 axis nanotesla sensor inputs on each wing and tail boom requiring a per survey calibration pattern to normalise the readings over a fixed location regardless of heading.
This was buried as part of a suite requiring calculation of the geomagnetic reference field (a big paramaterised spherical harmonic equation), upward, downward and reduce topole continuations of magnetic field equations, raw GPS post processing corrections, etc.
where "etc" goes on for a shelf full of books with a dense chunk of applied mathematics
I understood it as reestimation with a dynamic weight factor based on the perceived error factor. I know it’s more complex than that but this simplified version I needed at one point and it worked.
That’s the one should learn any subject—-be it physics, chemistry, math, etc. However, textbooks don’t follow that technique.
I strongly recommend Elements of Physics by Millikan and Gale for anyone who wants to learn pre-quantum physics this way.
You are probably right, but many folks following your advice will give up halfway through and never get to KF.
Every time that one comes up, this one comes up https://github.com/rlabbe/Kalman-and-Bayesian-Filters-in-Pyt... (and vice versa)
Something occurred to me a while back: Can we treat events that only have eyewitness testimony with a Kalman filter somehow in order to strengthen the evidential value of the observations after encoding it into vectors of some sort?
This would treat both lying and inaccuracy as "error"
I'm thinking of things like: reports of Phoenix lights or UFOs in general, ghosts, NDEs, and more prosaically, claims of rape
Only if you can make a linear model of those things…
Why does the model need to be linear?
“Kalman filter” usually refers to “linear quadratic estimator”, which assumes a linear model in its derivation. This will impact the “predict“ step at the very least, and I think also the way the uncertainty propagates. There are nonlinear estimators as well, though they usually have less-nice guarantees (eg particle filter, extended kalman filter)
Edit: in fact, I see part three of the book in tfa is devoted to nonlinear Kalman filters. I suspect some of the crowd (myself included) just assumed we were talking about linear Kalman filters
As far as I am aware, there is no symbolic computing tool yet for probability distributions? For example, multiplying two multivariate Gaussian PDFs together and getting the covariance matrix out. Or defining all the ingredients for a Kalman filter (prediction model and observing process) and getting the necessary formulas out (as in sympy's lambdify).
Anyone else watch Michael van Biezem (with the bow tie) lectures on Kalman Filters while learning this topic?
- https://www.youtube.com/watch?v=CaCcOwJPytQ&list=PLX2gX-ftPV...
Related. Others?
Kalman filter from the ground up - https://news.ycombinator.com/item?id=37879715 - Oct 2023 (150 comments)
(also what's the best year to put in the title above?)
The first example of tracking, is this the same thing as dead reckoning? I've always been confused on the term "tracking" since it is used a lot in common speech, but seems to mean some specific type of 'tracking'
Kind of.
"Tracking", here, means providing some kind of `f(time) -> space` API.
Dead reckoning is a mechanism for incorporating velocity and whatnot into a previously estimated position to estimate a new position (and is also one possible way to implement tracking, usually with compounding errors).
The Kalman filter example is better than just dead reckoning. For a simple example, imagine you're standing still but don't know exactly where. You have an API (like GPS) that can estimate your current position within some tolerance. If you're able to query that API repeatedly and the errors aren't correlated, you can pinpoint your location much more precisely.
Back to tracking with non-zero velocity, every new position estimate (e.g., from GPS) can be incorporated with all the information you've seen so far, adjusting your estimates of velocity, acceleration, and position and giving you a much more accurate current estimate but also better data for dead-reckoning estimates while you wait for your next external signal.
The technique (Kalman Filter) is pretty general. It's just merging all your noisy sources of information according to some ruleset (real-world physics being a common ruleset). You can tack on all sorts of other interesting information, like nearby wifi signals or whatever, and even very noisy signals can aggregate to give precise results.
Another application I threw it at once was estimating my true weight, glycogen reserves, ..., from a variety of noisy measurements. The sky's the limit. You just need multiple measurements and a rule for how they interact.
Dead reckoning is a form of prediction, based on past evidence that indicates location then, you are reckoning (best guessing) a current position and detrmining a direction to move forward to reach some target.
"Past evidence that indicates" is deliberate phrasing, in the majority of these examples we are looking at acquired data with noise; errors, instrument noise, missing returns, etc.
"Tracking" is multi-stage, there's a desired target to be found (or to be declared absent) in noisy data .. that's pattern search and locking, the trajectory (the track) of that target must be best guessed, and the best guess forward prediction can be used to assist the search for the target in a new position.
This is not all that can be done with a Kalman filter but it's typical of a class of common applications.
The one sentence you really need to know:
"The filter is named after Rudolf E. Kálmán (May 19, 1930 – July 2, 2016). In 1960, Kálmán published his famous paper describing a recursive solution to the discrete-data linear filtering problem."
Not sure why, but I get this vague notion that the author might have written a book.