Yeah it really does seem obvious once it's written up this clearly.
And honestly, explaining LM as a Taylor series expansion of the difference in error is so spot on, but literally nobody ever told me that's what it's doing.
We just derived it from minimizing log error and slapped on a bit more gradient for funsies.
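For anyone reading along, here's a rough sketch of what I mean, assuming LM here means Levenberg-Marquardt. The toy model `y = exp(a * x)`, the function name `lm_fit`, and the damping schedule (halve on success, times ten on failure) are all made up for illustration, not anybody's reference implementation. The point is that the step comes from a first-order Taylor expansion of the residuals, and the "bit more gradient" is the `lam` term that pushes the step toward plain gradient descent when the expansion can't be trusted.

```python
import math

def lm_fit(xs, ys, a=0.0, lam=1e-2, iters=50):
    """Fit the single parameter a in y = exp(a * x) (hypothetical toy model)."""
    def residuals(a):
        return [math.exp(a * x) - y for x, y in zip(xs, ys)]

    def sse(r):
        return sum(v * v for v in r)

    r = residuals(a)
    for _ in range(iters):
        # First-order Taylor expansion: r(a + step) ~ r(a) + J * step,
        # where J is the derivative of each residual with respect to a.
        J = [x * math.exp(a * x) for x in xs]
        g = sum(j * v for j, v in zip(J, r))  # J^T r, half the gradient of the squared error
        h = sum(j * j for j in J)             # J^T J, the Gauss-Newton curvature estimate
        step = -g / (h + lam)                 # large lam -> small gradient-descent-like step
        r_new = residuals(a + step)
        if sse(r_new) < sse(r):
            # The expansion predicted well enough: accept and reduce damping.
            a, r, lam = a + step, r_new, lam * 0.5
        else:
            # The expansion was too optimistic: reject and increase damping.
            lam *= 10.0
    return a

xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [math.exp(0.7 * x) for x in xs]
a_hat = lm_fit(xs, ys)  # should land close to 0.7 on this noise-free data
```

With `lam = 0` this is just Gauss-Newton; with a huge `lam` it's a tiny step along the negative gradient. Everything in between is the interpolation.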
Oh no, the DNN community is rediscovering the same things old-school linear estimators had way back when? My old LMS variants!
(And it turns out the algorithms are not that different... just higher-dimensional.)