When defining neural networks, even in a framework as concise as Keras, you often find yourself writing far too much gumpf.
Sometimes when I write reports, I end up summarizing backpropagation, which spurs me to derive it again for myself. Backpropagation isn’t particularly complicated mathematically, but if you’re not super comfortable applying the chain rule until your arm falls off, or combining calculus with linear algebra, it can feel a bit involved. Especially when my calculus is rusty, I occasionally go down some less sensible routes and take a while to get there; hopefully, writing this down will prevent me from going the wrong way again. This is something of a crash course in the derivation, without any of the diagrams or other niceties necessary to make it even slightly intelligible.
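Before diving into the derivation, here is a minimal sketch of what the chain-rule bookkeeping looks like in code: a one-hidden-layer network with a squared-error loss, its gradients computed layer by layer by hand, and one entry of the result checked against a finite-difference estimate. The network shapes and the tanh nonlinearity are illustrative choices, not anything assumed by the derivation itself.

```python
import numpy as np

# A sketch: forward and backward pass for a one-hidden-layer network
# with a squared-error loss, with gradients derived via the chain rule.
rng = np.random.default_rng(0)
x = rng.normal(size=(3,))        # input
y = rng.normal(size=(2,))        # target
W1 = rng.normal(size=(4, 3))     # hidden-layer weights
W2 = rng.normal(size=(2, 4))     # output-layer weights

# Forward pass.
z1 = W1 @ x                      # hidden pre-activation
a1 = np.tanh(z1)                 # hidden activation
yhat = W2 @ a1                   # network output
loss = 0.5 * np.sum((yhat - y) ** 2)

# Backward pass: apply the chain rule one layer at a time.
dyhat = yhat - y                 # dL/dyhat
dW2 = np.outer(dyhat, a1)        # dL/dW2
da1 = W2.T @ dyhat               # dL/da1
dz1 = da1 * (1 - a1 ** 2)        # tanh'(z) = 1 - tanh(z)^2
dW1 = np.outer(dz1, x)           # dL/dW1

# Sanity check: compare one gradient entry against a finite difference.
eps = 1e-6
W1p = W1.copy()
W1p[0, 0] += eps
loss_p = 0.5 * np.sum((W2 @ np.tanh(W1p @ x) - y) ** 2)
numeric = (loss_p - loss) / eps
assert abs(numeric - dW1[0, 0]) < 1e-4
```

The finite-difference check at the end is the quickest way to catch a slip in the chain-rule algebra; the rest of the post derives where each of these backward-pass lines comes from.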