Original Post

Sharing this to boost your intuition about the formula for cross-entropy (or negative log-likelihood, as it's confusingly also called). These series show up quite frequently in eigendecomposition and in quantum momentum exchange operators. If you already know how to raise e to the power of a matrix, this video will give that idea more depth. It also explains better why we use LOTUS (the law of the unconscious statistician) as a kind of "chain rule" when taking the expectation of a function of your y_hat distribution with respect to the training example's target, E_y[-log y_hat].
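
For anyone who hasn't seen "e to the power of a matrix" before, here is a minimal sketch (my own illustration, not taken from the video) of the series e^A = I + A + A^2/2! + A^3/3! + ... and how eigendecomposition collapses it. For a symmetric A = V diag(lambda) V^T, every power is A^k = V diag(lambda^k) V^T, so the whole series becomes V diag(e^lambda) V^T. The helper names `expm_series` and `expm_from_eigh` are hypothetical, and the code uses plain lists to stay dependency-free.

```python
import math

def mat_mul(a, b):
    """Plain-list matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def expm_series(a, terms=30):
    """Hypothetical helper: truncated power series e^A ~ sum_{k<terms} A^k / k!."""
    n = len(a)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # k = 0 term: identity
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = mat_mul(term, a)                         # now A^k / (k-1)!
        term = [[x / k for x in row] for row in term]   # divide by k -> A^k / k!
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result

def expm_from_eigh(eigvecs, eigvals):
    """Hypothetical helper: e^A = V diag(e^lambda) V^T for symmetric A,
    where the columns of V are orthonormal eigenvectors."""
    n = len(eigvals)
    d = [[math.exp(eigvals[i]) if i == j else 0.0 for j in range(n)] for i in range(n)]
    vt = [list(col) for col in zip(*eigvecs)]
    return mat_mul(mat_mul(eigvecs, d), vt)
```

For example, A = [[2, 1], [1, 2]] has eigenvalues 3 and 1 with eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2), so `expm_series(A)` and `expm_from_eigh([[s, s], [s, -s]], [3.0, 1.0])` with s = 1/sqrt(2) should agree, both giving (e^3 + e)/2 on the diagonal and (e^3 - e)/2 off it.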

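And a small sketch of the LOTUS view of cross-entropy, under the usual definition H(p, q) = E_{y~p}[-log q(y)] = -sum_y p(y) log q(y), where p is the target distribution and q is the model's y_hat. LOTUS says E[g(Y)] = sum_y g(y) p(y), so we never need the distribution of -log q(Y) itself; we just weight g(y) = -log q(y) by p(y). With a one-hot target this reduces to -log q(true class), i.e. the negative log-likelihood. The function names here are hypothetical, chosen for illustration.

```python
import math

def expectation_lotus(p, g):
    """Hypothetical helper: E_{Y~p}[g(Y)] = sum_y g(y) * p(y) (LOTUS).
    Terms with p(y) == 0 are skipped, following the 0 * log 0 = 0 convention."""
    return sum(prob * g(y) for y, prob in enumerate(p) if prob > 0)

def cross_entropy(p, q):
    """Hypothetical helper: H(p, q) = E_{y~p}[-log q(y)], computed via LOTUS."""
    return expectation_lotus(p, lambda y: -math.log(q[y]))
```

With q = [0.7, 0.2, 0.1] and a one-hot target [1, 0, 0], `cross_entropy` is just -log 0.7, the same number a per-example NLL loss would report; a soft target such as [0.5, 0.5, 0] mixes the -log terms by the target probabilities.
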
Links From the Original Post

https://youtube.com/watch?v=G0Fa5Zl-Z3c&si=WAnY_OIUK80VGAUn