Original Post
This is to boost your intuition about the formula for cross-entropy (or negative log-likelihood, as they confusingly call it). These series show up quite frequently in eigendecomposition and quantum momentum exchange operators. If you already know how to raise e to the power of a matrix, this will add depth. It also explains better why we use LOTUS (the law of the unconscious statistician), a kind of "chain rule", when taking the expectation of your y_hat distribution with respect to the training example target, E_y[y_hat].
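For anyone who wants to poke at this numerically, here is a minimal sketch, not taken from the post itself. It assumes the expectation is read as E_y[-log y_hat], so LOTUS is applied with g(k) = -log y_hat[k] weighted by the target distribution y, and it assumes "these series" means the power series for the exponential. The function names (`cross_entropy`, `expm_series`, `expm_eig`) and the toy numbers are made up for illustration.

```python
import numpy as np


def cross_entropy(y, y_hat):
    """E_y[-log y_hat] via LOTUS: sum over k of y[k] * g(k), with g(k) = -log y_hat[k]."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    mask = y > 0  # classes with zero target probability contribute nothing
    return float(-np.sum(y[mask] * np.log(y_hat[mask])))


def expm_series(A, terms=30):
    """Truncated power series for exp(A): sum of A^n / n! for n < terms."""
    A = np.asarray(A, dtype=float)
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for n in range(1, terms):
        term = term @ A / n  # builds A^n / n! from the previous term
        result = result + term
    return result


def expm_eig(A):
    """exp(A) for a symmetric A via eigendecomposition: V diag(exp(w)) V^T."""
    w, V = np.linalg.eigh(A)
    return (V * np.exp(w)) @ V.T


# One-hot target: cross-entropy collapses to the negative log-likelihood
# of the true class (assumed toy values).
y = np.array([0.0, 1.0, 0.0])
y_hat = np.array([0.2, 0.7, 0.1])
ce = cross_entropy(y, y_hat)
nll = float(-np.log(y_hat[1]))

# Small symmetric matrix: the series and the eigendecomposition should agree.
A = np.array([[0.0, 1.0], [1.0, 0.0]])
E_series = expm_series(A)
E_eig = expm_eig(A)

print(ce, nll)
print(np.allclose(E_series, E_eig))
```

With a one-hot target the two printed numbers coincide, which is why the same quantity goes by both names. The eigendecomposition route only works as written for symmetric matrices; for a general matrix a library routine such as SciPy's `scipy.linalg.expm` would be the safer choice.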