Attention, Dynamic Weights, and Mutual Information
July 21, 2023

Archived from an original LinkedIn post by Brian Greenforest.

Original Post

Of course, attention came from Gaussian mixture models, density networks, and soft windows, which realized the concept of dynamic weights, historically evangelized by Hinton since the 1980s, and we have only just started to re-understand why we use the Transformer in the first place. The concept of mutual information is the best upgrade for quantum mechanics, and it came from a rather sloppy and surprising origin in computational linguistics (you'd expect Shannon at least, or rigorous mathematics?) :-)

https://lnkd.in/g_auZKeZ

Links From the Original Post