Original Post

In 2010, Martens introduced Hessian-free optimization for deep learning. In 2011, Sutskever collaborated with Martens to apply it successfully to arbitrary RNNs (not specialized architectures such as LSTMs or GRUs).
In 2020, researchers at NERSC, Lawrence Berkeley National Laboratory, published applications that combine the method with AdamW. What happened to a public so obsessed with Transformers? Why did everyone stop talking about the method Sutskever and Martens developed 12 years ago?

https://lnkd.in/gUvuya3S

Links From the Original Post

https://lnkd.in/gUvuya3S
https://arxiv.org/abs/2006.00719