Aran Komatsuzaki @arankomatsuzaki · 6/9/2021

Hash Layers For Large Sparse Models: modifies the FFN so that each token is hashed to a different set of weights. It either outperforms or is competitive with MoE methods such as Switch Transformer, while requiring no routing parameters and no extra terms in the objective function. https://t.co/O2oirI0iK7
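To make the idea concrete, here is a minimal sketch of a hash-routed FFN, assuming tokens are assigned to experts by a fixed modulo hash of their token IDs. The class name HashLayerFFN, the modulo hash, and all hyperparameters are illustrative assumptions, not taken from the paper; the point is that routing is a fixed function of the input token, so no routing parameters are learned.

import torch
import torch.nn as nn

class HashLayerFFN(nn.Module):
    """Sketch: FFN where each token is sent to one of K expert FFNs
    by a fixed hash of its token ID (no learned router, no auxiliary loss)."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.num_experts = num_experts
        self.w_in = nn.ModuleList(nn.Linear(d_model, d_ff) for _ in range(num_experts))
        self.w_out = nn.ModuleList(nn.Linear(d_ff, d_model) for _ in range(num_experts))
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); token_ids: (batch, seq) integer IDs
        # Fixed hash: token ID mod K (an illustrative choice of hash function).
        expert_ids = token_ids % self.num_experts
        out = torch.zeros_like(x)
        for k in range(self.num_experts):
            mask = expert_ids == k            # tokens routed to expert k
            if mask.any():
                h = self.act(self.w_in[k](x[mask]))
                out[mask] = self.w_out[k](h)
        return out

Because the token-to-expert mapping is fixed before training, each expert simply specializes on the slice of the vocabulary hashed to it, which is what removes the need for load-balancing terms in the objective.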
