Yann LeCun @ylecun · 6/9/2021

New architectural concepts for (very) large NLP models:
- Hash mixture of experts.
- Staircase transformers.
They allow the number of parameters to be disentangled from computational complexity. Great performance on standard benchmarks.
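To make the "disentangle parameters from compute" point concrete, here is a minimal PyTorch sketch of hash-based expert routing: each token is sent to exactly one of several feed-forward experts, chosen by a fixed hash of its token id, so adding experts grows the parameter count without increasing the compute spent per token. The class name, layer sizes, and the random lookup table standing in for the hash are illustrative assumptions for this sketch, not the exact implementation behind the tweet.

```python
import torch
import torch.nn as nn


class HashMoEFeedForward(nn.Module):
    """Hash-routed mixture-of-experts feed-forward block (illustrative sketch).

    Each token is routed to one expert chosen by a fixed hash of its token id,
    so extra experts add parameters while per-token compute stays that of a
    single feed-forward block.
    """

    def __init__(self, d_model, d_hidden, num_experts, vocab_size):
        super().__init__()
        self.num_experts = num_experts
        # Fixed (non-learned) routing table standing in for a hash function:
        # token id -> expert index. Hypothetical choice for this sketch.
        self.register_buffer(
            "token_to_expert", torch.randint(0, num_experts, (vocab_size,))
        )
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.ReLU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(num_experts)
        )

    def forward(self, hidden, token_ids):
        # hidden: (batch, seq, d_model); token_ids: (batch, seq) int64
        expert_ids = self.token_to_expert[token_ids]  # (batch, seq)
        output = torch.zeros_like(hidden)
        for e in range(self.num_experts):
            mask = expert_ids == e  # tokens hashed to expert e
            if mask.any():
                output[mask] = self.experts[e](hidden[mask])
        return output


# Usage: 16 experts multiply the FFN parameters ~16x, but each token
# still passes through only one expert's weights.
layer = HashMoEFeedForward(d_model=512, d_hidden=2048, num_experts=16, vocab_size=32000)
token_ids = torch.randint(0, 32000, (2, 128))
hidden = torch.randn(2, 128, 512)
out = layer(hidden, token_ids)  # shape (2, 128, 512)
```

Because the routing is a fixed hash rather than a learned gate, no load-balancing losses or routing parameters are needed; scaling capacity is just a matter of adding more experts.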





