Jason Weston @jaseweston 6/9/2021

How can you improve a model -- add more parameters or add more compute? Both work! But the model design matters. Two new methods:
- "Hash Layers" for more parameters
- "Staircase Attention" for more power per parameter
Read here: https://t.co/PP3Z9xSosj
