[–] cstein123@alien.top 1 points 10 months ago (1 children)

Exactly the answer I was looking for, thank you!


Can LLMs stack more layers than the largest current models, or is depth a bottleneck? Is it because the gradients can't propagate properly back to the beginning of the network, or because inference would be too slow?
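For intuition on the gradient part of my question, here's a toy scalar sketch (my own, not from any paper): with plain stacked layers the gradient is a product of per-layer factors and can vanish as depth grows, while a residual connection adds an identity path that keeps the gradient from collapsing. The function names and numbers are just illustrative.

```python
# Toy model of gradient flow through a deep stack of scalar "layers".
# Plain layer:    f(x) = w * x      -> gradient through n layers is w**n
# Residual layer: f(x) = x + w * x  -> gradient through n layers is (1 + w)**n
# With small w, the plain product vanishes, but the residual product
# stays on the order of 1 because of the identity path.

def plain_grad(w: float, n: int) -> float:
    """Gradient magnitude after n plain layers with weight w."""
    return w ** n

def residual_grad(w: float, n: int) -> float:
    """Gradient magnitude after n residual layers with weight w."""
    return (1 + w) ** n

w, n = 0.01, 96  # 96 layers, e.g. roughly GPT-3 depth
print(plain_grad(w, n))     # effectively zero: the signal never reaches layer 0
print(residual_grad(w, n))  # order 1: the identity path preserves the gradient
```

Obviously real transformers have matrix Jacobians, LayerNorm, etc., but this is the basic reason residual connections (and pre-norm placement) matter for training very deep stacks.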

If anyone could point me to a paper on how transformers scale with depth (layer count), I would love to read it!