this post was submitted on 22 Nov 2023
1 points (100.0% liked)

LocalLLaMA

Community to discuss Llama, the family of large language models created by Meta AI.

https://arxiv.org/abs/2311.10770

"UltraFastBERT", apparently a variant of BERT, that uses only 0.3% of it's neurons during inference, is performing on par with similar BERT models.

I hope that's going to be available for all kinds of models in the near future!
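
As far as I understand the paper, the trick is a "fast feedforward" (FFF) layer: the feedforward neurons are arranged as a balanced binary tree, and each input only activates one root-to-leaf path. Here's a minimal numpy sketch of the inference-time idea (class and parameter names are mine, and this is simplified: the real model trains with a differentiable soft descent rather than this hard version):

```python
import numpy as np

class FastFeedforwardSketch:
    """Toy 'fast feedforward' (FFF) layer: neurons arranged as a balanced
    binary tree. Inference walks one root-to-leaf path, so only `depth`
    decision neurons plus one leaf neuron fire per input."""

    def __init__(self, dim: int, depth: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.depth = depth
        n_nodes = 2 ** depth - 1           # internal decision neurons
        n_leaves = 2 ** depth              # leaf output neurons
        self.node_w = rng.standard_normal((n_nodes, dim)) / np.sqrt(dim)
        self.leaf_w_in = rng.standard_normal((n_leaves, dim)) / np.sqrt(dim)
        self.leaf_w_out = rng.standard_normal((n_leaves, dim)) / np.sqrt(dim)

    def forward(self, x: np.ndarray) -> np.ndarray:
        node = 0
        for _ in range(self.depth):        # hard tree descent at inference
            go_right = float(self.node_w[node] @ x) > 0.0
            node = 2 * node + 1 + int(go_right)
        leaf = node - (2 ** self.depth - 1)            # index among leaves
        h = max(0.0, float(self.leaf_w_in[leaf] @ x))  # single leaf neuron
        return h * self.leaf_w_out[leaf]

# depth=11 gives 2047 node neurons + 2048 leaf neurons ≈ 4095 total,
# of which only 11 + 1 = 12 are evaluated per forward pass.
layer = FastFeedforwardSketch(dim=64, depth=11)
out = layer.forward(np.random.default_rng(1).standard_normal(64))
print(out.shape)   # (64,)
```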

[–] obwohl@alien.top 1 points 11 months ago (1 children)

Does this technique affect the RAM required for inference?

[–] koehr@alien.top 1 points 11 months ago (1 children)

I don't think so (unfortunately). The model size doesn't change, only the way it is traversed.
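
To put rough numbers on it (taking the paper's ~4095-neuron layers at face value):

```python
# Back-of-the-envelope, assuming the paper's 4095-neuron FFF layers:
neurons_total = 4095   # all of their weights stay loaded in RAM
neurons_used = 12      # one root-to-leaf path evaluated per token
print(f"compute: {neurons_used / neurons_total:.2%} of a dense layer")  # 0.29%
print("memory: unchanged — every neuron's weights must stay resident")
```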

[–] obwohl@alien.top 1 points 11 months ago

Can this technique be combined with LoRA at a not-so-low rank? I've heard LoRA increases training time, but with this speedup that should no longer be a problem :)