this post was submitted on 25 Nov 2023
LocalLLaMA
Community to discuss Llama, the family of large language models created by Meta AI.
Would the amount of RAM used at the end of a 16k or 32k context be less than Mistral's?
Is the t/s the same speed as at the beginning?
Looks like something to test in kobold.cpp later if nobody has done those tests yet.
SIGNIFICANTLY less - it is not a transformer, so it does not go quadratic with context.
It is not a transformer?
Nope, an RNN without attention, with some tricks to enable parallel training.
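A toy sketch of the kind of trick meant here, under my own assumptions (a scalar decayed linear recurrence; the names and the simplification are mine, not from the thread or any specific repo): the same outputs can be produced either step by step with a constant-size state, or as a closed-form weighted sum that lets all timesteps be computed at once during training.

```python
# Scalar sketch of a decayed linear recurrence ("RNN without attention").
# Hypothetical illustration - not any model's actual formulation.

def recurrent(ks, vs, w):
    """Sequential form: one constant-size state, like RNN inference."""
    s, out = 0.0, []
    for k, v in zip(ks, vs):
        s = w * s + k * v   # state update; memory does not grow with t
        out.append(s)
    return out

def parallel(ks, vs, w):
    """Closed form: s_t = sum_{i<=t} w^(t-i) * k_i * v_i.
    Each output is an independent weighted sum over the inputs,
    so all timesteps can be computed in parallel during training."""
    return [sum(w ** (t - i) * ks[i] * vs[i] for i in range(t + 1))
            for t in range(len(ks))]

ks = [0.5, 1.0, -0.3, 2.0]
vs = [1.0, 0.2, 0.7, -1.0]
seq = recurrent(ks, vs, 0.9)
par = parallel(ks, vs, 0.9)
```

The two forms agree to floating-point precision, which is what makes it possible to train in a parallel mode and then run inference in the cheap recurrent mode.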
It's basically... 0?
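Rough arithmetic behind that answer, using Mistral-7B-like dimensions that are my assumption for illustration (32 layers, 8 KV heads, head dim 128, fp16 cache): a transformer's KV cache grows with every token, while a recurrent state stays the same size.

```python
# Hypothetical KV-cache size estimate; model dimensions assumed, not
# taken from the thread.

def kv_cache_bytes(n_tokens, n_layers=32, n_kv_heads=8,
                   head_dim=128, bytes_per_elem=2):
    """Bytes for keys + values across all layers at a given context."""
    return 2 * n_tokens * n_layers * n_kv_heads * head_dim * bytes_per_elem

GiB = 1024 ** 3
print(kv_cache_bytes(16_384) / GiB)  # 2.0 GiB at 16k context
print(kv_cache_bytes(32_768) / GiB)  # 4.0 GiB at 32k - keeps growing linearly
```

An RNN's state is a fixed-size tensor regardless of context length, so the *extra* RAM at 32k compared to 2k is, as the comment says, basically zero.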
From GitHub: