this post was submitted on 25 Nov 2023
LocalLLaMA
Community to discuss Llama, the family of large language models created by Meta AI.
you are viewing a single comment's thread
Would the amount of RAM used at the end of a 16k or 32k context be less than with Mistral?
And is the t/s at that point the same as at the beginning?
Looks like something to test in koboldcpp later if nobody has done those tests yet.
That's the point of RWKV: you could have a 10M context length and it would cost the same as a 100-token context.
SIGNIFICANTLY less - it is not a transformer, so it doesn't go quadratic with context length.
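Rough numbers on the compute side, with an assumed model width and ignoring the linear projections:

```python
# Rough per-layer compute: self-attention does O(ctx^2 * d) work for the
# QK^T and AV matmuls, while an RNN-style layer like RWKV does a fixed
# O(d^2) amount of work per token, i.e. O(ctx * d^2) total.
d = 4096  # model width (assumed)
for ctx in (1_000, 10_000, 100_000):
    attn_flops = 2 * ctx * ctx * d  # score + value-mix matmuls (rough)
    rnn_flops = ctx * d * d         # one d-by-d mix per token (rough)
    print(f"ctx={ctx:>7,}: attention ~{attn_flops:.1e} FLOPs, recurrent ~{rnn_flops:.1e} FLOPs")
```

At 100k context that's roughly a 50x gap per layer, and it keeps widening as the context grows.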
It is not a transformer?
Nope, an RNN without attention, with some tricks to enable parallel training.
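For the curious, here's a minimal numpy sketch of the idea. This is a simplified, non-stabilized version of the RWKV-4 "WKV" recurrence (the real code also tracks a running max exponent for numerical stability); names and shapes are illustrative, not the reference implementation:

```python
import numpy as np

def wkv_step(k_t, v_t, state, w, u):
    """One token of the WKV recurrence that replaces attention.
    `state` is a fixed-size (a, b) pair per channel, so memory is
    constant no matter how long the context gets."""
    a, b = state
    # output: exponentially weighted average of past v's, with a
    # learned "bonus" u for the current token
    wkv = (a + np.exp(u + k_t) * v_t) / (b + np.exp(u + k_t))
    # decay the running sums and fold in the current token
    a = np.exp(-w) * a + np.exp(k_t) * v_t
    b = np.exp(-w) * b + np.exp(k_t)
    return wkv, (a, b)

d = 8                               # channel dim (illustrative)
w = np.ones(d)                      # learned per-channel decay (assumed positive)
u = np.zeros(d)                     # learned current-token bonus
state = (np.zeros(d), np.zeros(d))
for _ in range(100_000):            # 100k tokens; state stays 2*d floats
    k_t, v_t = np.random.randn(d), np.random.randn(d)
    out, state = wkv_step(k_t, v_t, state, w, u)
```

The whole trick is that `state` is a couple of d-sized vectors per layer, regardless of how many tokens have been processed.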
It's basically... 0?
From GitHub:
"RWKV-4 7b does not increase any RAM usage with --nommap at 13k with koboldcpp. Is that normal? Is there no kv-cache and no extra RAM usage for context?"
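That lines up with the math. A back-of-the-envelope sketch with assumed 7B-ish shapes (not measured numbers):

```python
# Transformer: the KV cache grows linearly with context length.
layers, heads, head_dim = 32, 32, 128   # typical 7B transformer shapes (assumed)
ctx = 13_000
kv_bytes = layers * ctx * heads * head_dim * 2 * 2  # 2 tensors (K,V) x 2 bytes (fp16)
print(f"transformer KV cache @ 13k ctx: {kv_bytes / 2**30:.1f} GiB")  # ~6.3 GiB

# RWKV-4: fixed-size recurrent state, independent of context length.
n_layer, d, vecs = 32, 4096, 5          # ~5 state vectors per layer in RWKV-4 (approx)
state_bytes = n_layer * vecs * d * 4    # fp32
print(f"RWKV-4 state @ any ctx: {state_bytes / 2**20:.1f} MiB")  # ~2.5 MiB
```

A few megabytes of state versus gigabytes of KV cache, and the state doesn't grow at 32k, 128k, or beyond - which is why koboldcpp shows no extra RAM usage for context.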