this post was submitted on 25 Nov 2023
LocalLLaMA
Community to discuss Llama, the family of large language models created by Meta AI.
you are viewing a single comment's thread
Would the amount of RAM used at the end of a 16k or 32k context be less than with Mistral?
And is the t/s at that point the same as at the beginning?
Looks like something to test in koboldcpp later if nobody has done those tests yet.
That's the point of RWKV: you could have a 10M context length and it would cost the same as a 100-token context.
SIGNIFICANTLY less - it is not a transformer, so it doesn't go quadratic with context length.
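Rough numbers on the compute side, with an assumed model width and ignoring the linear projections:

```python
# Rough per-layer compute: self-attention does O(ctx^2 * d) work for the
# QK^T and AV matmuls, while an RNN-style layer like RWKV does a fixed
# O(d^2) amount of work per token, i.e. O(ctx * d^2) total.
d = 4096  # model width (assumed)
for ctx in (1_000, 10_000, 100_000):
    attn_flops = 2 * ctx * ctx * d  # score + value-mix matmuls (rough)
    rnn_flops = ctx * d * d         # one d-by-d mix per token (rough)
    print(f"ctx={ctx:>7,}: attention ~{attn_flops:.1e} FLOPs, recurrent ~{rnn_flops:.1e} FLOPs")
```

At 100k context that's roughly a 50x gap per layer, and it keeps widening as the context grows.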
It is not a transformer?
Nope, an RNN without attention, with some tricks to enable parallel training.
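For the curious, here's a minimal numpy sketch of the idea. This is a simplified, non-stabilized version of the RWKV-4 "WKV" recurrence (the real code also tracks a running max exponent for numerical stability); names and shapes are illustrative, not the reference implementation:

```python
import numpy as np

def wkv_step(k_t, v_t, state, w, u):
    """One token of the WKV recurrence that replaces attention.
    `state` is a fixed-size (a, b) pair per channel, so memory is
    constant no matter how long the context gets."""
    a, b = state
    # output: exponentially weighted average of past v's, with a
    # learned "bonus" u for the current token
    wkv = (a + np.exp(u + k_t) * v_t) / (b + np.exp(u + k_t))
    # decay the running sums and fold in the current token
    a = np.exp(-w) * a + np.exp(k_t) * v_t
    b = np.exp(-w) * b + np.exp(k_t)
    return wkv, (a, b)

d = 8                               # channel dim (illustrative)
w = np.ones(d)                      # learned per-channel decay (assumed positive)
u = np.zeros(d)                     # learned current-token bonus
state = (np.zeros(d), np.zeros(d))
for _ in range(100_000):            # 100k tokens; state stays 2*d floats
    k_t, v_t = np.random.randn(d), np.random.randn(d)
    out, state = wkv_step(k_t, v_t, state, w, u)
```

The whole trick is that `state` is a couple of d-sized vectors per layer, regardless of how many tokens have been processed.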
It's basically... 0?
From GitHub:
"RWKV-4 7b does not increase any RAM usage with --nommap at 13k with koboldcpp. Is that normal? Is there no kv-cache and no extra RAM usage for context?"
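That lines up with the math. A back-of-the-envelope sketch with assumed 7B-ish shapes (not measured numbers):

```python
# Transformer: the KV cache grows linearly with context length.
layers, heads, head_dim = 32, 32, 128   # typical 7B transformer shapes (assumed)
ctx = 13_000
kv_bytes = layers * ctx * heads * head_dim * 2 * 2  # 2 tensors (K,V) x 2 bytes (fp16)
print(f"transformer KV cache @ 13k ctx: {kv_bytes / 2**30:.1f} GiB")  # ~6.3 GiB

# RWKV-4: fixed-size recurrent state, independent of context length.
n_layer, d, vecs = 32, 4096, 5          # ~5 state vectors per layer in RWKV-4 (approx)
state_bytes = n_layer * vecs * d * 4    # fp32
print(f"RWKV-4 state @ any ctx: {state_bytes / 2**20:.1f} MiB")  # ~2.5 MiB
```

A few megabytes of state versus gigabytes of KV cache, and the state doesn't grow at 32k, 128k, or beyond - which is why koboldcpp shows no extra RAM usage for context.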