this post was submitted on 10 Nov 2023

LocalLLaMA


Community to discuss Llama, the family of large language models created by Meta AI.

founded 2 years ago

Hello!

By popular demand, I am planning a fine-tune of https://huggingface.co/dreamgen/opus-v0-7b on top of Yi-34B, and I'm wondering whether to use the 200K variant as the base.

The regular Yi-34B seems slightly better than Yi-34B-200K on standard benchmarks, but I wonder how the 200K version "feels" in practice, and whether its loss of performance at short context is worth it, given that the regular version can already be used with up to 32K tokens.

(Yi-34B vs Yi-34B-200K)

Has anyone tried an analysis of these two models at various sequence lengths (<4K, <8K, <16K, etc.)?
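One way to run such a comparison (a minimal sketch, not an established benchmark — the bucket lengths and the `perplexity` helper here are my own illustration, and the model IDs in the comments are assumptions) is to compute per-token perplexity on held-out text truncated to each bucket length, for both models:

```python
import torch
import torch.nn.functional as F

def perplexity(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Per-token perplexity for next-token prediction.

    logits: (seq_len, vocab_size), labels: (seq_len,), already shifted so
    that logits[i] is the prediction for labels[i].
    """
    nll = F.cross_entropy(logits, labels, reduction="mean")
    return float(torch.exp(nll))

# Hypothetical context buckets matching the question (<4K, <8K, <16K, ...).
CONTEXT_BUCKETS = [4096, 8192, 16384, 32768]

# Sketch of the evaluation loop (model/tokenizer loading omitted; with
# transformers it would be AutoModelForCausalLM / AutoTokenizer on
# "01-ai/Yi-34B" and "01-ai/Yi-34B-200K"):
#
# for n in CONTEXT_BUCKETS:
#     ids = tokenized_heldout_text[: n + 1]
#     out = model(ids[:-1].unsqueeze(0))
#     ppl = perplexity(out.logits[0], ids[1:])
#     print(n, ppl)
```

Comparing the two perplexity curves per bucket would show whether the 200K model actually pays a measurable penalty at short contexts.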

a_beautiful_rhind@alien.top 1 point 2 years ago

The regular 34b "feels" like it ignores my prompt a lot.

mcmoose1900@alien.top 1 point 2 years ago

I felt this too. It seems to "grab on" when you give it a longer context to continue though.

FullOf_Bad_Ideas@alien.top 1 point 2 years ago

It's supposed to be a base model, not an instruction-finetuned model. That's how base models generally behave, unless they are sold as base but are actually finetuned (like the Llama 2 base models).