this post was submitted on 10 Nov 2023

LocalLLaMA


Hello!

By popular demand, I am planning a fine-tune of https://huggingface.co/dreamgen/opus-v0-7b on top of Yi-34B, and I wonder whether to use the 200K variant as the base.

The regular Yi-34B seems slightly better than Yi-34B-200K on standard benchmarks, but I wonder how it "feels" and whether the loss of performance on short context is worth it, given that the regular version can be used up to 32K tokens.

(Yi-34B vs Yi-34B-200K)

Did anyone try an analysis of these 2 models on various sequence lengths (<4K, <8K, <16K, etc.)?
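
To be concrete about the kind of analysis I mean, here is a rough sketch. It assumes you already ran each model over an eval set and have, per document, the token count and the summed negative log-likelihood (natural log). The function names and bucket limits are just my own placeholders, not from any library.

```python
import math
from collections import defaultdict

# Upper limits (in tokens) of the length buckets; purely illustrative.
BUCKETS = [4096, 8192, 16384, 32768]

def bucket_for(length):
    """Return a label such as '<4K' for the first bucket the length fits in."""
    for limit in BUCKETS:
        if length < limit:
            return f"<{limit // 1024}K"
    return f">={BUCKETS[-1] // 1024}K"

def perplexity_by_bucket(samples):
    """samples: iterable of (num_tokens, total_nll) pairs, one per document.

    Returns {bucket_label: perplexity}, where perplexity is
    exp(total NLL / total tokens) over all documents in that bucket.
    """
    tokens = defaultdict(int)
    nll = defaultdict(float)
    for num_tokens, total_nll in samples:
        if num_tokens <= 0:
            continue  # skip empty documents to avoid dividing by zero
        label = bucket_for(num_tokens)
        tokens[label] += num_tokens
        nll[label] += total_nll
    return {label: math.exp(nll[label] / tokens[label]) for label in tokens}
```

Running this once per model and comparing the dicts side by side would show whether the 200K variant actually loses anything in the short buckets.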

mcmoose1900@alien.top 1 points 1 year ago

Random update on this: I did some more experimenting on the start of a story (with LimaRP and Petrol LoRAs), and the 4K model seems... fine? So does the 200K.

I don't know how to stretch out the base model. Their page claims it supports 32K, but it has a 4K context in the config and no RoPE scaling section. Just a high rope theta.
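
If anyone wants to check the same thing, here is a minimal sketch of what I looked at. It assumes the config uses the usual Llama-style key names (`max_position_embeddings`, `rope_scaling`, `rope_theta`). The dict is a hand-written stand-in rather than the real file, and the `rope_theta` and `head_dim` values are placeholders I picked for illustration. The second function is just the standard RoPE formula: the slowest-rotating pair of dimensions has a wavelength of 2π · theta^((d-2)/d) positions, so a bigger theta stretches it out.

```python
import math

def summarize_context_settings(config):
    """Pull out the fields that control context length in a Llama-style config dict."""
    return {
        "max_position_embeddings": config.get("max_position_embeddings"),
        "has_rope_scaling": config.get("rope_scaling") is not None,
        # 10000.0 is the common default base when the key is absent
        "rope_theta": config.get("rope_theta", 10000.0),
    }

def longest_rope_wavelength(theta, head_dim):
    """Wavelength (in token positions) of the slowest-rotating RoPE dimension pair.

    Pair i rotates at frequency theta ** (-2 * i / head_dim); the slowest pair
    is i = head_dim / 2 - 1, so its exponent is (head_dim - 2) / head_dim.
    """
    slowest_exponent = (head_dim - 2) / head_dim
    return 2 * math.pi * theta ** slowest_exponent

# Stand-in config: 4K context as described above, no "rope_scaling" key,
# and a placeholder "high" theta (not copied from the real file).
example_config = {"max_position_embeddings": 4096, "rope_theta": 5_000_000.0}

if __name__ == "__main__":
    print(summarize_context_settings(example_config))
    for theta in (10_000.0, example_config["rope_theta"]):
        # head_dim=128 is an assumption for illustration
        print(theta, longest_rope_wavelength(theta, head_dim=128))
```

This doesn't tell you what the model was actually trained on, only that a high theta on its own is one way to get longer usable positions without a scaling section.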

The one difference I did notice is that the 200K model really likes to summarize and reference previous parts of the story. Maybe it was trained on retrieval or summarization examples.