this post was submitted on 10 Nov 2023
LocalLLaMA
Community to discuss about Llama, the family of large language models created by Meta AI.
Its referring to itself as a GPT could just come from pre-training data, if it was trained on internet data from 2023.
It sounds more like it was trained on ChatGPT output and they didn't curate it well enough to delete those "As a large language model trained by OpenAI..." category statements.
It's kinda like Shutterstock watermarks showing up in image generation.
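The curation step being described, dropping training samples that leak the teacher model's boilerplate, could look roughly like this minimal sketch. The phrase list and function names are illustrative assumptions, not any lab's actual filter:

```python
import re

# Hypothetical boilerplate phrases a curation pass might screen for;
# this list is an assumption for illustration, not a published filter.
BOILERPLATE_PATTERNS = [
    re.compile(r"as a large language model", re.IGNORECASE),
    re.compile(r"as an ai (language )?model", re.IGNORECASE),
    re.compile(r"trained by openai", re.IGNORECASE),
]

def is_clean(sample: str) -> bool:
    """Return True if the sample contains none of the boilerplate phrases."""
    return not any(p.search(sample) for p in BOILERPLATE_PATTERNS)

def curate(samples: list[str]) -> list[str]:
    """Drop samples that would teach the student the teacher's self-identity."""
    return [s for s in samples if is_clean(s)]
```

Skipping a pass like this is exactly how "As a large language model..." statements end up baked into a new model.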
Yea, I'm saying that ChatGPT outputs are contained in internet posts from 2023, so simply training on 2023 internet data would end up training on ChatGPT data as a side effect.
Yes, I understood you. My claim differs in that I think they DIRECTLY used a lot of GPT-4 output through the API, which is very probable because a lot of LLM training is done that way: you ask GPT-4 to generate examples of conversations with the properties you want your LLM to learn, and then train on that.
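The workflow described above can be sketched like this. `teacher` here is a local stub standing in for a real teacher-model API call (e.g. to GPT-4); the prompt format and record layout are assumptions for illustration, not a real training pipeline:

```python
def teacher(prompt: str) -> str:
    """Stub for a teacher-model query; a real pipeline would call GPT-4's API here."""
    return f"User: {prompt}\nAssistant: Here is a helpful, polite answer."

def build_dataset(seed_prompts: list[str]) -> list[dict]:
    """Collect (prompt, response) pairs to fine-tune a student model on.

    If the teacher's replies include phrases like "trained by OpenAI" and
    nothing filters them out, the student learns to say them too.
    """
    return [{"prompt": p, "response": teacher(p)} for p in seed_prompts]
```

This is why direct distillation explains self-identification better than incidental web contamination: the teacher's voice makes up the bulk of the fine-tuning signal rather than a tiny fraction of a web crawl.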
For it to self-identify as GPT, I don't think randomly crawled chat examples from the internet would be enough.
I am not trying to make a strong claim on that, it's just a thought. Maybe both.