this post was submitted on 12 Nov 2023 to LocalLLaMA

People talk about it around here like this is pretty simple (these days at least). But once I hit about 4200-4400 tokens (with my limit pushed to 8k) all I get is gibberish. This is with the LLaMA2-13B-Tiefighter-AWQ model, which seems highly regarded for roleplay/storytelling (my use case).

I also tried OpenHermes-2.5-Mistral-7B, and oddly enough it was nonsensical from the very start.

I'm using SillyTavern with Oobabooga, sequence length set to 8k in both, and a 3090. I'm pretty new to all of this, and it's been difficult finding up-to-date information (because things develop so quickly!). The term fine-tuning comes up a lot, and with it comes a whooooole lot of complicated coding talk I know nothing about.

As a layman, is there a way to achieve 8k (or more) context for a roleplay/storytelling model?

BangkokPadang@alien.top:

For Llama 2 models, set the alpha value (the NTK-aware RoPE scaling factor) to 2.65 when loading them at 8k context.

The general suggestion is 2.5, but if you plot the formula on a graph, 8192 context lands at about 2.642, so 2.65 is more accurate than 2.5.
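
If you're curious what that alpha number actually does: it's the NTK-aware RoPE scaling factor, and the loader turns it into a larger RoPE frequency base so the model's position embeddings stretch over the longer context, no fine-tuning required. Here's a minimal sketch of the math (assuming your loader applies the standard base * alpha^(d/(d-2)) relation; the constants below are the usual Llama 2 values, so double-check against your loader):

```python
# Sketch of NTK-aware RoPE scaling, which is what the "alpha" setting
# in loaders like text-generation-webui controls.
# Assumption: the loader derives the RoPE frequency base as
#   base' = base * alpha^(d / (d - 2))
# where d is the per-head dimension (128 for Llama / Llama 2).

ORIG_BASE = 10_000  # default RoPE frequency base for Llama models
HEAD_DIM = 128      # per-head dimension for Llama / Llama 2

def rope_base_for_alpha(alpha: float) -> float:
    """Return the scaled RoPE frequency base for a given alpha."""
    return ORIG_BASE * alpha ** (HEAD_DIM / (HEAD_DIM - 2))

for alpha in (1.0, 2.5, 2.65):
    print(f"alpha={alpha:4.2f} -> rope_freq_base ~ {rope_base_for_alpha(alpha):,.0f}")
```

Practical upshot for your setup: in Oobabooga's model loader, set max_seq_len to 8192 and alpha_value to 2.65 before loading, and make sure SillyTavern's context size matches.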