VRAM usage scales quadratically with sequence length, and I'm not aware of any complete solution. Even efficient long-context fine-tuning methods such as LongLoRA only improve speed and quality; memory usage stays roughly the same as standard LoRA.
I recommend ensuring you're reducing memory in every other way first (a combined configuration sketch follows the list):

- Ensure you're using 4-bit QLoRA.
- Ensure the batch size is 1.
- Ensure you're using FlashAttention-2.
- Ensure your optimizer state can be paged to CPU memory by using a paged optimizer.
- Use gradient checkpointing.
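Here's a minimal sketch of what that combined setup might look like with the Hugging Face transformers/peft/bitsandbytes stack. The model name, LoRA hyperparameters, and accumulation steps below are placeholders, not recommendations:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder; use your base model

# 4-bit QLoRA quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # FlashAttention-2
    torch_dtype=torch.bfloat16,
)

# Enable gradient checkpointing and prepare the quantized model for training
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)

# LoRA adapters on top of the frozen 4-bit base model
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,    # batch size 1
    gradient_accumulation_steps=16,   # recover a larger effective batch size
    gradient_checkpointing=True,      # trade compute for activation memory
    optim="paged_adamw_8bit",         # paged optimizer: state spills to CPU under pressure
    bf16=True,
)
# Pass model and training_args to your Trainer of choice as usual.
```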
You could also try something more experimental, such as using Mistral with a 1024-token sliding attention window to capture roughly 2048 tokens of effective context while only paying the attention-memory cost of 1024 tokens.
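As a rough sketch of that idea (the 1024 value is illustrative, and the memory saving only applies when the attention implementation actually enforces the window, as FlashAttention-2 does):

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "mistralai/Mistral-7B-v0.1"  # placeholder

# Shrink the sliding window so each token attends to at most the previous
# 1024 tokens; stacked layers still propagate information beyond the window,
# so the effective context reaches further than 1024 tokens.
config = AutoConfig.from_pretrained(model_id)
config.sliding_window = 1024

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    attn_implementation="flash_attention_2",  # window enforced inside FlashAttention-2
)
```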
Or you could just summarize or prune your long examples.
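For the pruning option, a minimal sketch assuming a Hugging Face tokenizer and a dataset with a `text` column (both placeholders):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")  # placeholder

MAX_TOKENS = 2048  # hard cap on example length

def prune(example):
    # Truncate each example to MAX_TOKENS tokens instead of letting long
    # sequences blow up activation memory at training time.
    ids = tokenizer(example["text"], truncation=True, max_length=MAX_TOKENS)["input_ids"]
    return {"input_ids": ids}

# dataset = dataset.map(prune)  # with a datasets.Dataset holding a "text" column
```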