Are you doing a full finetune?
Try a LoRA, or better yet LongLoRA, which is specifically optimized for long context: https://github.com/huggingface/peft/issues/958
Hi u/mcmoose1900, thanks a lot for the reply!
To my understanding, I have already been using PEFT and LoRA since the start of this endeavour.
See the relevant excerpts of my code here (though there is a chance it is not being used as intended, given the often surprising ways Python behaves).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization with double quantization and fp16 compute dtype
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    load_in_8bit=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

# load the quantized base model across the available devices
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
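One thing I am not sure about is whether the model should also go through prepare_model_for_kbit_training before being handed to the trainer, or whether SFTTrainer does that for me. A rough sketch of the step I mean, assuming the standard peft helper (this call is not in my current code):

from peft import prepare_model_for_kbit_training

# freezes the base weights, casts a few layers to fp32 for stability,
# and enables gradient checkpointing to cut activation memory
base_model = prepare_model_for_kbit_training(
    base_model,
    use_gradient_checkpointing=True,
)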
and here
from peft import LoraConfig
from trl import SFTTrainer

# LoRA adapter settings (rank 64, no bias, causal-LM task)
peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.2,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
)

max_seq_length = MAX_SEQ_LENGTH

# SFTTrainer wraps the base model with the LoRA adapters via peft_config
trainer = SFTTrainer(
    model=base_model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    peft_config=peft_config,
    formatting_func=formatting_func,
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_args,
)
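To rule out the adapters silently not being applied (my worry from above), I assume a quick check like this should report only a small fraction of trainable parameters once the trainer has wrapped the model:

# the model wrapped by SFTTrainer should be a PeftModel;
# with LoRA applied correctly only a tiny percentage of parameters is trainable
trainer.model.print_trainable_parameters()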
and the parameters
MAX_SEQ_LENGTH = 8192
LEARNING_RATE = 2e-5
PER_DEVICE_BATCH_SIZE = 1
GRADIENT_ACCUMULATION_STEPS = 1
USE_EVAL = True
QUANT_BIT_8 = False
QUANT_BIT_4 = not QUANT_BIT_8
The numbers above are very low because I kept lowering them to mitigate the OOM issue, without success. Under normal circumstances they would not make sense.
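For reference, the memory-saving TrainingArguments flags I keep seeing recommended for long sequences look roughly like this; they are not in my snippet above, so the exact names, values, and the placeholder output_dir are assumptions on my part:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./out",  # placeholder
    per_device_train_batch_size=PER_DEVICE_BATCH_SIZE,
    gradient_accumulation_steps=GRADIENT_ACCUMULATION_STEPS,
    learning_rate=LEARNING_RATE,
    gradient_checkpointing=True,  # trade compute for activation memory
    optim="paged_adamw_8bit",     # paged 8-bit optimizer states
    fp16=True,                    # match bnb_4bit_compute_dtype above
)

My understanding is that at 8192 tokens the activation memory dominates even at batch size 1, which is why gradient checkpointing in particular is supposed to help.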