gamesntech

joined 1 year ago

What are some good options these days for LLM work, primarily fine-tuning and related experiments? This is for personal, proof-of-concept work paid out of pocket, so I'd definitely prefer cheaper options. I'd mostly be using 7-13B models, but later I'd want to test larger models as well.

Most of the providers have on-demand and spot options, with spot being obviously cheaper. I understand that spot instances can go down at any time, but assuming checkpoints are saved regularly and training can resume later, that shouldn't be a big problem. Are there any gotchas here?
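For what it's worth, the save-regularly-and-resume pattern is straightforward to wire up (Hugging Face's `Trainer` also supports it natively via `resume_from_checkpoint`). Here's a minimal framework-agnostic sketch of the idea; the file name, save interval, and the stand-in "loss" are all illustrative, and on a real spot instance the checkpoint path would need to live on persistent storage:

```python
import json
import os

# On a spot instance this should point at persistent storage (a network
# volume), since local disk is lost when the instance is reclaimed.
CKPT_PATH = "checkpoint.json"

def load_checkpoint():
    """Resume from the last saved step, or start fresh."""
    if os.path.exists(CKPT_PATH):
        with open(CKPT_PATH) as f:
            return json.load(f)
    return {"step": 0, "loss_history": []}

def save_checkpoint(state):
    """Write to a temp file and rename, so a preemption mid-write
    can't leave a corrupt checkpoint behind."""
    tmp = CKPT_PATH + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, CKPT_PATH)

def train(max_steps, save_every=2):
    state = load_checkpoint()
    while state["step"] < max_steps:
        state["step"] += 1
        # Stand-in for a real optimizer step and loss value.
        state["loss_history"].append(1.0 / state["step"])
        if state["step"] % save_every == 0:
            save_checkpoint(state)
    save_checkpoint(state)
    return state
```

The atomic rename matters: if the instance is killed while writing the checkpoint, the previous good one survives. The main gotcha I've seen people mention is exactly this, plus making sure the optimizer state (not just the weights) is part of what you save.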

The other criterion is a managed/secure environment vs. some kind of open/community environment. Again, the latter options are cheaper, and assuming security is not a major requirement, that seems like the better choice. Any thoughts or advice on this one?

I’m mostly looking at runpod, vast, and replicate based on info from other threads. Are there any other providers folks had good experience with?

How do AWS, GCP, or Azure compare to these options? From what I can tell these seem more expensive but I haven’t looked at these too closely.

Any recommendations with some details on your own experience, use cases, and costs would be greatly appreciated.

[–] gamesntech@alien.top 1 points 11 months ago

I’ll be honest, this question and the answers here are a classic example of LLM prompting. What would be very useful is some examples of what you tried and what challenges you faced with those trials, so people can give more informed and targeted advice.

[–] gamesntech@alien.top 1 points 11 months ago (1 children)

This is very interesting and quite helpful. I wouldn’t think to provide such detailed instructions. I’d love to see your full system prompt if that’s possible.


I'm running a simple finetune of the llama-2-7b-hf model with the guanaco dataset. A test run with a batch size of 2 and max_steps of 10 using the Hugging Face TRL library (SFTTrainer) takes a little over 3 minutes on Colab Free, but the same script runs for over 14 minutes on an RTX 4080 locally. I'm running this under WSL with full CUDA support. The GPU is at 100% utilization during training, so I don't think that's the problem.

Is there anything I'm missing or overlooking? The script itself is pretty simple and straightforward, but for reference I'm using this version. The code uses bitsandbytes, 4-bit loading, nf4 quantization, float16, all standard stuff.

https://github.com/Vasanthengineer4949/NLP-Projects-NHV/blob/main/LLMs%20Related/Finetune%20Llama2%20using%20QLoRA/Finetune_LLamA.ipynb
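For reference, the 4-bit setup described above would look roughly like this (the model name is from the post; I haven't checked the linked notebook, so treat this as a sketch of the standard QLoRA loading pattern rather than its exact code):

```python
# Sketch of 4-bit (QLoRA-style) model loading with bitsandbytes + transformers.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # 4-bit loading
    bnb_4bit_quant_type="nf4",               # nf4 quantization
    bnb_4bit_compute_dtype=torch.float16,    # float16 compute
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
```

One thing worth double-checking in your script: if `bnb_4bit_compute_dtype` isn't set explicitly, it defaults to float32, which makes 4-bit training dramatically slower; that alone can explain a several-fold gap between two machines running "the same" config.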

Any help is appreciated.