this post was submitted on 23 Nov 2023

Machine Learning


Like many of you, I often need to train LLMs (Large Language Models). Code gets copied from one project to the next, it's easy to lose track, and you end up with several iterations of the same training pipeline.

X—LLM is a solution. It’s a streamlined, user-friendly library designed for efficient model training, offering advanced techniques and customizable options within the Hugging Face ecosystem.

Features:
- LoRA, QLoRA and fusing
- Flash Attention 2
- Gradient checkpointing
- bitsandbytes quantization
- GPTQ (including post-training quantization)
- W&B experiment tracking
- Simple training on multiple GPUs at once using DeepSpeed or FSDP
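For readers who haven't met LoRA before, the idea behind the first bullet is to freeze the full weight matrix and learn a small low-rank update instead. A toy, dependency-free sketch of the math (illustrative only, not X—LLM's API):

```python
# Toy illustration of the LoRA idea: instead of updating a full weight
# matrix W (d_out x d_in), learn two small matrices A (r x d_in) and
# B (d_out x r) with rank r << d; the effective weight is
# W + (alpha / r) * B @ A, with W kept frozen.

def matmul(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(row[i] * v[i] for i in range(len(v))) for row in m]

def lora_forward(W, A, B, x, alpha=16, r=2):
    """h = W x + (alpha / r) * B (A x): frozen base plus low-rank update."""
    base = matmul(W, x)
    update = matmul(B, matmul(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

# 4x4 frozen weight with rank-2 adapters: 2 * 4 * 2 = 16 trainable numbers
# here, but for a 4096x4096 layer it would be ~16k adapter params
# vs ~16.7 million for full fine-tuning.
W = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
A = [[0.1, 0.0, 0.0, 0.0], [0.0, 0.1, 0.0, 0.0]]     # r x d_in
B = [[0.5, 0.0], [0.0, 0.5], [0.0, 0.0], [0.0, 0.0]]  # d_out x r
x = [1.0, 2.0, 3.0, 4.0]

h = lora_forward(W, A, B, x)
```

With r much smaller than the layer dimensions, the trainable parameter count drops by orders of magnitude, which is what makes 7B fine-tuning fit on a single consumer GPU.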

Use cases:
- Create production-ready solutions or fast prototypes. X—LLM works in both configurations
- Finetune a 7B model with 334 million tokens (1.1 million dialogues) for just $50
- Automatically save each checkpoint during training to the Hugging Face Hub and don't lose any progress
- Quantize a model using GPTQ. Reduce 7B Mistral model from 15 GB to 4.3 GB and increase inference speed
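The 15 GB to 4.3 GB figure checks out with simple arithmetic. A back-of-the-envelope sketch (the parameter count and overhead numbers below are my assumptions, not from the post):

```python
# Back-of-the-envelope check of the 15 GB -> 4.3 GB claim.
# Assumed: Mistral-7B has ~7.24e9 parameters, fp16 stores 2 bytes/param,
# and 4-bit GPTQ stores ~0.5 bytes/param plus overhead for per-group
# scales/zero-points and layers left unquantized (e.g. embeddings).

def model_size_gb(n_params, bytes_per_param, overhead_gb=0.0):
    """Rough on-disk model size in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9 + overhead_gb

n = 7.24e9
fp16_gb = model_size_gb(n, 2.0)                  # ~14.5 GB, close to the quoted 15 GB
int4_gb = model_size_gb(n, 0.5, overhead_gb=0.7)  # ~4.3 GB with quantization overhead
```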

Github repo: https://github.com/BobaZooba/xllm

You can train a 7B model, fuse the LoRA weights, and upload a ready-to-use model to the Hugging Face Hub, all in a single Colab! Link

The library has gained 100 stars in less than a day, and now it's almost at 200. People are using it, training models in both Colab and multi-GPU setups. Meanwhile, I'm supporting X—LLM users and currently implementing the most requested feature - DPO.

Code example

I suggest that you try training your own models and see for yourself how simple it is.

If you like it, please consider giving the project a star on GitHub.

[–] aikakysymys@alien.top 1 points 11 months ago (1 children)

This project has gained popularity in the LLM training community, check it out.

Thanks, Boris!

[–] WrapKey69@alien.top 1 points 11 months ago (1 children)

Would it be possible to fine tune Mistral on free tier of Google colab like this? Even if it takes let's say 2x longer?

[–] DesperatePresence473@alien.top 1 points 11 months ago

Yeah, sure! That’s really easy. Just check this tutorial: https://colab.research.google.com/drive/1CNNB_HPhQ8g7piosdehqWlgA30xoLauP

It covers data preparation, training, and saving the trained model to the Hugging Face Hub.

Then you will be able to load your model as follows:

model = AutoModelForCausalLM.from_pretrained("WrapKey69/MySupaDupaMistral")

[–] LyPreto@alien.top 1 points 11 months ago (2 children)

I really wish MPS were more widely adopted by now… I hate seeing just CUDA or CPU in all these new libraries.

[–] Infamous-Bank-7739@alien.top 1 points 11 months ago (1 children)

You think people should prefer Mac over more general hardware?

[–] LyPreto@alien.top 1 points 11 months ago (1 children)

Not prefer it, but recognize its user base. Metal plus the unified memory have a lot to offer, and the compute is there. There's just really no adoption outside a few select projects like llama.cpp and some of the other text-inference engines.

[–] Infamous-Bank-7739@alien.top 1 points 11 months ago

Well, small projects are always going to support one backend before the other. Do you have any good experience with Apple hardware, then? I can see the benefits of faster memory, but it needs to prove its worth for people to give it any real attention.

[–] Crafty-Run-6559@alien.top 1 points 11 months ago (1 children)

Any idea what the VRAM requirements are for locally training a 7B QLoRA?

[–] DesperatePresence473@alien.top 1 points 11 months ago (1 children)

I strongly recommend training on a GPU, as it speeds up the training process by an order of magnitude and has become the standard. I can recommend services that offer GPU rentals at the lowest prices.
https://vast.ai
https://www.runpod.io
https://www.tensordock.com

[–] Crafty-Run-6559@alien.top 1 points 11 months ago (1 children)

Ah, OK, but what about a setup with dual local 3090s?

What kind of GPU rental would you recommend? An A100 80GB?

[–] DesperatePresence473@alien.top 1 points 11 months ago (1 children)

I apologize for the confusion: at first I read "RAM" and thought you wanted to train on the CPU.
Of course, 2 x 3090 would be more than enough for training. I believe even a 13B model with a large context length could be trained.
If you have 2 GPUs, I suggest training through the command line and utilizing DeepSpeed or FSDP (which has been tested less).
Here are examples of projects where it's explained in detail how you can train:
https://github.com/BobaZooba/xllm-demo
https://github.com/BobaZooba/wgpt
On Twitter, one person unknown to me posted about how they easily managed to train on multi-gpu (a super simple and short example):

https://twitter.com/darrenangle/status/1724913070105841806
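For the multi-GPU route mentioned above, a DeepSpeed launch typically looks something like the following. The script name and config values are illustrative (my assumptions), not X—LLM specifics; `--deepspeed` follows the Hugging Face Trainer convention for passing a config file:

```shell
# Minimal ZeRO stage 2 config (illustrative values)
cat > ds_config.json <<'EOF'
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "bf16": {"enabled": true},
  "zero_optimization": {"stage": 2}
}
EOF

# Run the same training script across both GPUs via the DeepSpeed launcher
deepspeed --num_gpus=2 train.py --deepspeed ds_config.json
```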

[–] Crafty-Run-6559@alien.top 1 points 11 months ago (2 children)

Awesome thank you.

Last question! Would it be reasonable to train on a single 3090 following that guide as well?

Edit: I mean training a 7B on a single card.

[–] DesperatePresence473@alien.top 1 points 11 months ago

And feel free to ask! I'm here to help.

[–] DesperatePresence473@alien.top 1 points 11 months ago (1 children)

It depends on how deeply you want to immerse yourself. The library is intended for both rapid prototyping and production-ready development. I would recommend starting with the former, it's very simple and will take about 10-15 minutes to get started, not including training time.
Here is a notebook that allows you to train models on a single GPU:

https://colab.research.google.com/drive/1CNNB_HPhQ8g7piosdehqWlgA30xoLauP
You can download it and train your model locally on your computer.
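As a rough sanity check that a 7B QLoRA run fits on a single 24 GB card like the 3090, here is a ballpark estimate. All the numbers below are my own assumptions, not measurements:

```python
# Rough QLoRA VRAM estimate for a 7B model (ballpark assumptions):
# - base weights quantized to 4-bit: ~0.5 bytes/param
# - LoRA adapters kept in higher precision, with gradients and Adam
#   optimizer states: ~12 bytes per adapter param
# - activations dominate the rest and scale with batch size * seq length

def qlora_vram_gb(n_params=7e9, lora_params=20e6, activations_gb=3.0):
    weights = n_params * 0.5 / 1e9        # 4-bit frozen base weights
    # adapter weights (2B) + grads (2B) + Adam m and v (4B + 4B) per param
    adapters = lora_params * 12 / 1e9
    return weights + adapters + activations_gb

# ~6.7 GB for a 7B model, so a single 12-24 GB card is typically enough
estimate = qlora_vram_gb()
```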

[–] Crafty-Run-6559@alien.top 1 points 11 months ago

Thank you so much, this is awesome.

[–] clamuu@alien.top 1 points 11 months ago (1 children)

Is this something that could be trained on a laptop without a gpu or would it be better to use cloud based GPU services?

Also can you or anyone else recommend any other libraries which simplify LLM training? I've done some ML projects but I'd like to do something a bit deeper and this looks perfect.

I tried out Talequest by the way. Very impressive.

[–] DesperatePresence473@alien.top 1 points 11 months ago (1 children)

I strongly recommend training on a GPU, as it speeds up the training process by an order of magnitude and has become the standard. I can recommend services that offer GPU rentals at the lowest prices.

https://vast.ai

https://www.runpod.io

https://www.tensordock.com

Regarding the competitor libraries, I'm unlikely to be able to recommend anything specific. I created this particular library to simplify training on multi-GPU and prototyping, as well as to provide extensive customization options, including modifying the architecture, as is done in LoRA.

Thank you very much for your feedback on Tale Quest. It is very valuable to me, and I plan to further develop it someday. I would appreciate it if you continue to share your feedback. And I wanted to ask right away: is Telegram a popular app where you live? I am very concerned that Telegram might not be widespread enough for a full-fledged launch.

[–] clamuu@alien.top 1 points 11 months ago

Wow thank you for the detailed reply. Your library looks fantastic. I'm definitely going to give it a go. I'm going to try fine-tuning it on music theory. Is that a crazy idea? Training on a GPU sounds much better. I looked more thoroughly through the repo and found it's all explained in there.

Telegram is a popular app here in the UK. Seems to me like an excellent way to launch it as there's no need for the user to download an app. WhatsApp is much more popular here but maybe it's harder to deploy a bot like this on WhatsApp?


[–] DadBod_FatherFigure@alien.top 1 points 11 months ago

Does this support training models from scratch assuming you can provide a tokenizer and a model configuration?