Machine Learning

[P] X—LLM: Few lines of code to train your own 7B LLM in Colab using cutting edge techniques like QLoRA (alien.top)

submitted 2 years ago by DesperatePresence473@alien.top to c/machinelearning@academy.garden

Like many of you, I often need to train LLMs (Large Language Models). Training code gets copied from one project to the next, and it's easy to lose track, so you end up with several slightly different versions of the same training process.

X—LLM is a solution. It’s a streamlined, user-friendly library designed for efficient model training, offering advanced techniques and customizable options within the Hugging Face ecosystem.

Features:
- LoRA, QLoRA and fusing
- Flash Attention 2
- Gradient checkpointing
- bitsandbytes quantization
- GPTQ (including post-training quantization)
- W&B experiment tracking
- Simple training on multiple GPUs at once using DeepSpeed or FSDP
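
For readers unfamiliar with the "LoRA ... and fusing" item, here is a minimal pure-Python sketch of the general idea: train two small low-rank matrices instead of the full weight, then fold them back into the base weight. This is not the library's API. The helper names, the toy matrices, the 4096 layer width, and the alpha / rank scaling (the convention commonly used in LoRA implementations) are all assumptions for illustration.

```python
# Illustrative only: these helpers are hypothetical and not part of the library.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def fuse_lora(w, lora_b, lora_a, alpha, rank):
    """Return W + (alpha / rank) * (B @ A), i.e. the fused weight."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def lora_param_count(d_out, d_in, rank):
    """Trainable parameters in one adapter: B is d_out x rank, A is rank x d_in."""
    return rank * (d_out + d_in)


# Toy 2x3 base weight with a rank-1 adapter.
w = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
lora_b = [[1.0], [2.0]]        # shape (2, 1)
lora_a = [[0.5, 0.0, 1.0]]     # shape (1, 3)
fused = fuse_lora(w, lora_b, lora_a, alpha=2.0, rank=1)

# Assumed layer size for illustration: a 4096 x 4096 projection with rank 16.
adapter_params = lora_param_count(4096, 4096, 16)
full_params = 4096 * 4096
```

After fusing, the model has the same shape as the base model, so it can be saved and loaded without any adapter code. In the assumed 4096 x 4096 example, the adapter trains 128 times fewer parameters than the full matrix.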

Use cases:
- Create production-ready solutions or fast prototypes. X—LLM works in both configurations
- Fine-tune a 7B model on 334 million tokens (1.1 million dialogues) for just $50
- Automatically save each checkpoint during training to the Hugging Face Hub and don't lose any progress
- Quantize a model using GPTQ. Reduce a 7B Mistral model from 15 GB to 4.3 GB and increase inference speed

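To put the numbers in the use cases above in perspective, here is a quick back-of-the-envelope calculation. The first group of values comes straight from the list. The second group (a round 7 billion parameters, 2 bytes per weight at 16-bit, half a byte per weight at 4-bit) is my own assumption, and real checkpoints are somewhat larger because of extra parameters and quantization metadata.

```python
# Figures quoted in the post.
total_tokens = 334_000_000
dialogues = 1_100_000
cost_usd = 50.0
size_before_gb = 15.0
size_after_gb = 4.3

tokens_per_dialogue = total_tokens / dialogues            # average dialogue length
usd_per_million_tokens = cost_usd / (total_tokens / 1e6)  # training cost per 1M tokens
compression_ratio = size_before_gb / size_after_gb        # GPTQ size reduction

# Assumption (not from the post): exactly 7e9 parameters, weights only.
params = 7_000_000_000
fp16_gb = params * 2 / 1e9    # 2 bytes per 16-bit weight
int4_gb = params * 0.5 / 1e9  # half a byte per 4-bit weight

print(f"{tokens_per_dialogue:.0f} tokens per dialogue")
print(f"${usd_per_million_tokens:.2f} per million tokens")
print(f"{compression_ratio:.1f}x smaller after GPTQ")
print(f"weights only: {fp16_gb:.1f} GB at 16-bit, {int4_gb:.1f} GB at 4-bit")
```

So the quoted run works out to roughly 300 tokens per dialogue and about 15 cents per million training tokens, and the 15 GB to 4.3 GB reduction is in line with what going from 16-bit to 4-bit weights would suggest.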
Github repo: https://github.com/BobaZooba/xllm

You can train a 7B model, fuse the LoRA weights, and upload a ready-to-use model to the Hugging Face Hub, all in a single Colab notebook.

The library has gained 100 stars in less than a day, and now it's almost at 200. People are using it, training models in both Colab and multi-GPU setups. Meanwhile, I'm supporting X—LLM users and currently implementing the most requested feature - DPO.

Code example

I suggest that you try training your own models and see for yourself how simple it is.

If you like it, please consider giving the project a star on GitHub.

[–] WrapKey69@alien.top 1 points 2 years ago (1 children)

Would it be possible to fine-tune Mistral on the free tier of Google Colab like this? Even if it takes, let's say, 2x longer?

[–] DesperatePresence473@alien.top 1 points 2 years ago

Yeah, sure! That’s really easy. Just check this tutorial: https://colab.research.google.com/drive/1CNNB_HPhQ8g7piosdehqWlgA30xoLauP

It covers data preparation, training, and saving the trained model to the Hugging Face Hub.

Then you will be able to load your model as follows:

from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("WrapKey69/MySupaDupaMistral")