And feel free to ask! I'm here to help.
DesperatePresence473
It depends on how deeply you want to immerse yourself. The library is intended for both rapid prototyping and production-ready development. I would recommend starting with the former: it's very simple and takes about 10-15 minutes to get started, not including training time.
Here is a notebook that allows you to train models on a single GPU:
https://colab.research.google.com/drive/1CNNB_HPhQ8g7piosdehqWlgA30xoLauP
You can download it and train your model locally on your computer.
I apologize for the confusion. At first I read "RAM" and thought you wanted to train on the CPU.
Of course, 2 x 3090 would be more than enough for training. I believe even a 13B model with a large context length could be trained.
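As a rough sanity check on that claim, here is a back-of-the-envelope estimate of how much memory the weights of a 13B model take at different precisions. This is only a sketch under assumptions: it counts weights alone (activations, optimizer states and adapter weights come on top), it takes 24 GB per RTX 3090, and the helper name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 13e9        # a 13B-parameter model
TOTAL_VRAM_GB = 2 * 24   # assumption: two RTX 3090 cards with 24 GB each

# Compare a few common weight precisions against the combined VRAM.
for label, bits in [("fp16", 16), ("int8", 8), ("4-bit", 4)]:
    needed = weight_memory_gb(NUM_PARAMS, bits)
    left = TOTAL_VRAM_GB - needed
    print(f"{label}: {needed:.1f} GB for weights, {left:.1f} GB left for everything else")
```

The lower the weight precision, the more room is left for long contexts, which is why quantized weights combined with LoRA-style adapters are a common choice on consumer cards.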
If you have 2 GPUs, I suggest training through the command line and utilizing DeepSpeed or FSDP (which has been tested less).
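To give an idea of what a DeepSpeed setup involves, here is a minimal sketch that builds a generic ZeRO stage 2 config as a Python dict and serializes it to JSON. The keys are standard DeepSpeed config options, but the specific values (batch size, accumulation steps, stage) are assumptions for illustration, and the library may well generate or manage its own DeepSpeed config, so treat this as background rather than as the exact setup it uses.

```python
import json


def make_zero2_config(micro_batch_size=1, grad_accum_steps=8):
    """Build a generic DeepSpeed ZeRO stage 2 config (illustrative values)."""
    return {
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "gradient_accumulation_steps": grad_accum_steps,
        # bf16 is supported on Ampere cards such as the RTX 3090.
        "bf16": {"enabled": True},
        "zero_optimization": {
            # Stage 2 shards optimizer states and gradients across GPUs.
            "stage": 2,
            "overlap_comm": True,
            "contiguous_gradients": True,
        },
    }


# Serialize to the JSON form that DeepSpeed reads from a config file.
config_json = json.dumps(make_zero2_config(), indent=2)
print(config_json)
```

Saving `config_json` to a file gives a config that a DeepSpeed-based training script can be pointed at.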
Here are example projects that explain in detail how to train:
https://github.com/BobaZooba/xllm-demo
https://github.com/BobaZooba/wgpt
On Twitter, someone I don't know posted about how easily they managed to train on multiple GPUs (a super simple and short example):
I strongly recommend training on a GPU, as it speeds up the training process by an order of magnitude and has become the standard. I can recommend services that offer GPU rentals at the lowest prices.
Regarding the competitor libraries, I'm unlikely to be able to recommend anything specific. I created this particular library to simplify training on multi-GPU and prototyping, as well as to provide extensive customization options, including modifying the architecture, as is done in LoRA.
Thank you very much for your feedback on Tale Quest. It is very valuable to me, and I plan to further develop it someday. I would appreciate it if you continue to share your feedback. And I wanted to ask right away: is Telegram a popular app where you live? I am very concerned that Telegram might not be widespread enough for a full-fledged launch.
I strongly recommend training on a GPU, as it speeds up the training process by an order of magnitude and has become the standard. I can recommend services that offer GPU rentals at the lowest prices.
https://vast.ai
https://www.runpod.io
https://www.tensordock.com
Yeah, sure! That’s really easy. Just check this tutorial: https://colab.research.google.com/drive/1CNNB_HPhQ8g7piosdehqWlgA30xoLauP
It covers data preparation, training, and saving the trained model to the Hugging Face Hub.
Then you will be able to load your model as follows:
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("WrapKey69/MySupaDupaMistral")
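If you also want to generate text with the loaded model, something like the following should work. It is a minimal sketch that assumes the tokenizer was pushed to the same Hub repository as the model; `generate_reply` is a hypothetical helper name, and the repo ID in the usage comment is just the example one from above.

```python
def generate_reply(repo_id, prompt, max_new_tokens=64):
    """Load a causal LM from the Hugging Face Hub and complete a prompt."""
    # Imported here so that defining the helper does not require transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the tokenizer lives in the same repository as the model.
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)

    # Tokenize the prompt, generate a continuation, and decode it to text.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Usage (downloads the model, so it needs network access and enough memory):
# print(generate_reply("WrapKey69/MySupaDupaMistral", "Once upon a time"))
```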
Cheapest: https://www.tensordock.com/
Sometimes a bit unstable: https://vast.ai/
The one I use in production: https://www.runpod.io/