Dry_Long3157

[–] Dry_Long3157@alien.top 1 points 9 months ago

May I know the LoRA parameters, if you used LoRA/QLoRA?
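For reference, "LoRA parameters" here usually means the adapter and quantization hyperparameters shown below. This is only an illustrative sketch with common default values (peft + bitsandbytes), not anything confirmed in this thread.

```python
# Illustrative QLoRA hyperparameters (not from this thread), using peft + bitsandbytes.
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit quantization config (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# LoRA adapter config
lora_config = LoraConfig(
    r=16,                # adapter rank
    lora_alpha=32,       # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # which linear layers get adapters
    bias="none",
    task_type="CAUSAL_LM",
)
```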

 

Hey,

I'm trying to load TheBloke/deepseek-llm-67b-chat-GGUF (Q4_K_M) with the llama.cpp loader and I keep running into this error. Please let me know how to fix it. TIA.

https://preview.redd.it/4qymemytpb3c1.png?width=889&format=png&auto=webp&s=ee97cbe31b3625ef9b1bcc21ba74407a90651f8d
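The error in the screenshot isn't reproduced here, but a quick way to check whether the GGUF file itself is the problem (rather than the webui loader) is to load it directly with llama-cpp-python. The file name and layer count below are assumptions; adjust them to your download and VRAM.

```python
# Minimal sanity check: load the GGUF directly with llama-cpp-python,
# bypassing the webui, to see whether the file itself loads.
from llama_cpp import Llama

llm = Llama(
    model_path="deepseek-llm-67b-chat.Q4_K_M.gguf",  # assumed local file name
    n_ctx=4096,        # context length
    n_gpu_layers=40,   # offload some layers to the GPU; lower this if you run out of VRAM
)

out = llm("Hello, how are you?", max_tokens=32)
print(out["choices"][0]["text"])
```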

 

I was going through a paper called MILAN, a pre-training method for teaching models good visual representations, and one thing that struck me is the large number of epochs used to train these models (see image), even when we want the model to generalize well. So I'm curious why even base models are trained for only a low epoch count.

TIA.

https://preview.redd.it/un1mdjoodx2c1.png?width=1312&format=png&auto=webp&s=2f80e328b05c3aee00a32c1e1ee8289810d8ddf0

[–] Dry_Long3157@alien.top 1 points 9 months ago

Yup, it's the best I've tried for tables and math formulas.

[–] Dry_Long3157@alien.top 1 points 9 months ago (2 children)

nougat by Facebook is your best bet.

[–] Dry_Long3157@alien.top 1 points 10 months ago

DeepSeek Coder should be good; you can try the quantised 33B model.

[–] Dry_Long3157@alien.top 1 points 10 months ago

Hey, you can just download the config file and the lora_train.py file and run it as I've explained in the README!

To simplify it further, open both files in any editor and activate the same environment you use for oobabooga. Then make the changes based on your requirements in the lora_config.yaml file. Once you're done, just run "python lora_train.py".
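In case it helps to see the general shape of a config-driven run, here is a rough sketch of that pattern. The YAML field names and training loop below are assumptions for illustration only, not the repo's actual lora_config.yaml schema or lora_train.py code.

```python
# Rough sketch of a config-driven LoRA run (field names are assumptions, not the repo's schema).
import yaml
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

with open("lora_config.yaml") as f:
    cfg = yaml.safe_load(f)  # e.g. model name, dataset name, LoRA rank, epochs, ...

tokenizer = AutoTokenizer.from_pretrained(cfg["model"])
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(cfg["model"])
model = get_peft_model(model, LoraConfig(r=cfg["lora_r"],
                                         lora_alpha=cfg["lora_alpha"],
                                         task_type="CAUSAL_LM"))

# Assumes the dataset has a "text" column; adapt to your own data format.
dataset = load_dataset(cfg["dataset"], split="train")
dataset = dataset.map(lambda b: tokenizer(b["text"], truncation=True, max_length=512),
                      batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir=cfg["output_dir"],
                           per_device_train_batch_size=cfg["batch_size"],
                           num_train_epochs=cfg["epochs"]),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```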

If you need further help, feel free to ask!

 

Hey everyone,

I recently came across a post where someone found it hard to find simple scripts to fine-tune LLMs on their own data. So I put together a repo where you just write out your requirements in a config.yaml file and the training runs based on that.

Here's the repo - LLM-Trainer

It is still a WIP, so let me know if you guys want other features added to it.

TIA.

 

Hey everyone,

I have a dataset of around 8 million prompt-response pairs collected and curated from a bunch of open-source datasets on Hugging Face. I wanted to know the best method to dedup this dataset. I'm planning on doing this locally (4090 with 64 GB of RAM), and I've looked into a few methods but wasn't able to use them in my case because of my compute constraints.

Please let me know if y'all know an efficient method I can use!

TIA.
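One approach that tends to fit a single-machine setup is near-duplicate detection with MinHash + LSH, e.g. via the datasketch library. The threshold and shingling below are illustrative choices, not a tested recipe for this exact dataset.

```python
# Near-duplicate removal with MinHash + LSH (datasketch). Illustrative settings;
# for millions of pairs this runs on CPU/RAM, so process in shards if memory gets tight.
from datasketch import MinHash, MinHashLSH

NUM_PERM = 128

def minhash(text: str) -> MinHash:
    m = MinHash(num_perm=NUM_PERM)
    for token in set(text.lower().split()):  # simple word-level shingles
        m.update(token.encode("utf-8"))
    return m

lsh = MinHashLSH(threshold=0.85, num_perm=NUM_PERM)  # Jaccard similarity cutoff
kept = []

# Toy examples; in practice, iterate over the 8M prompt-response pairs.
pairs = [("p1", "How do I sort a list in Python?"),
         ("p2", "how do i sort a list in Python?"),   # near-duplicate of p1
         ("p3", "Explain LoRA fine-tuning.")]

for key, text in pairs:
    mh = minhash(text)
    if not lsh.query(mh):        # no sufficiently similar item seen yet
        lsh.insert(key, mh)
        kept.append(key)

print(kept)  # ['p1', 'p3']
```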

 

Hey, I am looking to fine-tune an LLM with the FIM (fill-in-the-middle) method, but I'm not able to find any repos online that I can use/follow. InCoder from Meta is also trained in a similar way, but I can't find the training code for that either.

I think this method can be particularly useful when training models to "learn" a particular library or codebase they haven't seen before, or at least that is my hypothesis.

Please let me know if you find any resources. TIA.
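If it helps anyone searching later: the core of FIM training is just a data transform applied to ordinary text/code before tokenization (as described in the "Efficient Training of Language Models to Fill in the Middle" paper). The sentinel strings below follow StarCoder's convention and are an assumption; other models use different special tokens.

```python
# Minimal FIM (fill-in-the-middle) transform in PSM (prefix-suffix-middle) order.
# Sentinel strings follow StarCoder's convention; swap them for your model's tokens.
import random

FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def fim_transform(doc: str, fim_rate: float = 0.9, rng: random.Random = random.Random(0)) -> str:
    """With probability fim_rate, cut the document into prefix/middle/suffix and
    rearrange it so the model learns to generate the middle given both sides."""
    if rng.random() > fim_rate or len(doc) < 3:
        return doc  # leave some examples as plain left-to-right text
    i, j = sorted(rng.sample(range(len(doc)), 2))
    prefix, middle, suffix = doc[:i], doc[i:j], doc[j:]
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"

print(fim_transform("def add(a, b):\n    return a + b\n"))
```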