Dry_Long3157

[–] Dry_Long3157@alien.top 1 points 9 months ago

May I know the LoRA parameters, if you used LoRA/QLoRA?
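For reference, "LoRA parameters" here usually means the adapter and quantization hyperparameters shown below. This is only an illustrative sketch with common default values (peft + bitsandbytes), not anything confirmed in this thread.

```python
# Illustrative QLoRA hyperparameters (not from this thread), using peft + bitsandbytes.
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit quantization config (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# LoRA adapter config
lora_config = LoraConfig(
    r=16,                # adapter rank
    lora_alpha=32,       # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # which linear layers get adapters
    bias="none",
    task_type="CAUSAL_LM",
)
```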

 

Hey,

I'm trying to load TheBloke/deepseek-llm-67b-chat-GGUF (Q4_K_M) with the llama.cpp loader and I keep running into this error. Please let me know how to fix it. TIA.

https://preview.redd.it/4qymemytpb3c1.png?width=889&format=png&auto=webp&s=ee97cbe31b3625ef9b1bcc21ba74407a90651f8d
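The error in the screenshot isn't reproduced here, but a quick way to check whether the GGUF file itself is the problem (rather than the webui loader) is to load it directly with llama-cpp-python. The file name and layer count below are assumptions; adjust them to your download and VRAM.

```python
# Minimal sanity check: load the GGUF directly with llama-cpp-python,
# bypassing the webui, to see whether the file itself loads.
from llama_cpp import Llama

llm = Llama(
    model_path="deepseek-llm-67b-chat.Q4_K_M.gguf",  # assumed local file name
    n_ctx=4096,        # context length
    n_gpu_layers=40,   # offload some layers to the GPU; lower this if you run out of VRAM
)

out = llm("Hello, how are you?", max_tokens=32)
print(out["choices"][0]["text"])
```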

 

I was going through a paper called MILAN, a pre-training method for teaching models good visual representations, and one thing that struck me is the large number of epochs used to train these models (see image), even when we want the model to generalize well. So I'm curious why even base models are trained for only a low epoch count.

TIA.

https://preview.redd.it/un1mdjoodx2c1.png?width=1312&format=png&auto=webp&s=2f80e328b05c3aee00a32c1e1ee8289810d8ddf0

[–] Dry_Long3157@alien.top 1 points 9 months ago

Yup, it's the best I've tried for tables and math formulas.

[–] Dry_Long3157@alien.top 1 points 9 months ago (2 children)

nougat by Facebook is your best bet.

[–] Dry_Long3157@alien.top 1 points 10 months ago

DeepSeek Coder should be good; you can try the quantised 33B model.

[–] Dry_Long3157@alien.top 1 points 10 months ago

Hey, you can just download the config file and the lora_train.py file and run it as I've explained in the README!

To simplify it further, open both files in any editor and activate the same environment you use for oobabooga. Then make the changes based on your requirements in the lora_config.yaml file. Once you're done, just run "python lora_train.py".
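In case it helps to see the general shape of a config-driven run, here is a rough sketch of that pattern. The YAML field names and training loop below are assumptions for illustration only, not the repo's actual lora_config.yaml schema or lora_train.py code.

```python
# Rough sketch of a config-driven LoRA run (field names are assumptions, not the repo's schema).
import yaml
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

with open("lora_config.yaml") as f:
    cfg = yaml.safe_load(f)  # e.g. model name, dataset name, LoRA rank, epochs, ...

tokenizer = AutoTokenizer.from_pretrained(cfg["model"])
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(cfg["model"])
model = get_peft_model(model, LoraConfig(r=cfg["lora_r"],
                                         lora_alpha=cfg["lora_alpha"],
                                         task_type="CAUSAL_LM"))

# Assumes the dataset has a "text" column; adapt to your own data format.
dataset = load_dataset(cfg["dataset"], split="train")
dataset = dataset.map(lambda b: tokenizer(b["text"], truncation=True, max_length=512),
                      batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir=cfg["output_dir"],
                           per_device_train_batch_size=cfg["batch_size"],
                           num_train_epochs=cfg["epochs"]),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```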

If you need further help, feel free to ask!

 

Hey everyone,

I recently came across a post where someone found it hard to find simple scripts to fine-tune LLMs on their own data. So I put together a repo where you just write out your requirements in a config.yaml file and the training runs based on that.

Here's the repo - LLM-Trainer

It is still a WIP, so let me know if you guys want other features added to it.

TIA.

 

Hey everyone,

I have a dataset of around 8 million prompt-response pairs collected and curated from a bunch of open-source datasets on Hugging Face. I wanted to know the best method to dedup this dataset. I'm planning on doing this locally (4090 with 64 GB of RAM), and I've looked into a few methods but wasn't able to use them in my case because of my compute constraints.

Please let me know if y'all know an efficient method I can use!

TIA.
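One approach that tends to fit a single-machine setup is near-duplicate detection with MinHash + LSH, e.g. via the datasketch library. The threshold and shingling below are illustrative choices, not a tested recipe for this exact dataset.

```python
# Near-duplicate removal with MinHash + LSH (datasketch). Illustrative settings;
# for millions of pairs this runs on CPU/RAM, so process in shards if memory gets tight.
from datasketch import MinHash, MinHashLSH

NUM_PERM = 128

def minhash(text: str) -> MinHash:
    m = MinHash(num_perm=NUM_PERM)
    for token in set(text.lower().split()):  # simple word-level shingles
        m.update(token.encode("utf-8"))
    return m

lsh = MinHashLSH(threshold=0.85, num_perm=NUM_PERM)  # Jaccard similarity cutoff
kept = []

# Toy examples; in practice, iterate over the 8M prompt-response pairs.
pairs = [("p1", "How do I sort a list in Python?"),
         ("p2", "how do i sort a list in Python?"),   # near-duplicate of p1
         ("p3", "Explain LoRA fine-tuning.")]

for key, text in pairs:
    mh = minhash(text)
    if not lsh.query(mh):        # no sufficiently similar item seen yet
        lsh.insert(key, mh)
        kept.append(key)

print(kept)  # ['p1', 'p3']
```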

 

Hey, I am looking to fine-tune an LLM with the FIM (fill-in-the-middle) method, but I'm not able to find any repos online that I can use/follow. InCoder from Meta is also trained in a similar way, but I can't find the training code for that either.

I think this method can be particularly useful when training models to "learn" a particular library or codebase they haven't seen before, or at least that is my hypothesis.

Please let me know if you find any resources. TIA.
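If it helps anyone searching later: the core of FIM training is just a data transform applied to ordinary text/code before tokenization (as described in the "Efficient Training of Language Models to Fill in the Middle" paper). The sentinel strings below follow StarCoder's convention and are an assumption; other models use different special tokens.

```python
# Minimal FIM (fill-in-the-middle) transform in PSM (prefix-suffix-middle) order.
# Sentinel strings follow StarCoder's convention; swap them for your model's tokens.
import random

FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def fim_transform(doc: str, fim_rate: float = 0.9, rng: random.Random = random.Random(0)) -> str:
    """With probability fim_rate, cut the document into prefix/middle/suffix and
    rearrange it so the model learns to generate the middle given both sides."""
    if rng.random() > fim_rate or len(doc) < 3:
        return doc  # leave some examples as plain left-to-right text
    i, j = sorted(rng.sample(range(len(doc)), 2))
    prefix, middle, suffix = doc[:i], doc[i:j], doc[j:]
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"

print(fim_transform("def add(a, b):\n    return a + b\n"))
```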