tgredditfc

joined 1 year ago
[–] tgredditfc@alien.top 1 points 11 months ago

Maybe. I haven’t done it yet, so I don’t know. You can Google around.

[–] tgredditfc@alien.top 1 points 11 months ago (2 children)

You can use the oobabooga API to do that. I haven’t done it myself, so I can’t say much about it.
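
It would look something like this, I think (a minimal sketch, untested; it assumes the web UI was started with --api and its OpenAI-compatible endpoint is on the default port 5000, and the URL and request fields are just examples):

```python
import requests

# Minimal sketch, untested: assumes text-generation-webui was launched with --api
# and its OpenAI-compatible endpoint is listening on the default port 5000.
# The URL, prompt, and parameters below are illustrative, not verified settings.
url = "http://127.0.0.1:5000/v1/chat/completions"

payload = {
    "messages": [{"role": "user", "content": "Hello, what model are you running?"}],
    "max_tokens": 200,
    "temperature": 0.7,
}

resp = requests.post(url, json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```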

[–] tgredditfc@alien.top 1 points 11 months ago (6 children)

You can start by reading Oobabooga’s wiki; I think it’s one of the most beginner-friendly tools. https://github.com/oobabooga/text-generation-webui/wiki/05-%E2%80%90-Training-Tab

 

I want to fine-tune some LLMs on my own dataset, which contains very long examples (a little over 2048 tokens). VRAM usage jumps up by several GB just from increasing the Cutoff Length from 512 to 1024.

Is there a way to feed those long examples into the models without increasing VRAM usage significantly?
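
The closest thing I’ve found is loading the base model in 4-bit and training a LoRA adapter with gradient checkpointing, which trades extra compute for activation memory as the cutoff length grows (just a sketch, not verified against the Training tab; the model id and LoRA settings are placeholders), but I’m wondering if there is something better:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Sketch only: 4-bit base weights + LoRA + gradient checkpointing is a common way
# to limit VRAM growth when raising the sequence/cutoff length. Model id and LoRA
# hyperparameters below are placeholders, not verified settings.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",      # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)
model.gradient_checkpointing_enable()  # recompute activations instead of storing them

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```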

[–] tgredditfc@alien.top 1 points 11 months ago

If I can run them all, I’ll just pick the biggest one.

[–] tgredditfc@alien.top 1 points 11 months ago (2 children)

“Write the snake game using pygame”

[–] tgredditfc@alien.top 1 points 11 months ago

Thanks for sharing! I have been struggling with the llama.cpp loader and GGUF (using oobabooga and the same LLM model): no matter how I set the parameters or how many layers I offload to the GPUs, llama.cpp is far slower than ExLlama (v1 & v2), not just a bit slower but an order of magnitude slower. I really don’t know why.
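
For reference, this is roughly how those offload knobs map outside the web UI (a sketch via llama-cpp-python; the model path and split values are just examples). From what I understand, any layers that n_gpu_layers leaves on the CPU slow generation down drastically, so that’s the first thing I keep ruling out:

```python
from llama_cpp import Llama

# Sketch of the offload knobs the oobabooga llama.cpp loader exposes, called
# through llama-cpp-python directly. Path and split values are illustrative.
llm = Llama(
    model_path="./models/some-model.Q4_K_M.gguf",  # placeholder GGUF file
    n_gpu_layers=-1,        # -1 offloads every layer; fewer leaves layers on the CPU
    tensor_split=[12, 24],  # rough per-GPU share for a 12 GB + 24 GB pair
    n_ctx=4096,
)

out = llm("Q: Name one GGUF quantization type. A:", max_tokens=32)
print(out["choices"][0]["text"])
```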

[–] tgredditfc@alien.top 1 points 11 months ago (2 children)

In my experience it’s the fastest and llama.cpp is the slowest.

[–] tgredditfc@alien.top 1 points 11 months ago

Thank you! It looks very deep to me; I’ll look into it.

[–] tgredditfc@alien.top 1 points 11 months ago

Thanks! I have some problems loading GPTQ models with the Transformers loader.

[–] tgredditfc@alien.top 1 points 11 months ago

Thanks for sharing!

 

Currently I have 12 GB + 24 GB of VRAM, and I get out-of-memory errors all the time when trying to fine-tune 33B models. 13B is fine, but the results are not very good, so I would like to try 33B. I wonder if it’s worth replacing my 12 GB GPU with a 24 GB one. Thanks!
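
What I’m attempting is roughly this kind of sharded 4-bit load (a sketch; the model id and per-card caps are placeholders, not my exact values), with explicit limits so each card keeps some headroom for activations and optimizer state:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Sketch: shard a 4-bit 33B across an uneven 12 GB + 24 GB pair with explicit
# per-device caps, leaving headroom for activations / optimizer state.
# The model id and memory limits are placeholders, not verified values.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-33b-model",        # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
    max_memory={0: "10GiB", 1: "20GiB", "cpu": "32GiB"},
)
print(model.hf_device_map)  # shows which layers landed on which device
```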

[–] tgredditfc@alien.top 1 points 11 months ago (1 children)

I have 2 GPUs, and AWQ never works for me in Oobabooga; no matter how I split the VRAM, I get OOM in most cases.
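
The kind of split I’ve been trying looks roughly like this (a sketch going through Transformers with the autoawq package installed, not necessarily the exact path the Oobabooga loader takes; the repo id and memory caps are placeholders), and it still OOMs more often than not:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch: load a pre-quantized AWQ checkpoint through Transformers (requires the
# autoawq package) with explicit per-card caps so neither card gets over-allocated.
# The repo id and limits are placeholders, not a verified working config.
model_id = "some-org/some-model-AWQ"   # placeholder AWQ repo

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    max_memory={0: "10GiB", 1: "22GiB"},   # leave headroom on both cards
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))
```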

 

In terms of AI use, especially LLMs.

$5,000 USD for the 128 GB RAM M3 MacBook Pro is still much cheaper than an A100 80 GB.
