ntn8888

joined 1 year ago
[–] ntn8888@alien.top 1 points 11 months ago

Your comparison proves his point! 13b will fit snuggly in your 6900 this is a head on comparison of the cards!

[–] ntn8888@alien.top 1 points 11 months ago

Welcome to the rabbit hole 😁. On a serious note, going for the newer generations pays dividends, in my opinion.

[–] ntn8888@alien.top 1 points 11 months ago

Oh god 🤦 But seriously we need a wiki with a leader board with votes😁

 

Looking to UP my game in local AI inferencing.

I've come across these Aliexpress listings that are way cheap. Considering that these GPUs require external fans they're unsuitable for consumer desktops. I'm at a loss of ideas on which chassis is the best for me?

I'm new to the homelab game; and don't know a thing about blade servers(is that what they're called?).. I've only done homelab'in with refurbished SFF PCs previously.

PS: I know you can hack in a 3dprinted shroud/fan for use in a consumer case. But I'm looking to see if I can get a used server solution for cheaper or same price as building it on a new PC!

[–] ntn8888@alien.top 1 points 11 months ago

I've noticed this extensively when running locally on my 8gb rx580. And the issue is pretty bad.. I've run exactly the models you stated.

But when I run on (big) cloud GPU on vast.ai (eg on rtx 3090 or A6000) the problem vanishes..

vast.ai is pretty cheap ($10 deposit)you can experiment on there and see.

[–] ntn8888@alien.top 1 points 11 months ago

I've used gpt4 to help write articles for my blog. So I just picked some of the good articles that it wrote (eg Lutris game manager) and prompt the testing one to write (800 words) and then compare. This has worked really well for me. Vicuna 33b was the best alternative I've found in my small tests in creative writing.. Although I cant locally host it on my PC :/

[–] ntn8888@alien.top 1 points 11 months ago

an 8b model? surely releasing larger ones is good for their own game :/

 

I'm trying to run zephyr-7b, on my local machine with an RX580 8G using Text generation web UI. It works for the most part but sometimes gets into giving unrelated responses. After which I have to restart the app! Sometimes it even prints out right out gibberish..

I'm running zephyr-7b-beta.Q4\_K\_M.gguf\. With the following options:

n-gpu-layers: > 35
n_ctx: 8000

And parameters:

max_new_tokens: 2000
top_p: 0.95
top_k: 40
Instruction Template: ChatML

But if I run the above exact setup on a cloud GPU (vast.ai) it runs perfect.. What am I doing wrong?