overview for ntn8888

Wanna upgrade PC for LLMs in c/localllama@poweruser.forum

[–] ntn8888@alien.top 1 points 2 years ago

Your comparison proves his point! A 13B model will fit snugly in your 6900, so this is a head-on comparison of the cards!

Wanna upgrade PC for LLMs in c/localllama@poweruser.forum

[–] ntn8888@alien.top 1 points 2 years ago

Welcome to the rabbit hole 😁. On a serious note, going for the newer generations pays dividends, in my opinion.

what is the best 7b right now ? in c/localllama@poweruser.forum

[–] ntn8888@alien.top 1 points 2 years ago

Oh god 🤦 But seriously, we need a wiki with a leaderboard and votes 😁

Cheap case for GPU Radeon Instinct MI50 (alien.top)

submitted 2 years ago by ntn8888@alien.top to c/homelab@selfhosted.forum

0 comments

Looking to up my game in local AI inference.

I've come across these AliExpress listings that are really cheap. Since these GPUs require external fans, they're unsuitable for consumer desktops. I'm at a loss as to which chassis would be best for me.

I'm new to the homelab game and don't know a thing about blade servers (is that what they're called?). I've only done homelabbing with refurbished SFF PCs previously.

PS: I know you can hack in a 3D-printed shroud/fan for use in a consumer case. But I'm looking to see if I can get a used server solution for the same price as, or less than, building a new PC!

7B models keep repeating/glitching after certain number of tokens in c/localllama@poweruser.forum

[–] ntn8888@alien.top 1 points 2 years ago

I've noticed this extensively when running locally on my 8 GB RX 580, and the issue is pretty bad. I've run exactly the models you mentioned.

But when I run on a (big) cloud GPU on vast.ai (e.g. an RTX 3090 or A6000), the problem vanishes.

vast.ai is pretty cheap ($10 deposit), so you can experiment there and see.

What prompts/questions do you use to test a model’s capabilities? Ideally ones that aren’t included in their training data. in c/localllama@poweruser.forum

[–] ntn8888@alien.top 1 points 2 years ago

I've used GPT-4 to help write articles for my blog. So I just picked some of the good articles it wrote (e.g. one on the Lutris game manager), prompted the model under test to write the same article (800 words), and then compared the two. This has worked really well for me. Vicuna 33B was the best alternative I've found in my small creative-writing tests, although I can't host it locally on my PC :/

New multilingual base model from nvidia: Nemotron-3-8B in c/localllama@poweruser.forum

[–] ntn8888@alien.top 1 points 2 years ago

An 8B model? Surely releasing larger ones would be good for their own game :/

Local hosted LLM sometimes gives unrelated responses. (alien.top)

submitted 2 years ago by ntn8888@alien.top to c/localllama@poweruser.forum

0 comments

I'm trying to run zephyr-7b on my local machine with an RX 580 8 GB using Text generation web UI. It works for the most part, but sometimes it starts giving unrelated responses, after which I have to restart the app! Sometimes it even prints outright gibberish.

I'm running zephyr-7b-beta.Q4_K_M.gguf with the following options:

n-gpu-layers: > 35
n_ctx: 8000

And parameters:

max_new_tokens: 2000
top_p: 0.95
top_k: 40
Instruction Template: ChatML

But if I run the exact same setup on a cloud GPU (vast.ai), it runs perfectly. What am I doing wrong?
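
For anyone comparing the 8 GB card against a larger cloud GPU, here is a rough memory budget for the settings above. It's only a back-of-the-envelope sketch, not a diagnosis. The architecture numbers (32 layers, 8 KV heads, head dimension 128), the fp16 KV cache, and the ~4.37 GB file size are all assumptions on my part, not values reported by the app.

```python
# Rough VRAM budget for a 7B Q4_K_M model at n_ctx = 8000.
# ASSUMPTIONS (not taken from the post): Mistral-7B-style architecture
# with 32 layers, 8 KV heads, head dim 128, and an fp16 KV cache.


def kv_cache_bytes(n_ctx, n_layers=32, n_kv_heads=8, head_dim=128,
                   bytes_per_value=2):
    """Bytes needed to hold keys and values for n_ctx tokens."""
    # Leading factor of 2: one tensor for keys, one for values.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * n_ctx


def gib(n_bytes):
    """Convert bytes to GiB."""
    return n_bytes / 2**30


N_CTX = 8000                 # from the post
MODEL_FILE_BYTES = 4.37e9    # ASSUMED size of the Q4_K_M file
VRAM_BYTES = 8 * 2**30       # ASSUMED: the card exposes 8 GiB

kv = kv_cache_bytes(N_CTX)
total = kv + MODEL_FILE_BYTES

print(f"KV cache:        {gib(kv):.2f} GiB")
print(f"Model weights:   {gib(MODEL_FILE_BYTES):.2f} GiB")
print(f"Weights + cache: {gib(total):.2f} GiB of {gib(VRAM_BYTES):.0f} GiB")
```

This leaves out compute buffers and whatever the desktop itself is using, so the real headroom is smaller than the printed figure suggests. It also can't explain gibberish output on its own; it's just a way to check how close the setup is to the card's limit.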