Ravenpest

joined 1 year ago
[–] Ravenpest@alien.top 1 points 11 months ago

I got a 4090 and 128 GB of RAM. 70b runs fine at quant 5: about 280 seconds to generate a message with full reprocessing, and around 100 seconds less on a normal message. So I'd say you'd be fine with that.

[–] Ravenpest@alien.top 1 points 11 months ago

High temp does more harm than good. I would suggest looking into what the other settings do before raising it, no matter the model.

[–] Ravenpest@alien.top 1 points 11 months ago (3 children)

No issues here, just a lot of confidence on certain tokens but overall very little repetition. I use Koboldcpp, Q5_K_M. Don't abuse temp; the model seems to be exceedingly sensitive, and the smallest imbalance breaks its flow. Try temp 0.9, rep pen 1.11, top k 0, min-p 0.1, typical 1, tfs 1.
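Those sampler values map directly onto a Koboldcpp generate-API payload. A minimal sketch, assuming a local koboldcpp instance on its default port 5001 (the prompt text and `max_length` are illustrative, not from the comment):

```python
# Sampler settings from the comment above, packaged as a payload for
# koboldcpp's /api/v1/generate endpoint. Prompt and max_length are
# hypothetical placeholders; the sampler values are the ones recommended.
payload = {
    "prompt": "### Instruction:\nWrite a short scene.\n### Response:\n",
    "max_length": 300,
    "temperature": 0.9,   # keep temp modest; the model is sensitive to it
    "rep_pen": 1.11,      # mild repetition penalty
    "top_k": 0,           # 0 disables top-k filtering
    "min_p": 0.1,         # min-p sampling at 0.1
    "typical": 1,         # 1 disables typical sampling
    "tfs": 1,             # 1 disables tail-free sampling
}

# To actually send it to a running instance (requires `requests`):
# import requests
# r = requests.post("http://localhost:5001/api/v1/generate", json=payload)
# print(r.json()["results"][0]["text"])
```

Values of 0 or 1 on the unused samplers effectively turn them off, so min-p and temperature are doing almost all the work here.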

[–] Ravenpest@alien.top 1 points 11 months ago

At present, the simpler and narrower the scope of the instruction, the better. These models cannot understand complex tasks. Work out how the model thinks in general, then focus on one thing in particular and arrange your prompt accordingly.

[–] Ravenpest@alien.top 1 points 11 months ago

LLMs are not able to "claim" anything, they're just roleplaying nonsense. Relax.

[–] Ravenpest@alien.top 1 points 11 months ago

Speed + quality: Nous-Capybara 34b. Offload 13 layers to system RAM and get a Q5_K_M. If you have enough system RAM and a decent CPU you won't even feel it. For pure quality: Euryale 1.3 70b. It will be slow, up to 200 seconds for a single message at Q5_K_M, but it will deliver.
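The layer split can be reasoned about with back-of-the-envelope arithmetic: divide the GGUF file size by the model's layer count to get GB per layer, then see how many layers fit in your VRAM budget. A rough sketch (all numbers are illustrative assumptions, not measured values for these models):

```python
# Rough VRAM budgeting for GPU offload. Every number here is an
# illustrative assumption; check your actual GGUF file size and
# the layer count reported when the model loads.
model_size_gb = 24.3   # e.g. a 34b Q5_K_M GGUF, approximate
n_layers = 60          # transformer layers in a typical 34b model
vram_budget_gb = 20.0  # leave headroom on a 24 GB card for KV cache

gb_per_layer = model_size_gb / n_layers
gpu_layers = min(n_layers, int(vram_budget_gb / gb_per_layer))
cpu_layers = n_layers - gpu_layers

print(f"~{gb_per_layer:.2f} GB/layer -> {gpu_layers} on GPU, {cpu_layers} on CPU")
```

With koboldcpp the GPU count then goes to its `--gpulayers` flag; shrink the budget if generation runs out of memory at long contexts, since the KV cache grows with context length.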

[–] Ravenpest@alien.top 1 points 1 year ago

If you are old enough to recall Commodore programming, I would suggest a better use of your remaining time on this Earth. No, really: this tech is not yet at the level you seem to be desiring.

[–] Ravenpest@alien.top 1 points 1 year ago (1 children)

7b is way too dumb to be able to roleplay right now. 13b is the bare minimum for that specific task.