this post was submitted on 14 Nov 2023

LocalLLaMA

Community to discuss Llama, the family of large language models created by Meta AI.

I've been using self-hosted LLMs for roleplay, but these are the worst problems I run into every time, no matter which model or parameter preset I use.

I'm using:

Pygmalion 13B AWQ

Mistral 7B AWQ

SynthIA 13B AWQ [Favourite]

WizardLM 7B AWQ

  1. It mixes up who's who and often starts speaking as the user.

  2. It writes in the third person or slips into narration.

  3. It sometimes generates the exact same reply, word for word, back to back even though I gave it new input.

  4. It starts generating dialogue or screenplay-style script instead of a normal conversation.

Does anyone have any solutions for these?

CocksuckerDynamo@alien.top 1 points 1 year ago

> Does anyone have any solutions for these?

Use a high-quality model.

That means not 7B or 13B.

I know a lot of other people have already said this in the thread, but this keeps coming up in this sub, so I'm just gonna say it too.

Bleeding-edge 7B and 13B models look good in benchmarks. Actually try using them, though, and the first thing you'll realize is how poorly benchmark results reflect real-world performance. These models are dumb.

You can get started on RunPod by depositing as little as $10, which is less than some fast food meals, so just take the plunge and find out for yourself. An RTX A6000 48GB only costs $0.79 per hour, so that buys quite a few hours of experimenting to feel the difference. With 48GB of VRAM you can run Q4_K_M quants of 70B models with full GPU offloading, or try Q5_K_M or even Q6 or Q8 if you tweak the number of layers you offload so everything fits within 48GB (and still get generations fast enough for interactive chat).
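
If it helps, here's a rough sketch of what that offloading setup looks like with llama-cpp-python, assuming a CUDA build. The model filename and the partial-offload layer count below are placeholders, not exact values, so adjust them for whatever quant you actually download:

```python
# Rough sketch, assuming llama-cpp-python built with CUDA support.
from llama_cpp import Llama

# A Q4_K_M quant of a 70B fits entirely in 48GB VRAM, so offload every layer (-1 = all).
llm = Llama(
    model_path="llama-2-70b-chat.Q4_K_M.gguf",  # placeholder filename
    n_gpu_layers=-1,  # full GPU offload
    n_ctx=4096,
)

# For a bigger quant (Q5_K_M, Q6, Q8), lower n_gpu_layers until it fits, e.g.:
# llm = Llama(model_path="llama-2-70b-chat.Q6_K.gguf", n_gpu_layers=70, n_ctx=4096)

out = llm("USER: Hello there.\nASSISTANT:", max_tokens=200, stop=["USER:"])
print(out["choices"][0]["text"])
```

The right layer count for a partial offload depends on the quant and context size, so just nudge it down until it stops running out of VRAM.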

The difference is absolutely night and day. Not only do 70Bs rarely make the basic mistakes you're describing, they sometimes even surprise me in ways that feel "clever."