this post was submitted on 14 Nov 2023
1 points (100.0% liked)

LocalLLaMA

3 readers
1 users here now

Community to discuss about Llama, the family of large language models created by Meta AI.

founded 1 year ago
MODERATORS
 

I've been using self-hosted LLM models for roleplay purposes. But these are the worst problems I face every time, no matter what model and parameter preset I use.

I'm using :

Pygmalion 13B AWQ

Mistral 7B AWQ

SynthIA 13B AWQ [Favourite]

WizardLM 7B AWQ

  1. It messes up with who's who. Often starts to behave like the user.

  2. It writes in third person perspective or Narrative.

  3. Sometimes, generates the exact same reply (exactly same to same text) back to back even though new inputs were given.

  4. It starts to generate more of a dialogue or screenplay script instead of creating a normal conversation.

Anyone has any solutions for these?

you are viewing a single comment's thread
view the rest of the comments
[–] Menix333@alien.top 1 points 1 year ago (1 children)

Since I started using 70B, I have never encountered these problems again. It is that much better.

[–] cleverestx@alien.top 1 points 1 year ago

I have a RTX 4090, 96GB of RAM and a i9-13900k CPU, and I still keep going back to 20b (4-6bpw) models due to the awful performance of 70b models, which 2.4bpw is supposed to fully fit the VRAM in.... even using Exllama2....

What is your trick to get better performance? If I don't use a small lame context of 2048, the speed of generating is actually un-usable (under 1 token/sec), what context are you using and what settings? Thank you.