zware

joined 10 months ago
zware@alien.top 1 points 10 months ago

If you want speed, use Mistral-7B-OpenOrca-GPTQ with ExLlamaV2; that gives around 40-45 tokens per second. To trade speed for quality, use TheBloke/Xwin-MLewd-13B-v0.2-GGUF with llama.cpp.

zware@alien.top 1 points 10 months ago

How do you get it to work with ExLlama or ExLlamaV2?

It works beautifully with llama.cpp, but with GPTQ models the responses are always effectively empty: every string field comes back as "^".

zephyr-7B-beta-GPTQ:gptq-4bit-32g-actorder_True:

{"emotion": "surprised", "affectionChange": 0, "location": "^", "feeling": "^", "action": [],"reply": "^"}

zephyr-7b-beta.Q4_K_M.gguf:

{"emotion":"surprised","affectionChange":0.5,"location":"office","feeling":"anxious","action":["looking around the environment"],"reply":"Hello! I'm Lilla, nice to meet you!"}
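To make the difference between the two outputs easy to check automatically, here is a minimal Python sketch that parses each response and lists the string fields holding only a placeholder. The helper name `placeholder_fields` and the choice of `""` and `"^"` as placeholder values are my own assumptions for illustration, not part of any loader's API.

```python
import json

def placeholder_fields(raw, placeholders=("", "^")):
    """Return the names of string fields whose value is only a placeholder."""
    data = json.loads(raw)
    return [key for key, value in data.items()
            if isinstance(value, str) and value in placeholders]

# The two responses quoted above, copied verbatim.
gptq_output = ('{"emotion": "surprised", "affectionChange": 0, "location": "^", '
               '"feeling": "^", "action": [],"reply": "^"}')
gguf_output = ('{"emotion":"surprised","affectionChange":0.5,"location":"office",'
               '"feeling":"anxious","action":["looking around the environment"],'
               '"reply":"Hello! I\'m Lilla, nice to meet you!"}')

print(placeholder_fields(gptq_output))
print(placeholder_fields(gguf_output))
```

A check like this could run after every generation, so a degenerate response is retried instead of being passed on to the roleplay frontend.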

This is my grammar definition:

root ::= RoleplayCharacter
RoleplayCharacter ::= "{"   ws   "\"emotion\":"   ws   Emotion   ","   ws   "\"affectionChange\":"   ws   number   ","   ws   "\"location\":"   ws   string   ","   ws   "\"feeling\":"   ws   string   ","   ws   "\"action\":"   ws   stringlist   ","   ws   "\"reply\":"   ws   string   "}"
RoleplayCharacterlist ::= "[]" | "["   ws   RoleplayCharacter   (","   ws   RoleplayCharacter)*   "]"
Emotion ::= "\"happy\"" | "\"sad\"" | "\"angry\"" | "\"surprised\""
string ::= "\""   ([^"]*)   "\""
boolean ::= "true" | "false"
ws ::= [ \t\n]*
number ::= [0-9]+   "."?   [0-9]*
stringlist ::= "["   ws   "]" | "["   ws   string   (","   ws   string)*   ws   "]"
numberlist ::= "["   ws   "]" | "["   ws   number   (","   ws   number)*   ws   "]"
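As a sanity check on the grammar itself, the rules reachable from root can be mirrored as a regular expression in Python. This is only a sketch under the assumption that a regex with the same structure accepts the same strings as the grammar; the names `CHARACTER` and `matches_grammar` are made up for this example. Both outputs quoted earlier pass it, which suggests the GPTQ run is following the grammar's structure and that the problem lies in what gets generated inside the strings.

```python
import re

# Each constant mirrors one grammar rule.
WS = r"[ \t\n]*"
STRING = r'"[^"]*"'
NUMBER = r"[0-9]+\.?[0-9]*"
EMOTION = r'"(?:happy|sad|angry|surprised)"'
# stringlist: "[" ws "]" | "[" ws string ("," ws string)* ws "]"
STRINGLIST = (r"\[" + WS + r"(?:" + STRING
              + r"(?:," + WS + STRING + r")*" + WS + r")?\]")

def field(name, value_pattern):
    """Pattern for one key: the quoted name, a colon, ws, then the value."""
    return '"' + name + '":' + WS + value_pattern

# RoleplayCharacter: fields in fixed order, separated by "," ws.
CHARACTER = re.compile(
    r"\{" + WS
    + ("," + WS).join([
        field("emotion", EMOTION),
        field("affectionChange", NUMBER),
        field("location", STRING),
        field("feeling", STRING),
        field("action", STRINGLIST),
        field("reply", STRING),
    ])
    + r"\}"
)

def matches_grammar(text):
    """True if the whole text is accepted by the mirrored grammar."""
    return CHARACTER.fullmatch(text) is not None

gptq_output = ('{"emotion": "surprised", "affectionChange": 0, "location": "^", '
               '"feeling": "^", "action": [],"reply": "^"}')
print(matches_grammar(gptq_output))
```

Note that, as written, the grammar has no ws before the closing "}", so a response ending in `" }` would be rejected by this check.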

Do you need to "prime" the models using prompts to generate the proper output?