this post was submitted on 28 Nov 2023

LocalLLaMA




A couple of people have asked me to share my settings for solid roleplay on 7B. Yes it is possible. So here it goes. I'll try and make this brief and concise but full of every tweak I've learned so far.

So..

Step 1 - Backend

I'd recommend Koboldcpp generally, but currently the best you can get is actually kindacognizant's Dynamic Temp mod of Koboldcpp. It works exactly like mainline Koboldcpp except that when you set your temp to 2.0 it overrides the setting and runs in the test dynamic temp mode. It actually has 2 other dynamic temp solutions built in at different set temperature values, but just set it to 2 and forget imo, it seems to be the best of the 3. You can read about it here, explained by kindacognizant himself. Suffice it to say it's excellent. In my experience it reduces (though doesn't eliminate) repetition and looping thanks to increased word diversity, and it improves the model's ability to respond to commands.
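For the curious, the core idea behind entropy-based dynamic temperature can be sketched in a few lines of Python. To be clear, this is my own illustrative sketch of the concept, not kindacognizant's actual implementation; the function names and the min/max temperature mapping are assumptions:

```python
import math

def softmax(logits, temp=1.0):
    """Convert raw logits into a probability distribution at a given temperature."""
    m = max(logits)
    exps = [math.exp((x - m) / temp) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def dynamic_temperature(logits, min_temp=0.0, max_temp=2.0):
    """Scale temperature by the normalized entropy of the distribution:
    when the model is confident (low entropy), sample near-greedily;
    when it's uncertain (high entropy), sample with a higher temperature
    for more word diversity. The linear mapping here is a simplification."""
    probs = softmax(logits)
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    max_entropy = math.log(len(logits))  # entropy of a uniform distribution
    norm = entropy / max_entropy if max_entropy > 0 else 0.0
    temp = min_temp + (max_temp - min_temp) * norm
    return softmax(logits, temp=max(temp, 1e-4))  # clamp to avoid div-by-zero
```

The upshot: a single fixed temperature trades off coherence against variety everywhere, while a dynamic one only "spends" high temperature on tokens where the model is genuinely unsure.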

Even without the Dynamic temp test mod, Koboldcpp would still be my recommendation due to its simplicity, fast run times, and lightweight nature. It's a single standalone exe file! That makes it SO easy to upgrade and manage, it's fantastic. Better yet, it's very simple to write a quick batch file to launch your GGUF of choice with optimal settings. I'll share an example batch file.

cd "C:\*****YOUR DIRECTORY PATH*****\SillyTavern\koboldcpp\"
start /min koboldcpp_dynamictemp_nov21.exe --model MODELOFCHOICEFILENAME.gguf --port 5001 --gpulayers 32 --highpriority --contextsize 8192 --usecublas
cd "C:\*****YOUR DIRECTORY PATH*****\SillyTavern\"
start /min start.bat
exit    

Copy that into Notepad and save it as a .bat file after editing. Change the directory paths to wherever you keep your Koboldcpp exe and your SillyTavern install. Change MODELOFCHOICEFILENAME to your GGUF model name. If you have enough VRAM, change gpulayers to 35. If it crashes when loading, lower the layers. If you aren't using an Nvidia GPU you'll need to change the usecublas bit too. You can find the arguments listed here. Your GGUF should be kept in the same folder as the Koboldcpp exe. I like to make a folder in my SillyTavern install location for the sake of ease.

Basically, inside my SillyTavern install folder I have a folder called "koboldcpp", and inside that sits the single Koboldcpp exe, a single GGUF file, and the above batch file. Running that batch starts both Koboldcpp and SillyTavern (launching with their command windows minimized). SillyTavern auto-connects to Koboldcpp when set up as below. After this, all you ever have to do is swap out the Koboldcpp exe when a new version comes out, or change the GGUF name in the batch file if you ever switch models. Super easy, no hassle. Great. You never even need to look at Koboldcpp's GUI if you don't want to.

Step 2 - Front end

By consensus the best frontend for roleplay seems to be SillyTavern. I can attest to it being excellent with a breadth of options, addons and a sleek interface.

Once you've got it installed, check out the top bar. Click the 2nd plug icon, select the KoboldAI API and hit the connect button while Koboldcpp is running. It's as easy as that to connect! Check "auto connect to last server" and it will auto-connect to Koboldcpp when you next launch it. Job done.
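If you ever want to sanity-check the backend outside SillyTavern, Koboldcpp exposes the same KoboldAI-compatible HTTP API that SillyTavern connects to, on the port set in the batch file. Here's a rough Python sketch; the payload fields are standard KoboldAI API fields, but the helper names are mine and the settings shown are just examples:

```python
import json
import urllib.request

KOBOLD_URL = "http://localhost:5001"  # matches --port 5001 in the batch file

def build_generate_payload(prompt, max_length=250, temperature=2.0):
    """Request body for the KoboldAI-compatible /api/v1/generate endpoint.
    temperature=2.0 assumes the dynamic temp test mod from Step 1;
    use 1.0 on stock Koboldcpp."""
    return {
        "prompt": prompt,
        "max_length": max_length,
        "temperature": temperature,
    }

def generate(prompt, base_url=KOBOLD_URL):
    """POST a prompt to the running Koboldcpp instance and return the text."""
    req = urllib.request.Request(
        base_url + "/api/v1/generate",
        data=json.dumps(build_generate_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["results"][0]["text"]
```

Handy for quickly A/B testing a prompt tweak against a model without clicking through the UI.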

Click the leftmost icon in the top bar. Here are the settings you need once you have it installed and connected to the Koboldcpp backend, if you use a Mistral-based 7B. Note the sampler order at the bottom is also changed, and that temp is set to 2.0 assuming you are using the Dynamic temp test mod. If you aren't using that mod, change temp to 1 but keep everything else the same. My Typical P is disabled (thus blank); you do this by setting it to 1 and saving. Next time you open SillyTavern it will look blank like mine.

Lastly, click the third "A" icon in the top bar. Select the Roleplay option in the top left dropdown and match all my settings here. Now click the dropdown in the bottom center of my image, alongside "Instruct Mode Sequences". You want to change it to match this. BE SURE TO CLICK THE SAVE BUTTON TO THE RIGHT OF INSTRUCT MODE PRESETS IN ROLEPLAY NEAR THE CENTRE OF THESE OPTIONS, AS WELL AS THE SAVE BUTTON NEAR THE TOP LEFT CONTEXT TEMPLATE SETTINGS.

This is actually my personal contribution to improving the output of the model. It's a "quality jailbreak" I've written that formats output and gets the model to behave better. I've been told you can enter this type of jailbreak as a zero-depth author's note, but I've had no luck with that, whereas editing this field works for me. Here is the text so you can copy-paste it. Be very careful to match the format of that text box exactly.

You're {{char}}.

Exclusively describe all your emotions, actions, perceptions, surroundings, thoughts, and sensations in full, elaborate, explicit, graphic, lewd, and vivid detail. Be verbose, expressive, engaging, natural, authentic, and creative. Write multiple fresh sentences, paragraphs, and phrases.

Write your internal monologue in round brackets. Write your speech in quotations. Write all other text in asterisks in third person.

To explain a bit more about this.. I discovered that the "system prompt" people generally use to instruct their models only appears once, at the top of the context window. Thus it doesn't have much strength, and models don't really strictly follow instructions placed there. Editing the field I mentioned, however, places that text after every input, making it very effective at controlling the model's output. There are drawbacks. Apparently it influences the model so strongly it can break the model's ability to call instructions, which can hamper addons. But I don't use or particularly recommend any addons atm, so imo for the niche of roleplay it's all upside.
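To illustrate why placement matters, here's a rough sketch of how the two approaches differ when the context is assembled. The function and field names are hypothetical, and the real template SillyTavern builds is more involved, but the Alpaca-style markers match the Roleplay preset:

```python
def build_context(system_prompt, history, per_turn_instruction=""):
    """Assemble a context window the way an Alpaca/Roleplay template might.
    The system prompt appears ONCE at the top and drifts further from the
    generation point as chat history grows; the per-turn instruction
    (the quality jailbreak) is re-inserted after EVERY input, so it always
    sits close to where the model generates next."""
    parts = [system_prompt]
    for user_msg, bot_msg in history:
        parts.append(f"### Instruction:\n{user_msg}")
        if per_turn_instruction:
            parts.append(per_turn_instruction)
        parts.append(f"### Response:\n{bot_msg}")
    return "\n".join(parts)
```

With a long chat, the single top-of-context system prompt ends up thousands of tokens away from the next response, while the per-turn text stays adjacent to it every time, which is a plausible reason it steers output so much more strongly.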

Step 3 - The choice of model

The final step is selecting a model which responds well to the "quality jailbreak". Generally, the better the model, the better its ability to follow the instructions I put in there.

Thinking along those lines I have tested a ton of popular 7B models.

Some viable options include,

 

openchat_3.5 - OpenChat / OpenOrca version of the quality jailbreak

openhermes-2.5-mistral-7b - ChatML version of the quality jailbreak

openhermes-2-mistral-7b (I actually found the dialogue to be a bit better with the older model, go figure) - ChatML version of the quality jailbreak

dolphin2.1-openorca-7b - ChatML version of the quality jailbreak

 

All of the above models performed fairly well to varying degrees. However from my tests I would recommend the following models for the best performance,

 

4th dolphin-2.1-mistral-7b - ChatML version of the quality jailbreak

Responds well to the instructions but I found it a bit bland.

3rd trion-m-7b - Alpaca / Roleplay version of the quality jailbreak

Solid, worth a try, quite similar to toppy.

2nd toppy-m-7b - Download Hermans AshhLimaRP SillyTavern templates, then edit it with the quality jailbreak

Hermans AshhLimaRP SillyTavern template seems to solve a brevity problem this model otherwise has when using the regular Alpaca / Roleplay version of the quality jailbreak. Very good output that you should certainly try. You might even prefer it to my number 1 choice.

 

1st Misted-7B - Alpaca / Roleplay version of the quality jailbreak

A model I've never heard anyone talk about, and wow. Its output is so good. It's flavorful and follows the quality prompt the best of any model I tested, by a good margin.

I manually selected seeds 1-10. Here is its first response in each case. Note that in the 3 examples where its response is overly brief, a simple continue resulted in very good output.

I would HIGHLY recommend you download and try this model even if you have no interest in my quality mod or even roleplay. I imagine the model is simply very good.

In conclusion

If you follow all the steps I've laid out here you will find that 7Bs are indeed capable of quite enjoyable roleplay sessions. They aren't perfect, and Mistral still has issues in my experience when it goes a bit over 5k-ish context despite its 8k claims, but 7Bs are a lot better for roleplay than some people think, and they are only going to get better.

I'm still learning and tweaking things as I go along. I'm still playing about with my quality jailbreak to see if I can get it working better. If anyone has any other good tips or corrections to anything I've said please feel free to chime in.

Oh, and it goes without saying that the same field I use for the quality jailbreak can be used for a lot of things. I saw someone ask how he could make his model respond less politely. It can certainly do that. I even made it finish all its responses with "Nyaa" as a test.

One thing to note if you want to try out commands: use positive emphasis rather than negative. Don't, for example, tell it "Don't repeat or loop". Imagine you are speaking to a person who is hard of hearing; such a person might well miss the "don't" part and simply hear a command saying "repeat or loop". That's why I wrote "Write multiple fresh sentences, paragraphs, and phrases." Likewise, don't ask the model "not to be polite" as it may simply latch on to "be polite". Instead say something like "Be direct and straightforward."

Anyway I've rambled on wayyy too much. Hope some people find this helpful.

[–] LosingID_583@alien.top · 9 months ago

Unless using some integration like stable diffusion or TTS, I would just use a prompt with the model itself. Not only is it much faster to generate responses, but it maintains better coherence because SillyTavern tends to fill up the context window with stuff it is wrapping around each response.

round brackets

I believe these are called parentheses.

[–] CardAnarchist@alien.top · 9 months ago

Ah round brackets vs parentheses is one of those British vs American English things haha.

That said on paper parentheses probably should be the better choice as it should be less likely to be misinterpreted by the model.

I'm giving it a try with parentheses now, thanks!