Sorry I didn't reply directly here when I saw this post, but I wanted to write a larger guide which also covers this subject.
I cover how a "quality jailbreak" zero-depth input can do what you requested in the OP.
Goliath makes spelling errors?
I've only used a handful of Mistral 7Bs due to constraints, but I've never seen one make any spelling errors.
Is that a side effect of merging?
I use KoboldCpp.
You are probably right about it being a bug. At first I couldn't get the model to work at all (it crashed KoboldCpp when loading up), but that was just because I had a week-old version of KoboldCpp. I needed to download the version that came out like 4 days ago (at that time) ha! Then it loaded up fine, but with the quirk already mentioned. I guess it will get fixed before long.
Yeah, the future of local LLMs lies in the smaller models for sure!
I was very impressed by Rocket 3B in the brief time I tested it. Unfortunately I've only ever used Mistral-based 7Bs, so I can't compare it to older 7Bs, but it wouldn't surprise me if the benchmark results are accurate and it really is as good as the older 7Bs.
I'm glad I tried it, as now I know to keep an eye on 3B progress. It might not be too long before 3Bs are performing at the level of current Mistral 7Bs!
One weird thing, though: it crashed for me when I attempted to load in all layers, even though I had the VRAM space, and when loading 32/35 layers it gave me the same inference speed as loading 32/35 layers of a 7B.
I could only get pretty muddled responses from the model.
Despite it seemingly having a simple prompt template, I suspect I didn't enter all the data correctly into SillyTavern, as the outputs I was getting were similar to when I have the wrong template selected for a model.
Shrugs
If a model's creators want it to be successful, they should really pick a standard template (preferably ChatML) and clearly state that's what they are using.
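For reference, ChatML just wraps each turn in im_start/im_end markers with a role name. A minimal sketch of the format (the system prompt wording here is illustrative):

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant
```

If the frontend's instruct template matches the model's expected format exactly, the muddled-output problem usually goes away.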
Looking forward to trying this when some GGUFs are available.
Can you offload layers with this like GGUF?
I don't have much VRAM / RAM so even when running a 7B I have to partially offload layers.
No test is perfect, but nonetheless I think it's pretty interesting that Intel's new 7B has just beaten out ALL currently tested models on Ayumi's RP ranking.
I've not actually tested it yet myself but this has certainly given me the motivation to try it!
Has Intel made a big entrance, or is it a false alarm? Leave your feedback below.
I just switched to KoboldCpp from Text Gen UI 2 days ago.
The OpenAI extension wouldn't install for me and it was causing issues with SillyTavern which I use as a frontend.
I'm actually really happy now that I've switched.
The simplicity of KoboldCpp is great. I've written a simple batch file to launch both KoboldCpp and SillyTavern (see the sketch below). All I have to do if I want to try a new model is edit the part of the batch file pointing to the name of the model, and it just works.
On top of that, I can load more layers onto my GPU with KoboldCpp than with Text Gen UI, so I'm getting faster speeds.
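A minimal sketch of that kind of batch file, assuming a standard Windows setup. The model filename, layer count, and folder layout are placeholders for my own, but --model, --contextsize, and --gpulayers are real koboldcpp flags, and Start.bat is SillyTavern's stock Windows launcher:

```bat
@echo off
REM Start KoboldCpp with the chosen GGUF model; edit MODEL to swap models.
set MODEL=openchat_3.5.Q4_K_M.gguf
start "KoboldCpp" koboldcpp.exe --model %MODEL% --contextsize 8192 --gpulayers 32

REM Then start SillyTavern (assumes it sits in a SillyTavern folder next to this script).
cd SillyTavern
call Start.bat
```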
I set up exactly as OP's example showed, but with 1.20 Repetition Penalty. The output was... quite bad, worse than what I was getting before tampering with all the settings.
I changed Repetition Penalty Range to match my context (8192) and that improved the output but it was still pretty bad.
I tried a Repetition Penalty of 1.0 and that was much better, but it tended to repeat after a bit (a common Mistral problem).
I tried 1.1 Repetition Penalty and it was close but still a bit too dumb / random.
1.05 Repetition Penalty seems to be a nice sweet spot for me atm. I do think the output is now better than what I had previously.
Strange that you don't see much difference with the Repetition Penalty setting. It massively alters my outputs (when set up like OP's).
I'm using OpenChat 3.5 7B for reference.
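For anyone who wants to sanity-check these values outside SillyTavern, here's a rough sketch of hitting KoboldCpp's KoboldAI-compatible generate endpoint directly from a Windows command line. This assumes KoboldCpp's default port 5001; the prompt, temperature, and min_p values are just illustrative, and min_p requires a recent KoboldCpp build:

```bat
curl http://localhost:5001/api/v1/generate ^
  -H "Content-Type: application/json" ^
  -d "{\"prompt\": \"Hello\", \"max_length\": 120, \"temperature\": 1.0, \"min_p\": 0.1, \"rep_pen\": 1.05, \"rep_pen_range\": 8192}"
```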
Hi, thanks a lot for this. I hadn't seen a good guide to these settings until now.
As someone who always runs Mistral 7B models, I have two questions:
For a general default for all mistral models would you recommend a Repetition Penalty setting of 1.20?
I run Mistral models at 8192 context. What should I set the Repetition Penalty Range to?
Thanks again for the great info and of course for making Min P!
So do you think this approach is better than Dynatemp?
Or are you planning to put forward both modifications, leaving Dynatemp out of this Kobold build to better test just the noise modification?