I say:
- It has a performance hit, but it remains to be seen if going with a much larger model can compensate for that.
- The model needs to be trained from scratch; apparently you cannot finetune an existing model for this...
I say:
I mean, you can jailbreak/browbeat ChatGPT/Claude into going against their guardrails relatively easily, so I smash "X" to doubt that Grok is going to be any different. If it is, now THAT is going to be huge, though maybe not in a way we'd like, I guess...
That explains why Goliath worked and yours - not so much, I guess...
"Prompt Template: Alpeca" Wut?
Looks like a scam to be fair. I bet if you apply, you'll get "Just send us 100$ for access!"
Did you do post-merge retraining? Without at least some, results are going to be poor...
Did you do post-merge training, and how much?
10 s/tok and a couple of kilowatts of power... OK, if it were as smart as Einstein and as unerring as an oracle it might make sense, but you can use it for free on Petals at 3 tok/sec and it most certainly is not...
Technically, you can somewhat automate the testing process by writing a script that makes the model answer a series of questions that are relevant to YOU and are unique (so they cannot be gamed by training on benchmarks), then evaluating the answers yourself; something like the sketch below.
Make sure you experiment with different sampling methods and run several passes, due to the inherent randomness of the output.
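A minimal sketch of what I mean, assuming a local OpenAI-compatible completions endpoint (e.g. a llama.cpp-style server on localhost). The URL, request parameters, and questions are placeholders, not anyone's actual setup; swap in whatever your backend exposes:

```python
# Run a fixed set of personal test questions against a local model server,
# trying a few sampling presets and repeating each run to smooth out randomness.
# Grading stays manual: the script only collects answers for you to judge.
import json
import requests

API_URL = "http://localhost:8080/v1/completions"  # assumed local endpoint

QUESTIONS = [
    "Summarize the plot of my favourite obscure novel in two sentences.",
    "Write a bash one-liner that renames *.JPG to *.jpg recursively.",
    # ...add questions that matter to YOU and won't show up in benchmark sets
]

SAMPLING_PRESETS = {
    "near-greedy": {"temperature": 0.1, "top_p": 1.0},
    "default":     {"temperature": 0.8, "top_p": 0.95},
    "creative":    {"temperature": 1.2, "top_p": 0.9},
}

RUNS_PER_PRESET = 3  # repeat each question to account for sampling randomness


def ask(prompt: str, params: dict) -> str:
    """Send one prompt to the local server and return the generated text."""
    resp = requests.post(
        API_URL,
        json={"prompt": prompt, "max_tokens": 256, **params},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["text"].strip()


results = []
for preset_name, params in SAMPLING_PRESETS.items():
    for question in QUESTIONS:
        for run in range(RUNS_PER_PRESET):
            results.append({
                "preset": preset_name,
                "run": run,
                "question": question,
                "answer": ask(question, params),
            })

# Dump everything for manual grading.
with open("eval_results.json", "w") as f:
    json.dump(results, f, indent=2)
print(f"Collected {len(results)} answers; grade them yourself in eval_results.json")
```

Keep the question set private, otherwise it ends up just as gameable as the public benchmarks.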
Please, dear Tzeentch, have someone leak GPT-4 in the general confusion, I MUST know if it is really 10 7B models in a trench coat :)
My name is Mensch. Uber Mensch.
He MUST become the CEO of Uber, too! :))))
EXTERMINATE!