GPUs/OS?
That is absolutely impressive, but:
- Is light quantization really that bad? Couldn't you run 99% of the same model for half the cost? Is running unquantized just a flex/exercise/bragging right? (Rough memory math in the sketch after this list.)
- Would the quantized version run faster? Slower? The same?
- Isn't Falcon-180B kinda... meh? I mean, it's pretty smart from size alone, but the lack of fine-tuning by the community means it's kind of like running LLaMA-70b by itself.
- Would one of those new crazy good Threadrippers beat the GPUs? lol
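To put rough numbers on the quantization question, here's a back-of-the-envelope sketch; the parameter count and bits-per-weight values are ballpark assumptions, not exact figures for any particular GGUF/EXL2 build.

```python
# Rough weight-storage footprint for a ~180B-parameter model at different
# precisions. Real quantized files run a bit larger because of scales,
# zero-points, and tensors kept at higher precision.
PARAMS = 180e9  # approximate parameter count for Falcon-180B

def approx_size_gb(bits_per_weight: float) -> float:
    """Approximate weight storage in GB at a given average bits per weight."""
    return PARAMS * bits_per_weight / 8 / 1e9

for label, bpw in [("fp16", 16.0), ("~Q8", 8.5), ("~Q5_K_M", 5.7), ("~Q4", 4.8)]:
    print(f"{label:>8}: ~{approx_size_gb(bpw):.0f} GB")
```

Dropping from fp16 to ~5 bpw cuts the footprint by roughly two-thirds, which is the kind of saving the "half the cost" question is pointing at.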
Unfortunately, they are already in positions of power, so the next best thing is to remove them from those positions, undermine their efforts, and, most importantly, spread the word.
An ideal world is one where censorship freaks hold zero power and are universally ostracized.
OP, this post is fantastic.
I wonder, is this a case of the community doing free R&D for OpenAI, or do they truly have a good reason for using naive sampling?
Also, the graph comes from here; there are a bunch of other graphs there too.
The thing is, as far as I'm aware, "sound generation" is always a separate TTS model cobbled on, and even "vision" is a separate model that describes the image for the LLM.
This 13B model is probably still state of the art in the vision department for open models; a few others crop up now and again, but they haven't surprised me much.
https://llava-vl.github.io/
If you need to recognize audio, check Whisper, or Faster-Whisper, or anything developed from that. If you need to generate voice, check Bark, maybe Silero, RVC, etc.
You probably won't find it all wrapped into one neat package like ChatGPT+ right now, but I'd love to be proven wrong.
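To be concrete about the "cobbled together" part, here's a minimal sketch of the kind of pipeline people wire up by hand. It assumes openai-whisper for transcription, Bark for speech, and some local OpenAI-compatible server (llama.cpp, KoboldCpp, etc.); the URL and file names are placeholders.

```python
# Hand-wired speech -> LLM -> speech pipeline (minimal sketch).
import requests
import whisper
from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav

# 1) Speech to text with Whisper.
asr = whisper.load_model("base")
user_text = asr.transcribe("question.wav")["text"]

# 2) Text to text via a local OpenAI-compatible endpoint (placeholder URL).
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={"messages": [{"role": "user", "content": user_text}]},
    timeout=120,
)
reply = resp.json()["choices"][0]["message"]["content"]

# 3) Text to speech with Bark.
preload_models()
write_wav("reply.wav", SAMPLE_RATE, generate_audio(reply))
```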
That is awesome. What kind of platform do you use for that 3 GPUs setup?
Yeah, EXL2 is awesome. It's kinda black magic how GPUs released way before ChatGPT was a twinkle in anyone's eye can run something that can trade blows with it. I still don't get how fractional bpw is even possible. What the hell, 2.55 bits, man 😂 How does it even run at that to any degree? It's magic, that's what it is.
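For what it's worth, the fractional part is less magic than it sounds: as I understand it, EXL2 mixes different integer bit-widths across tensors (plus the usual scales on top), so the *average* bits per weight comes out fractional. A toy illustration of the averaging, with completely made-up layer sizes and bit assignments:

```python
# Toy averaging example: mixing bit-widths gives a fractional average bpw.
# These layer sizes and bit choices are invented for illustration only.
layers = [
    (2.4e9, 2.0),  # (weight count, bits): bulk of the FFN weights at 2-bit
    (0.6e9, 4.0),  # attention weights kept at 4-bit
    (0.1e9, 8.0),  # a few sensitive tensors left near full precision
]
total_bits = sum(n * b for n, b in layers)
total_weights = sum(n for n, _ in layers)
print(f"average: {total_bits / total_weights:.2f} bpw")  # ~2.58 with these numbers
```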
If the question is trivial, I trust GPT got it right. If the question is semi-complex, I ask, then confirm with a web search. For some reason, Google used to be a lot smarter, too. These days it's more of a link fetcher.
GPT-4 is programmed not to be racist or sexist, as that is what white men do.
They seem to have 70b and Goliath (the 120b monstrosity) on there. Currently I only see one 70b on https://lite.koboldai.net/'s list, but the other day I saw a couple of Goliaths. I have no idea why anyone would host the 120b other than maybe "crowd-sourcing" a dataset (probably against TOS or something, but why would anyone do that?).
Actually, GBNF is just a rebranding; BNF (Backus-Naur Form) is the proper name (the G is Georgi Gerganov's). There's also a reason languages compile to assembly, but that doesn't mean assembly is user-friendly. Same with Abstract Syntax Trees. Some things pretty much only apply to compilers; that doesn't make them a good general-purpose solution.
Though I must imagine implementing BNF is orders of magnitude easier than implementing the monster that is extended regular expressions.
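For anyone who hasn't played with it, a tiny grammar in practice looks something like this. The snippet assumes the llama-cpp-python bindings; the model path is a placeholder, and the grammar just forces a bare yes/no answer.

```python
# Constrain generation with a tiny GBNF grammar via llama-cpp-python.
from llama_cpp import Llama, LlamaGrammar

YES_NO = r'''
root ::= ("yes" | "no")
'''

llm = Llama(model_path="model.gguf")        # placeholder path
grammar = LlamaGrammar.from_string(YES_NO)  # parse the GBNF text
out = llm("Is water wet? Answer yes or no.\n", grammar=grammar, max_tokens=4)
print(out["choices"][0]["text"])
```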
If you can run Q8_0 but use Q5_K_M for speed, any reason you don't just run an exl2 at 8bpw?