Dead_Internet_Theory

[–] Dead_Internet_Theory@alien.top 1 points 9 months ago

If you can run Q8_0 but use Q5_K_M for speed, any reason you don't just run an exl2 at 8bpw?

[–] Dead_Internet_Theory@alien.top 1 points 10 months ago (6 children)

That is absolutely impressive, but:

  1. Is light quantization that bad? Couldn't you run 99% of the same model for half the cost? Is running unquantized just a flex/exercise/bragging right?
  2. Would the quantized model run faster? Slower? The same?
  3. Isn't Falcon-180B kinda... meh? I mean, it's pretty smart from size alone, but the lack of fine-tuning by the community means it's kind of like running LLaMA-70b by itself.
  4. Would one of those new crazy good Threadrippers beat the GPUs? lol
[–] Dead_Internet_Theory@alien.top 1 points 10 months ago

Unfortunately, they are already in positions of power, so the next best thing is to remove them from those positions, undermine their efforts, and, most importantly, spread the word.

An ideal world is one where censorship freaks hold zero power and are universally ostracized.

[–] Dead_Internet_Theory@alien.top 1 points 10 months ago (1 children)

OP, this post is fantastic.

I wonder: is this a case of the community doing free R&D for OpenAI, or do they truly have a good reason for using naive sampling?

Also, the graph comes from here; there are a bunch of other graphs there too.

[–] Dead_Internet_Theory@alien.top 1 points 10 months ago

The thing is, as far as I'm aware, "sound generation" is always a separate TTS system cobbled on, and even "vision" is a separate model that describes the image for the AI.

This 13b model is probably still state of the art in the vision department for open models; a few new ones crop up now and again, but they haven't surprised me much.
https://llava-vl.github.io/

If you need to recognize audio, check out Whisper, Faster-Whisper, or anything derived from them. If you need to generate voice, check out Bark, and maybe Silero, RVC, etc.
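If you want a taste of the recognition side, here's a minimal sketch using the faster-whisper Python package (the model size and audio file name are placeholders I picked for illustration):

# pip install faster-whisper
from faster_whisper import WhisperModel

# "small" is a placeholder; bigger Whisper models transcribe better but run slower.
model = WhisperModel("small", device="cuda", compute_type="float16")

# transcribe() yields timestamped segments plus metadata about the audio.
segments, info = model.transcribe("voice_note.wav")
print(f"Detected language: {info.language}")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")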

You probably won't find it all wrapped into one neat package like ChatGPT+ right now, but I'd love to be proven wrong.

[–] Dead_Internet_Theory@alien.top 1 points 10 months ago

That is awesome. What kind of platform do you use for that 3-GPU setup?

[–] Dead_Internet_Theory@alien.top 1 points 10 months ago

Yeah, EXL2 is awesome. It's kinda black magic how GPUs released way before ChatGPT was a twinkle in anyone's eye can run something that trades blows with it. I still don't get how fractional bpw is even possible. What the hell, 2.55 bits, man 😂 how does it even run to any degree after that? It's magic, that's what it is.
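From what I understand, the fraction is just an average: EXL2 quantizes different weight groups at different bit widths, and the advertised bpw is the weighted mean across the model. A toy calculation in Python, with completely made-up fractions:

# EXL2 mixes bit widths per weight group; the headline "bpw" is the average.
# These fractions are invented for illustration, not real EXL2 measurements.
mix = [
    (4.0, 0.10),  # 10% of weights kept at 4 bits (most sensitive)
    (3.0, 0.25),  # 25% at 3 bits
    (2.0, 0.65),  # 65% squeezed down to 2 bits
]
avg_bpw = sum(bits * frac for bits, frac in mix)
print(f"{avg_bpw:.2f} bpw")  # 0.4 + 0.75 + 1.3 = 2.45 bpw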

[–] Dead_Internet_Theory@alien.top 1 points 10 months ago

If the question is trivial, I trust GPT to get it right. If the question is semi-complex, I ask, then confirm with a web search. For some reason, Google used to be a lot smarter, too. These days it's more of a link fetcher.

[–] Dead_Internet_Theory@alien.top 1 points 10 months ago

GPT-4 is programmed to be neither racist nor sexist, as that is what white men do.

[–] Dead_Internet_Theory@alien.top 1 points 10 months ago

They seem to have 70b and Goliath (the 120b monstrosity) on there. Currently I only see one 70b on https://lite.koboldai.net/'s list, but the other day I saw a couple of Goliaths. I have no idea why anyone would host the 120b, other than maybe "crowd-sourcing" a dataset (probably against TOS or something, but why would anyone do it?).

 

After taking note of Goliath-120b, I suddenly got strangely curious about Horde. Surprisingly, searching for Horde doesn't show many posts, so hopefully someone can answer a few questions:

  1. From what I understand, I could host something like a 13b or 20b model (or SD/SDXL), which I can run just fine and fast, rack up credits overnight, and then run 70b or 120b LLMs without a queue and fast-ish at any moment later (see the request sketch after this list). Right?
  2. If so, how long do prompts on those big models take, more or less, when you have credits to skip the queue? Is it usable? (i.e., how many seconds would it show on SillyTavern?)
  3. Seeing as I've only ever used Oobabooga and SillyTavern, I'm assuming Kobold is more or less a drop-in replacement for Oobabooga, just a backend for the model, and everything translates well? If not, what can I expect to lose/gain from Kobold as opposed to Ooba?
  4. Is there a "Horde for r*tards" guide somewhere?
  5. What do people get from hosting Goliath-120b for others? Don't get me wrong, I appreciate the deep-pocketed generosity, but is this like a data-gathering operation from their point of view?
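In case it helps anyone else poking at this, here's roughly what a text request looks like against the AI Horde API, going by my reading of the docs. Treat the endpoint, header, and field names as assumptions to verify, and the model identifier below is hypothetical:

# Rough sketch of an async AI Horde text request; double-check field names
# against the current API docs before relying on this.
import time
import requests

API = "https://aihorde.net/api/v2"
HEADERS = {"apikey": "0000000000"}  # anonymous key; a registered key earns/spends credits

# Submit a generation job, optionally pinning the hosted models you want.
job = requests.post(f"{API}/generate/text/async", headers=HEADERS, json={
    "prompt": "Once upon a time",
    "params": {"max_length": 120, "max_context_length": 1024},
    "models": ["koboldcpp/Goliath-120b"],  # hypothetical model identifier
}).json()

# Poll until a worker picks the job up and finishes it.
status = {}
while not status.get("done"):
    time.sleep(2)
    status = requests.get(f"{API}/generate/text/status/{job['id']}").json()

print(status["generations"][0]["text"])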

Thanks for reading this far. There's a good doggo being very comfy hidden in the following period.

[–] Dead_Internet_Theory@alien.top 1 points 10 months ago

Actually, GBNF is the rebranding; BNF (Backus-Naur Form) is the proper name (the G is for GGML, Georgi Gerganov's library). There's a reason languages compile to assembly, but that doesn't make assembly user-friendly. Same with Abstract Syntax Trees. Some things pretty much only apply to compilers, and that doesn't make them a good general-purpose solution.

Though I must imagine implementing BNF is orders of magnitude easier than implementing the monster that is extended regular expressions.

 

GBNF, a rebranding of Backus-Naur Form, is a kind of Regex, if you somehow made Regex more obtuse and clunky and also way less powerful. It's like going to the dentist, in text form. It is bad, and it should feel bad.

HOWEVER, if you tame this vile beast of a language, you can make the AI respond to you in pretty much any way you like. And you should.

You can use it by pasting the GBNF into SillyTavern, Oobabooga, or probably whatever else you might be using. In SillyTavern, click the settings icon, then scroll down to the grammar field and paste it in; just pasting is enough. [screenshots omitted]

In Ooba, go here:

https://preview.redd.it/0j7nhuj23fxb1.png?width=521&format=png&auto=webp&s=82688cee191ddbbdc1bf5789e2dcb0e99693a7bf

and then paste it here:

https://preview.redd.it/kcbur3s53fxb1.png?width=794&format=png&auto=webp&s=6b31c1a6c5f954bc2bbbe1488b0a71d164478de9

Note that not all loaders support it; I think it's limited to llama.cpp, transformers, and the _HF variants.

Then your next messages will be formatted the way you wanted. In this case, every message will be "quoted text", *action text*, or multiple instances of those. It should be simple to understand.

Here's that one in case you want it; I just wrote and tested it:

root ::= (actions | quotes) (whitespace (actions | quotes))*   # one or more segments, separated by whitespace

actions ::= "*" content "*"       # *action text*
quotes ::= "\"" content "\""      # "quoted text"

content ::= [^*"]+                # anything except the delimiter characters

whitespace ::= space | tab | newline
space ::= " "
tab ::= "\t"
newline ::= "\n"

Even if you don't know Regex, this language should be easy to pick up, and it will let you make LLMs always respond in a particular format (very useful in some cases!).

You can also look at the examples.
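If you'd rather drive it from code instead of a UI, here's a minimal sketch using the llama-cpp-python bindings (the model path is a placeholder; check that your version supports the grammar parameter):

# pip install llama-cpp-python
from llama_cpp import Llama, LlamaGrammar

# The same grammar as above: every message is *actions* and/or "quotes".
grammar = LlamaGrammar.from_string(r'''
root ::= (actions | quotes) (whitespace (actions | quotes))*
actions ::= "*" content "*"
quotes ::= "\"" content "\""
content ::= [^*"]+
whitespace ::= " " | "\t" | "\n"
''')

llm = Llama(model_path="./model.Q5_K_M.gguf")  # placeholder path
out = llm("You enter the tavern.", grammar=grammar, max_tokens=128)
print(out["choices"][0]["text"])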

There are websites to test BNF, like this one, but since GBNF is a badly designed, badly implemented language from hell, none of them will work, and you'll have to look at the console to find out why this ugly duckling of a language didn't want to work this time. Imagine if Batch files had regular expressions; it'd probably look like this. All of that said, this is pretty fucking useful! So thanks to whoever did the heavy lifting to implement it.
