I had high hopes for Yi-34B Chat, but when I tried it, it turned out not to be very good.
70B models are better (well, of course), but I think even some 20B models are better.
I used oobabooga_windows\text-generation-webui
I think I tested it up to 500 tokens or so.
Running full Falcon-180B under budget constraints
Oh no no no, you're doing it wrong ;) Just kidding. The numbers below are for reference, showing what you can get on a budget system without multiple high-end GPUs.
i5-12400F + 128GB DDR4 + some layers offloaded to a 3060 Ti = 0.35 tokens/second on Falcon-180B 4_K_M
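If you want to reproduce that kind of CPU/GPU split outside the webui, here is a minimal sketch with llama-cpp-python (the file name and layer count are placeholders, not my exact setup):

```python
# Minimal sketch of partial GPU offload with llama-cpp-python.
# Placeholders: point model_path at your GGUF file and raise/lower
# n_gpu_layers until the 3060 Ti's 8GB VRAM is almost full; the
# remaining layers stay in system RAM and run on the CPU.
from llama_cpp import Llama

llm = Llama(
    model_path="models/falcon-180b.Q4_K_M.gguf",  # placeholder file name
    n_gpu_layers=20,   # layers pushed to the GPU; 0 = pure CPU
    n_ctx=2048,        # context size; bigger costs more memory
)

out = llm("Why is the sky blue?", max_tokens=64)
print(out["choices"][0]["text"])
```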
Seems this model has a problem and isn't loading.
I tried the GGUF versions of this model from Hugging Face and they just won't load.
Interesting, everyone is suggesting 7B models, but you can run much better models by using more than just your GPU memory, so I would highly recommend mxlewd-l2-20b. It's very smart, and it's fantastic for writing scenes and such.
Uncensored model for storytelling
No, somehow I got a very different result.
It refuses to write smut: "I am an AI created to write positive stories blah blah" (not literally what it said), and when I entered "Start reply with: Sure thing", it replied something like "I'll try to write a story in a decent way" and then proceeded to write a story without any smut, as if that part of the prompt wasn't there.
The existing lzlv-70b is less censored in this regard and also writes better stories, for my taste.
Well, it depends.
You cannot run 70B models on an RTX 3060 alone, but you can with 64GB of system memory.
Is it possible to have something comparable to free ChatGPT?
Many people say that local models are really close to or even exceed ChatGPT, but I personally don't see it, although I have tried many different models. You can still run something "comparable" to ChatGPT, but it would be much, much weaker.
What is the hardware needed?
It works the other way around: you run a model that your hardware is able to run. For example, if you have 16GB of RAM, you can run a 13B model. You don't even need a GPU to run it; it just runs slower on the CPU.
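As a very rough sizing rule of thumb (my own back-of-the-envelope numbers, assuming a 4_K_M-ish quant at roughly 0.6 bytes per parameter):

```python
# Back-of-the-envelope RAM estimate for a GGUF model.
# Assumptions: ~0.6 bytes per parameter for a 4_K_M-style quant,
# plus ~1.5 GB for context and runtime overhead.
def est_ram_gb(params_billion, bytes_per_param=0.6, overhead_gb=1.5):
    return params_billion * bytes_per_param + overhead_gb

for size in (7, 13, 20, 34, 70):
    print(f"{size}B -> roughly {est_ram_gb(size):.0f} GB")
# 13B comes out around 9 GB, which is why it fits in 16 GB of RAM,
# while 70B comes out around 44 GB and needs something like 64 GB.
```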
So to run a model locally, you need to install software for it, like this one:
https://github.com/oobabooga/text-generation-webui
And then you need a model; you can start with this one:
https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF/tree/main. Download one of the variants and put it in the models folder of the text-generation-webui you installed in the previous step.
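If you prefer to grab the file from a script instead of the browser, something like this works with the huggingface_hub package (the file name below is just one example variant, so check the exact name on the repo's Files tab, and adjust local_dir to your install path):

```python
# Download one GGUF variant straight into the webui's models folder.
# The filename is a guess at one of the variants; verify it on the repo page.
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="TheBloke/Mistral-7B-v0.1-GGUF",
    filename="mistral-7b-v0.1.Q4_K_M.gguf",    # pick whichever quant you want
    local_dir="text-generation-webui/models",  # adjust to your install path
)
```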
Well, it is good for roleplay and writing. I tried only the 2_K_M variant, because it has no bigger quants yet.
Actually, at 2_K_M it already feels like the best 70B models at 4_K_M, or even better.
If the model fits completely inside 12GB, it will run faster on the desktop; if it doesn't fit into 12GB but fits fully in 16GB, then there's a good chance it will run faster on the laptop with the 16GB GPU.