It's an experimental playground where 99.99% of players are handicapped because they don't have access to the same volume of training data and hardware resources as the big corporate players. So you'll have hundreds of iterations of smaller models as people try many different things to narrow the massive gap with OpenAi solutions.
LocalLLaMA
Community to discuss about Llama, the family of large language models created by Meta AI.
What's stopping us from building a mesh of web crawlers and creating a distributed database that anyone can host and add to the total pool of indexers/servers? How long would it take to create a quality dataset by deploying bots that crawl their way "out" of the most popular and trusted sites for particular knowledge domains and just compress and dump that into a format for training into said global p2p mesh? If we got a couple of thousand nerds on Reddit to contribute compute and storage capacity to this network we might be able to build it relatively fast. Just sayin...
You dont talk about the "usuals".
My go to models were for a long time Stable Beluga 2 13B and 70B.
Then, 13B got replaced by Minstral, 70B by LZLV, and Airoboros Yi 34B came out that worked great for me.
As a rule: 7B - CPU inferencing on 2-4 cores while using GPU.
34B and 70B, GPU inferencing, models trade blows despite size diff, as they are different base models. (Llama vs Yi).
There are 30,000 on huggingface? Is that what you're saying?
I wonder how many of those are truly open source, with open data? I only know of the OpenLlama model, and the RedPajama dataset. There are a bunch of datasets on huggingface too, but I don't know if any of those are complete enough to train a major LLM on.
cause the majority suck very bad compared to chatgpt
OpenHermes-2.5-Mistral-7B is better than all the 13b and 7b model available.
What settings you use for it? In what UI? I tried it in silly tavern yesterday (via ooba backend), unmitigated disaster... tried bunch of setting nothing worked, as far as I go prompt template should be ChatML, but even with that...
I've been in this ride since the early GPTJ days. I've tried A LOT of models. Right now for general use models, preference is refined into only using ChatML format models.
There should be at least as many AI models as there are usefully-unique human minds.
I think the answer is... whatever you get to be stable and highly usable do cool things for your purposes.
It's a bit of an organic thing too because how you phrase your prompting unlocks different doors in different models every single day.
To me, it seems like the localllama community needs some meta and ensemble llm projects.
I’m not sure if they exist but There should technically be trying see how to integrate large numbers of the 30000 models they exist now (maybe start from 2).
KEKW
2 T at amazon. Why care about any of these others?
Introduce an interesting work: DARE (Drop And REscale)
DARE can merge multiple task-specific LLMs (e.g., WizardLM + WizardMath) into one LM with diverse abilities, ✅but without the need for retraining or GPUs
https://twitter.com/WizardLM_AI/status/1727672799391842468?t=alsj7WrhCzVzSxjN7vKOyQ&s=19
WizardLM-70B for general gpt-like assistant.
WizardCoder or Codebooga for coding.
I use them daily, but I test models all the time too.
There is ton of fine tuned models and maybe 6-7 quantisized models per model and fine tuned models, open source, business usable, uncensored, for RAG, for photo description, for TTS, for CV, with updates of checkpoints and so on.
At the contrary fortunately there is people and enough diversity to adapt with hardware and objectives without pay fortunes to train, finetune models.
ie: If your needs are commercial, with a model speaking fluently spanish, small enough to inference fast for many clients and with censor, 100% on your local server, treating with confidential data there is almost no choice
These suggestions are for spicy RP only, for any other informational type chat I use bard
TheBloke_MLewd-ReMM-L2-Chat-20B-GPTQ it's good, is more forthcoming with perverse jargon, not as good when you're RPing about an interaction with 3 people (You, and 2 other females, for example)
TheBloke_Chronoboros-33B-GPTQ VERY good and handles 3 people like a charm. Will fight you now and then and has a tendency to either punish you for being too antisocial or if everyone is having a good time, it's all in no matter what. A bit more clinical in use of sexual jargon.
TheBloke_airoboros-33B-gpt4-1.4-GPTQ Seemingly the best when you want to really challenge your place in humanity and is almost as good with maintaining 2 other people's conversations/reactions as "Chrono" above.
Hopefully if you or someone is looking for hot RP, you'll find this helpful. Need 24 G Vram for the last two unless you use the trick to split the load between GPU and CPU (I haven't needed to do that with them myself).
I mean we're in a period of really rapid development, there will be a hundred thousand models, maybe hundreds of thousands, but eventually we'll throw away the older ones and consolidate down to a few really refined ones that everyone will use.
Everyone knows iOS and Android, no one can tell you what version of Nokia's OS their eleventy billion featurephones were using.
Instead of making all these models the effort would be way more valuable if focused on making things more efficient. Methods to execute models on lower spec machines. The barrier to entry is way to big for larger models, not everyone lives in places where a 4090 is remotely an option.
I feel it's just a lazy copout that relies on just throwing more power rather than careful optimized design like the video game industry today.