this post was submitted on 23 Nov 2023

1 points (100.0% liked)

LocalLLaMA

14 readers

1 users here now

Community to discuss about Llama, the family of large language models created by Meta AI.

founded 2 years ago

MODERATORS

communick@poweruser.forum

30,000 AI models (alien.top)

submitted 2 years ago by Creative_Bottle_3225@alien.top to c/localllama@poweruser.forum

19 comments fedilink hide all child comments

30,000 AI models

too many really. But from what I read in conversations and posts I notice one thing: you all try out Model all the time and that's fine, but I haven't yet read that anyone habitually uses one Model over others. It seems like you use one template for a few days and then start with a new one. Don't have your favorite? Which?

top 19 comments

sorted by: hot top controversial new old

[–] southpalito@alien.top 1 points 2 years ago (1 children)

It's an experimental playground where 99.99% of players are handicapped because they don't have access to the same volume of training data and hardware resources as the big corporate players. So you'll have hundreds of iterations of smaller models as people try many different things to narrow the massive gap with OpenAi solutions.

[–] LocoMod@alien.top 1 points 2 years ago (1 children)

What's stopping us from building a mesh of web crawlers and creating a distributed database that anyone can host and add to the total pool of indexers/servers? How long would it take to create a quality dataset by deploying bots that crawl their way "out" of the most popular and trusted sites for particular knowledge domains and just compress and dump that into a format for training into said global p2p mesh? If we got a couple of thousand nerds on Reddit to contribute compute and storage capacity to this network we might be able to build it relatively fast. Just sayin...

[–] ten0k@alien.top 1 points 2 years ago

I think that's the idea behind https://petals.dev/

[–] ThisGonBHard@alien.top 1 points 2 years ago

You dont talk about the "usuals".

My go to models were for a long time Stable Beluga 2 13B and 70B.

Then, 13B got replaced by Minstral, 70B by LZLV, and Airoboros Yi 34B came out that worked great for me.

As a rule: 7B - CPU inferencing on 2-4 cores while using GPU.

34B and 70B, GPU inferencing, models trade blows despite size diff, as they are different base models. (Llama vs Yi).

[–] _Lee_B_@alien.top 1 points 2 years ago

There are 30,000 on huggingface? Is that what you're saying?

I wonder how many of those are truly open source, with open data? I only know of the OpenLlama model, and the RedPajama dataset. There are a bunch of datasets on huggingface too, but I don't know if any of those are complete enough to train a major LLM on.

[–] herozorro@alien.top 1 points 2 years ago

cause the majority suck very bad compared to chatgpt

[–] dothack@alien.top 1 points 2 years ago (1 children)

OpenHermes-2.5-Mistral-7B is better than all the 13b and 7b model available.

[–] morphles@alien.top 1 points 2 years ago

What settings you use for it? In what UI? I tried it in silly tavern yesterday (via ooba backend), unmitigated disaster... tried bunch of setting nothing worked, as far as I go prompt template should be ChatML, but even with that...

[–] Only-Letterhead-3411@alien.top 1 points 2 years ago

I've been in this ride since the early GPTJ days. I've tried A LOT of models. Right now for general use models, preference is refined into only using ChatML format models.

[–] PSMF_Canuck@alien.top 1 points 2 years ago

There should be at least as many AI models as there are usefully-unique human minds.

[–] penguished@alien.top 1 points 2 years ago

I think the answer is... whatever you get to be stable and highly usable do cool things for your purposes.

It's a bit of an organic thing too because how you phrase your prompting unlocks different doors in different models every single day.

[–] testuser514@alien.top 1 points 2 years ago

To me, it seems like the localllama community needs some meta and ensemble llm projects.

I’m not sure if they exist but There should technically be trying see how to integrate large numbers of the 30000 models they exist now (maybe start from 2).

[–] msew@alien.top 1 points 2 years ago

KEKW

2 T at amazon. Why care about any of these others?

[–] samsekhar@alien.top 1 points 2 years ago

Introduce an interesting work: DARE (Drop And REscale)

DARE can merge multiple task-specific LLMs (e.g., WizardLM + WizardMath) into one LM with diverse abilities, ✅but without the need for retraining or GPUs

https://twitter.com/WizardLM_AI/status/1727672799391842468?t=alsj7WrhCzVzSxjN7vKOyQ&s=19

[–] CheatCodesOfLife@alien.top 1 points 2 years ago

WizardLM-70B for general gpt-like assistant.

WizardCoder or Codebooga for coding.

I use them daily, but I test models all the time too.

[–] Temporary-Size7310@alien.top 1 points 2 years ago

There is ton of fine tuned models and maybe 6-7 quantisized models per model and fine tuned models, open source, business usable, uncensored, for RAG, for photo description, for TTS, for CV, with updates of checkpoints and so on.

At the contrary fortunately there is people and enough diversity to adapt with hardware and objectives without pay fortunes to train, finetune models.

ie: If your needs are commercial, with a model speaking fluently spanish, small enough to inference fast for many clients and with censor, 100% on your local server, treating with confidential data there is almost no choice

[–] CRedIt2017@alien.top 1 points 2 years ago

These suggestions are for spicy RP only, for any other informational type chat I use bard

TheBloke_MLewd-ReMM-L2-Chat-20B-GPTQ it's good, is more forthcoming with perverse jargon, not as good when you're RPing about an interaction with 3 people (You, and 2 other females, for example)

TheBloke_Chronoboros-33B-GPTQ VERY good and handles 3 people like a charm. Will fight you now and then and has a tendency to either punish you for being too antisocial or if everyone is having a good time, it's all in no matter what. A bit more clinical in use of sexual jargon.

TheBloke_airoboros-33B-gpt4-1.4-GPTQ Seemingly the best when you want to really challenge your place in humanity and is almost as good with maintaining 2 other people's conversations/reactions as "Chrono" above.

Hopefully if you or someone is looking for hot RP, you'll find this helpful. Need 24 G Vram for the last two unless you use the trick to split the load between GPU and CPU (I haven't needed to do that with them myself).

[–] JoJoeyJoJo@alien.top 1 points 2 years ago

I mean we're in a period of really rapid development, there will be a hundred thousand models, maybe hundreds of thousands, but eventually we'll throw away the older ones and consolidate down to a few really refined ones that everyone will use.

Everyone knows iOS and Android, no one can tell you what version of Nokia's OS their eleventy billion featurephones were using.

[–] WaterPecker@alien.top 1 points 2 years ago

Instead of making all these models the effort would be way more valuable if focused on making things more efficient. Methods to execute models on lower spec machines. The barrier to entry is way to big for larger models, not everyone lives in places where a 4090 is remotely an option.

I feel it's just a lazy copout that relies on just throwing more power rather than careful optimized design like the video game industry today.