LocalLLaMA

Community for discussing Llama, the family of large language models created by Meta AI.

Proposal: an LLM hosted on a co-funded server (alien.top)

submitted 11 months ago by DanIngenius@alien.top to c/localllama@poweruser.forum

I saw an idea about getting a big LLM (30/44 GB) running fast on a cloud server.

What if the server's capacity were scalable and the rental cost shared among a group of users?

Some sort of DAO to get it started? Personally, I would love to link advanced LLMs up to SD generation, etc. And OpenAI is too sensitive for my liking. What do you think?

georgejrjrjr@alien.top 1 point 11 months ago

The broad outline:
* You would need an easy way for people to throw their GPU idle time at a cluster, and a reason to do so (i.e., what do your hosts get out of the deal?).

* You need an easy way to ingest datasets for training LoRAs.

* You'd need an automated pipeline to turn those fine-tuning datasets into aligned LoRAs, to be propagated to your inference nodes.

* You'd probably want to think about retrieval, and whether you would like that to be part of the story (and whether it puts you at additional legal risk).

* You'd need a fast inference server with S-LoRA (or whatever the leading method for batch inference with LoRAs is next week).

* You would need an HTTPS server on the front end that terminates TLS for all your endpoints, and routes API requests to the appropriate LoRA.

* You need a way to keep those certificates and inference server addresses up to date in spite of churn.

* You need to figure out your cost model, and revenue sharing model for your hosting providers if applicable, ideally one that doesn't involve a cryptocurrency unless you have a limitless legal budget and you are based in El Salvador and personal friends with the Bukele family.
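
To make the routing and churn points above a bit more concrete, here is a minimal sketch of the bookkeeping a front end might do: inference nodes send heartbeats saying which adapter they serve, and the router only hands out addresses it has heard from recently. Everything here (the `AdapterRegistry` name, the TTL, the addresses, the adapter name) is a made-up illustration, not an existing library, and it ignores TLS, authentication, and persistence entirely.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class AdapterRegistry:
    """Hypothetical in-memory map from LoRA adapter names to live nodes."""

    ttl_seconds: float = 30.0
    # adapter name -> {node address -> time of last heartbeat}
    _nodes: dict = field(default_factory=dict)
    # shared counter used to rotate across live nodes
    _counter: itertools.count = field(default_factory=itertools.count)

    def heartbeat(self, adapter: str, address: str, now: float) -> None:
        """Record that `address` is currently serving `adapter`."""
        self._nodes.setdefault(adapter, {})[address] = now

    def live_nodes(self, adapter: str, now: float) -> list:
        """Return addresses whose last heartbeat is within the TTL, sorted."""
        seen = self._nodes.get(adapter, {})
        return sorted(a for a, t in seen.items() if now - t <= self.ttl_seconds)

    def route(self, adapter: str, now: float) -> str:
        """Pick a live node for `adapter`, rotating on each call."""
        nodes = self.live_nodes(adapter, now)
        if not nodes:
            raise LookupError(f"no live node serves adapter {adapter!r}")
        return nodes[next(self._counter) % len(nodes)]


if __name__ == "__main__":
    registry = AdapterRegistry(ttl_seconds=30.0)
    registry.heartbeat("legal-summaries", "10.0.0.5:8000", now=0.0)
    registry.heartbeat("legal-summaries", "10.0.0.6:8000", now=0.0)
    print(registry.route("legal-summaries", now=10.0))
```

Passing `now` in explicitly keeps the sketch easy to test; a real deployment would presumably use a monotonic clock and a shared store rather than one process's memory.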

From the generality of your question, your best bet would probably be to hire me ;-).

DanIngenius@alien.top 1 point 11 months ago

Thanks for your detailed reply. I don't think crowdsourcing GPUs is feasible or desirable, but the idea of using different LoRAs is interesting. Can the LoRAs be loaded separately from the model? Would it be possible to load the model once and use two separate LoRAs?

georgejrjrjr@alien.top 1 point 11 months ago

One base model with dozens, maybe hundreds, of adapters would be the goal.
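
As a rough illustration of why this works: a LoRA adapter is a pair of small matrices `A` and `B` added on top of a frozen weight `W`, so the output is `W x + (alpha / r) * B (A x)`, where `r` is the adapter rank. The toy sketch below uses plain Python lists and invented 2x2 numbers (the adapter names and values are assumptions for illustration only) to show one shared base weight being reused with two different adapters chosen per request.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(base_weight, adapter, x):
    """Compute W x + (alpha / r) * B (A x); with adapter=None, just W x."""
    base_out = matvec(base_weight, x)
    if adapter is None:
        return base_out
    a, b, alpha = adapter["A"], adapter["B"], adapter["alpha"]
    rank = len(a)  # A has shape (r, d_in), B has shape (d_out, r)
    low_rank = matvec(b, matvec(a, x))
    scale = alpha / rank
    return [y + scale * d for y, d in zip(base_out, low_rank)]


# One base weight, loaded once (toy identity matrix).
BASE_W = [[1.0, 0.0], [0.0, 1.0]]

# Two hypothetical rank-1 adapters that share the same base weight.
ADAPTERS = {
    "adapter_a": {"A": [[1.0, 1.0]], "B": [[1.0], [0.0]], "alpha": 1.0},
    "adapter_b": {"A": [[1.0, -1.0]], "B": [[0.0], [2.0]], "alpha": 1.0},
}

if __name__ == "__main__":
    x = [2.0, 3.0]
    print(lora_forward(BASE_W, None, x))                   # [2.0, 3.0]
    print(lora_forward(BASE_W, ADAPTERS["adapter_a"], x))  # [7.0, 3.0]
    print(lora_forward(BASE_W, ADAPTERS["adapter_b"], x))  # [2.0, 1.0]
```

The base weight is never modified, which is what lets a server keep a single copy in memory and swap the small adapter per request.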