this post was submitted on 26 Nov 2023
LocalLLaMA
The broad outline:
* You would need an easy way for people to throw their GPU idle time at a cluster, and a reason to do so (i.e., what do your hosts get out of the deal?).
* You need an easy way to ingest datasets for training LoRAs.
* You'd need an automated pipeline to turn those fine-tuning datasets into aligned LoRAs, to be propagated to your inference nodes.
* You'd probably want to think about retrieval, and whether you would like that to be part of the story (and whether it puts you at additional legal risk).
* You'd need a fast inference server with S-LoRA (or whatever the leading method for batch inference with LoRAs is next week).
* You would need an HTTPS server on the front end that terminates TLS for all your endpoints and routes API requests to the appropriate LoRA.
* You need a way to keep those certificates and inference server addresses up to date in spite of churn.
* You need to figure out your cost model and, if applicable, a revenue-sharing model for your hosting providers, ideally one that doesn't involve a cryptocurrency unless you have a limitless legal budget, are based in El Salvador, and are personal friends with the Bukele family.
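The routing and churn items in the outline (send each API request to a node that serves the requested LoRA, and keep node addresses current as nodes come and go) could look roughly like the registry below. This is a minimal sketch under assumptions: `AdapterRegistry`, its method names, the 30-second heartbeat TTL, and round-robin node selection are all hypothetical choices for illustration, and TLS termination and the actual HTTP proxying are left out.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class AdapterRegistry:
    """Hypothetical registry: LoRA adapter name -> inference nodes serving it.

    Nodes re-announce themselves via heartbeat(); a node whose last heartbeat
    is older than ttl_seconds is treated as gone (churn) and pruned.
    """

    ttl_seconds: float = 30.0
    # adapter name -> {node address -> time of last heartbeat}
    _nodes: dict[str, dict[str, float]] = field(default_factory=dict, init=False)
    # adapter name -> round-robin counter
    _counters: dict[str, int] = field(default_factory=dict, init=False)

    def heartbeat(self, adapter: str, address: str, now: float | None = None) -> None:
        """Record that `address` is alive and serving `adapter`."""
        now = time.monotonic() if now is None else now
        self._nodes.setdefault(adapter, {})[address] = now

    def live_nodes(self, adapter: str, now: float | None = None) -> list[str]:
        """Return addresses heard from within the TTL, dropping stale ones."""
        now = time.monotonic() if now is None else now
        nodes = self._nodes.get(adapter, {})
        for address in [a for a, t in nodes.items() if now - t > self.ttl_seconds]:
            del nodes[address]
        return sorted(nodes)

    def route(self, adapter: str, now: float | None = None) -> str:
        """Pick a live node for `adapter` round-robin, or raise if none is left."""
        nodes = self.live_nodes(adapter, now)
        if not nodes:
            raise LookupError(f"no live node serves adapter {adapter!r}")
        i = self._counters.get(adapter, 0)
        self._counters[adapter] = i + 1
        return nodes[i % len(nodes)]


# Usage with explicit timestamps (seconds) so the expiry is easy to follow.
reg = AdapterRegistry(ttl_seconds=30)
reg.heartbeat("sql-helper", "10.0.0.5:8000", now=0)
reg.heartbeat("sql-helper", "10.0.0.6:8000", now=20)
print(reg.route("sql-helper", now=25))  # 10.0.0.5:8000
print(reg.route("sql-helper", now=40))  # 10.0.0.6:8000 (10.0.0.5 missed its TTL)
```

A TTL-based heartbeat keeps the routing table self-healing: a node that drops out simply stops being chosen once its TTL lapses, without anyone having to deregister it.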
From the generality of your question, your best bet would probably be to hire me ;-).
Thanks for your detailed reply. I don't think crowdsourcing GPUs is feasible or desirable, but the idea of only using different LoRAs is interesting. Can the LoRAs be loaded separately from the model, so that the model is loaded once and two separate LoRAs are used with it?
One base model with dozens, maybe hundreds, of adapters would be the goal.
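On the follow-up question: LoRA stores a fine-tune as a separate low-rank pair of matrices (A, B) for each adapted weight, and the layer computes W x + (alpha / r) · B(A x). Because W itself is never changed, one copy of the base model can be shared by any number of adapters, with the adapter chosen per request. The toy single-layer sketch below illustrates only that arithmetic in pure Python; `MultiLoRALayer`, `LoRAAdapter`, and the adapter names are hypothetical, and this is not how S-LoRA or any specific inference server is implemented.

```python
from __future__ import annotations

from dataclasses import dataclass

Matrix = list[list[float]]
Vector = list[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product: each row of m dotted with v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


@dataclass
class LoRAAdapter:
    """Low-rank update for one weight matrix: delta_W = (alpha / r) * B @ A."""
    A: Matrix  # shape (r, d_in)
    B: Matrix  # shape (d_out, r)
    alpha: float

    @property
    def rank(self) -> int:
        return len(self.A)


class MultiLoRALayer:
    """One shared base weight W plus any number of named adapters.

    W is loaded once and never modified; each call picks an adapter by name.
    """

    def __init__(self, W: Matrix) -> None:
        self.W = W
        self.adapters: dict[str, LoRAAdapter] = {}

    def load_adapter(self, name: str, adapter: LoRAAdapter) -> None:
        self.adapters[name] = adapter

    def forward(self, x: Vector, adapter_name: str | None = None) -> Vector:
        base = matvec(self.W, x)  # shared base computation
        if adapter_name is None:
            return base
        ad = self.adapters[adapter_name]
        scale = ad.alpha / ad.rank
        delta = matvec(ad.B, matvec(ad.A, x))  # B(Ax): never materializes B @ A
        return [b + scale * d for b, d in zip(base, delta)]


# Usage: one 2x2 base weight (identity), two rank-1 adapters.
layer = MultiLoRALayer([[1.0, 0.0], [0.0, 1.0]])
layer.load_adapter("a", LoRAAdapter(A=[[1.0, 0.0]], B=[[1.0], [0.0]], alpha=1.0))
layer.load_adapter("b", LoRAAdapter(A=[[0.0, 1.0]], B=[[0.0], [2.0]], alpha=2.0))
print(layer.forward([2.0, 3.0]))       # [2.0, 3.0]  (base model only)
print(layer.forward([2.0, 3.0], "a"))  # [4.0, 3.0]
print(layer.forward([2.0, 3.0], "b"))  # [2.0, 15.0]
```

Computing B(Ax) instead of forming B·A costs roughly r·(d_in + d_out) multiplications per input, and each adapter stores only r·(d_in + d_out) numbers per adapted matrix, which is why many small adapters can sit beside a single copy of the base weights.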