LocalLLaMA


LoRAX: Open Source Serving for 100s of Fine-Tuned LLMs in Production (alien.top)

submitted 1 year ago by Inevitable-Army-4274@alien.top to c/localllama@poweruser.forum


Last month, we announced LoRAX (LoRA eXchange), a framework that makes it possible to serve hundreds of fine-tuned LLMs on one GPU with minimal degradation in throughput and latency. Today, we’re excited to release LoRAX to the open-source community under the permissive and commercial-friendly Apache 2.0 license (see the original LoRAX blog).
What is LoRAX?
LoRAX works by loading in the fine-tuned “adapter” weights dynamically at runtime. Combining this with an optimized caching and scheduling policy that allows us to fuse multiple adapters into a single batch, LoRAX gives you the best of both worlds: low-cost serving with high performance. 💸 🏎️
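To make the idea above concrete, here is a minimal pure-Python sketch of dynamic adapter loading with a small cache and a mixed-adapter batch. It is not LoRAX's implementation: `AdapterCache`, `forward_batch`, and `matvec` are hypothetical names, the "weights" are tiny nested lists, and the per-request loop stands in for what a fused GPU kernel would do in one pass. It only illustrates that every request shares the base weights while its low-rank adapter pair `(A, B)` is fetched on demand.

```python
from collections import OrderedDict


def matvec(x, W):
    """Multiply row vector x (length d_in) by matrix W (d_in x d_out)."""
    return [sum(xi * W[i][j] for i, xi in enumerate(x)) for j in range(len(W[0]))]


class AdapterCache:
    """Hypothetical LRU cache standing in for GPU-resident adapter weights."""

    def __init__(self, loader, capacity):
        self.loader = loader        # callable: adapter_id -> (A, B)
        self.capacity = capacity    # max adapters kept "on the GPU"
        self.loads = 0              # how many times the loader was called
        self._cache = OrderedDict()

    def get(self, adapter_id):
        if adapter_id in self._cache:
            # Cache hit: mark as most recently used.
            self._cache.move_to_end(adapter_id)
        else:
            # Cache miss: load at runtime, evict the least recently used.
            self._cache[adapter_id] = self.loader(adapter_id)
            self.loads += 1
            if len(self._cache) > self.capacity:
                self._cache.popitem(last=False)
        return self._cache[adapter_id]


def forward_batch(base_W, cache, batch):
    """batch is a list of (adapter_id, x); returns one output row per request."""
    outputs = []
    for adapter_id, x in batch:
        y = matvec(x, base_W)                 # shared base model weights
        A, B = cache.get(adapter_id)          # this request's low-rank pair
        delta = matvec(matvec(x, A), B)       # LoRA update: (x A) B
        outputs.append([yi + di for yi, di in zip(y, delta)])
    return outputs
```

A batch such as `[("a", x1), ("b", x2), ("a", x3)]` triggers only two adapter loads, since the second request for `"a"` is served from the cache.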
Why open source?
At Predibase, we believe the future is smaller, faster, cheaper fine-tuned models. To get there, we as a community must work together to make serving fine-tuned models cost-competitive with the big commercial APIs.
As the core maintainers of Ludwig (https://ludwig.ai/) and Horovod (https://github.com/horovod/horovod), we’re no strangers to building communities around open-source AI. This isn’t a side project for us; it’s the foundation of our mission. 💪
Why join the LoRAX community?
🚢 Built for scale. LoRAX isn’t an academic project; it’s production infrastructure. Batteries included: pre-built Docker images, Helm charts for Kubernetes, metrics, and telemetry.
🤝 Research meets production. LoRAX brings together the best ideas from research in a single production framework. For example, we recently integrated the SGMV kernel from Punica for significant performance improvements: https://arxiv.org/abs/2310.18547
🕊️ Commercially viable, always. Whether you’re an individual developer or an AI platform like Predibase, you can build on LoRAX thanks to the permissive Apache 2.0 license.
Try LoRAX yourself today, and join the community to contribute and receive updates as we continue to invest in growing LoRAX in the weeks and months ahead.
Blog: https://predibase.com/blog/lorax-the-open-source-framework-for-serving-100s-of-fine-tuned-llms-in
GitHub: https://github.com/predibase/lorax

https://preview.redd.it/tscb64btqy0c1.png?width=1024&format=png&auto=webp&s=47e0e484bca5f3c957c639596216fd921b4ac266


Independent_Key1940@alien.top 1 point 11 months ago

Wait, is this what GPT-4 is? There's a noticeable delay when you submit input to GPT-4 compared to GPT-3.5.