In Python you have exllama, vllm, and lmdeploy, and in most cases FastAPI is used to serve an HTTP endpoint.
I wrote llm-sharp to drop Python (the GIL, pip dependencies) and to adapt flexibly to dynamic model structures beyond standard llama.
K024/llm-sharp: Language models in C# (github.com)
I've recently drafted this, but adding more models, features, tests, and documentation will take too much time. Looking for comments and collaborators.