I maintain the uniteai project, and have implemented a custom backend for serving transformers-compatible LLMs. (That file's actually a great ultra-lightweight server if transformers satisfies your needs; one clean file.)
I'd like to add GGML etc., and I haven't reached for ctransformers. Instead of building a bespoke server, it'd be nice if a standard were starting to emerge.
For instance, many models have custom instruct templates; it'd be nice if a backend handled all of that for me.
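For concreteness, here's a toy sketch of what I mean by a backend handling templates for me. The template strings are rough approximations for illustration, not the canonical formats, and `apply_template` is a hypothetical helper, not any library's API:

```python
# Hypothetical sketch: per-model-family instruct templates.
# The template strings below are approximations for illustration only.
TEMPLATES = {
    # Llama-2-chat-style wrapping (approximate)
    "llama2": "[INST] {prompt} [/INST]",
    # Alpaca-style instruction format (approximate)
    "alpaca": "### Instruction:\n{prompt}\n\n### Response:\n",
}

def apply_template(model_family: str, prompt: str) -> str:
    """Wrap a raw user prompt in the model family's instruct template."""
    template = TEMPLATES.get(model_family, "{prompt}")  # fall back to raw prompt
    return template.format(prompt=prompt)

print(apply_template("alpaca", "Summarize this file."))
```

The point is that every client shouldn't have to carry a table like this; the serving layer should own it.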
I've used llama.cpp, but I'm not aware of it handling instruct templates. Is it worth building on top of? Is it too llama-focused? Production-worthy? (It bills itself as "mainly for educational purposes".)
I've considered oobabooga, but I'd just like a best-in-class server, without all the other front-end fixings and dependencies.
Is OpenAI's API signature something people are trying to build against as a standard?
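By "API signature" I mean the chat-completions request shape. A minimal sketch, with field names taken from OpenAI's published API; the model name here is just a placeholder:

```python
import json

# Sketch of an OpenAI-style chat-completions request body.
# Field names follow OpenAI's API; "local-model" is a placeholder.
request = {
    "model": "local-model",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "temperature": 0.7,
    "stream": False,
}
print(json.dumps(request, indent=2))
```

If local backends converged on accepting this shape, swapping servers would be a one-line URL change for clients.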
Any recommendations?