this post was submitted on 22 Nov 2023

LocalLLaMA

Community to discuss about Llama, the family of large language models created by Meta AI.
I maintain the uniteai project and have implemented a custom backend for serving transformers-compatible LLMs. (That file is actually a great ultra-lightweight server if transformers satisfies your needs; one clean file.)

I'd like to add GGML support, etc., but I haven't reached for ctransformers. Instead of building a bespoke server, it'd be nice to build against a standard that's starting to emerge.

For instance, many models use custom instruct templates; it would be nice if a backend handled all of that for me.
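To illustrate what "custom instruct templates" means in practice, here's a minimal sketch of hand-formatting a single-turn prompt in the Llama-2 chat style. The markers (`[INST]`, `<<SYS>>`) are specific to Llama-2 chat models; other families (ChatML, Alpaca, Vicuna) use entirely different markers, and absorbing those differences is exactly the burden a good backend could take off the client.

```python
def build_llama2_prompt(system: str, user: str) -> str:
    """Format a single-turn prompt in the Llama-2 chat style.

    Illustrative only: each model family defines its own markers,
    so a backend that knows the model can do this for the caller.
    """
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

prompt = build_llama2_prompt("You are a helpful assistant.", "Hello!")
print(prompt)
```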

I've used llama.cpp, but I'm not aware of it handling instruct templates. Is it worth building on top of? Is it too llama-focused? Production-worthy? (It bills itself as "mainly for educational purposes".)

I've considered oobabooga, but I'd just like a best-in-class server, without all the other front-end fixings and dependencies.

Is OpenAI's API signature something people are trying to build against as a standard?

Any recommendations?

top 3 comments
noobgolang@alien.top · 11 months ago

Disclosure: I'm the maintainer of the Nitro project.

We have a simple llama server, just a single binary, that you can download and try right away: https://github.com/janhq/nitro. It's a viable option if you want to set up an OpenAI-compatible endpoint to test out new models.

KeyAdvanced1032@alien.top · 11 months ago

I think all frameworks support custom instruct templates, and I know for a fact llama.cpp does, because StudioLM, which is based on llama.cpp, lets me alter the system / user / assistant templates.