this post was submitted on 17 Nov 2023

LocalLLaMA


Community to discuss Llama, the family of large language models created by Meta AI.


Hey guys, as the title suggests, I'd like some advice on the best way to serve LLMs with support for GBNF or similar so that I receive deterministic output. I have been using text-generation-webui locally, where I can add my grammar; however, I would like to be able to do this across a cluster that can run inference with high throughput. Any suggestions on how best to accomplish this?
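
For reference, this is the kind of grammar I mean: a minimal GBNF sketch that pins the output to a tiny JSON object (the rule names and schema are just illustrative).

```
# Constrains output to: {"answer": "<plain text>"}
root   ::= "{" ws "\"answer\"" ws ":" ws string ws "}"
string ::= "\"" [a-zA-Z0-9 .,?!-]* "\""
ws     ::= [ \t\n]*
```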


A naive solution would be to run multiple instances of text-generation-webui in a cluster and load-balance requests across them, roughly as sketched below. My gut says there's a better approach I could be using.
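
Something like this is what I have in mind. This is an untested sketch: the node URLs are placeholders, and I'm assuming text-generation-webui's OpenAI-compatible API with its `grammar_string` extension parameter.

```python
import itertools
import requests

# Placeholder backend nodes, each running a text-generation-webui API server
BACKENDS = itertools.cycle([
    "http://node-1:5000",
    "http://node-2:5000",
])

def generate(prompt: str, grammar: str) -> str:
    """Send a grammar-constrained completion to the next node, round-robin."""
    base = next(BACKENDS)
    resp = requests.post(
        f"{base}/v1/completions",       # OpenAI-compatible endpoint
        json={
            "prompt": prompt,
            "max_tokens": 128,
            "grammar_string": grammar,  # assumption: webui's extension param name
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["text"]
```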

1 comment
mcmoose1900@alien.top 11 months ago

Llama.cpp's example server supports batching and custom grammars.
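
E.g. a request against its `/completion` endpoint, roughly like this (untested sketch; the host/port and grammar are placeholders):

```python
import requests

GRAMMAR = r'root ::= ("yes" | "no")'  # trivial placeholder GBNF

resp = requests.post(
    "http://localhost:8080/completion",  # llama.cpp example server
    json={
        "prompt": "Is the sky blue? Answer:",
        "n_predict": 8,
        "grammar": GRAMMAR,              # server-side GBNF constraint
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["content"])
```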

It's a work in progress for Aphrodite: https://github.com/PygmalionAI/aphrodite-engine/issues/36#issuecomment-1747429134