this post was submitted on 17 Nov 2023

LocalLLaMA


Community to discuss about Llama, the family of large language models created by Meta AI.

founded 2 years ago
top 6 comments
a_beautiful_rhind@alien.top · 2 years ago

Nice, a lightweight loader. It will free us from Gradio.

oobabooga4@alien.top · 2 years ago

Gradio is a 70 MB requirement, FYI. It has become common to see people call text-generation-webui "bloated", when most of the installation size is in fact due to PyTorch and the CUDA runtime libraries.

https://preview.redd.it/pgfsdld7xw0c1.png?width=370&format=png&auto=webp&s=c50a14804350a1391d57d0feac8a32a5dcf36f68

tronathan@alien.top · 2 years ago

Gradio is a 70MB requirement

That doesn't make it fast, just small. Inefficient code can be compact.

kpodkanowicz@alien.top · 2 years ago

I think there is room for everyone - text-gen is a piece of art - it's the only thing in the whole space that always works and is reliable. However, if I'm building an agent and shipping it as a Docker build, I cannot afford changes to text-gen, etc.

panchovix@alien.top · 2 years ago

Thanks to the hard work of kingbri, Splice86 and turboderp, we have a new API loader for LLMs using the exllamav2 backend! It is in a very early alpha state, so if you want to test it, expect things to change.
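To show what talking to an API loader like this looks like, here is a rough sketch of building a completion request. This is my own illustration, not TabbyAPI's documented interface: the endpoint path, port, and parameter names (`min_p`, `mirostat_mode`) are assumptions and may well differ in the alpha.

```python
import json

def build_completion_request(prompt, min_p=0.05, mirostat=False):
    """Build a hypothetical OpenAI-style completion request for a local
    API server. Endpoint and field names are placeholders, not
    TabbyAPI's confirmed schema."""
    url = "http://localhost:5000/v1/completions"  # assumed local default
    payload = {
        "prompt": prompt,
        "max_tokens": 200,
        "min_p": min_p,  # sampler settings assumed to be passed flat
    }
    if mirostat:
        payload["mirostat_mode"] = 2  # hypothetical mirostat toggle
    return url, json.dumps(payload)

url, body = build_completion_request("Hello, Llama!", min_p=0.1)
print(url)
print(body)
```

The returned body could then be POSTed with any HTTP client; check the project's README for the real endpoint names before relying on any of these fields.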

TabbyAPI also works with SillyTavern! It takes some special configuration, but it works.

As a reminder, exllamav2 recently added mirostat, tfs and min-p sampling, so if you were using exllama_hf/exllamav2_hf in ooba for those samplers, those loaders are no longer needed.
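For anyone unfamiliar with min-p: it keeps only tokens whose probability is at least `min_p` times the probability of the most likely token. A minimal numpy sketch of the idea (my own illustration, not exllamav2's actual implementation):

```python
import numpy as np

def min_p_filter(logits, min_p=0.1):
    """Mask to -inf every token whose probability is below
    min_p * (probability of the most likely token)."""
    probs = np.exp(logits - logits.max())  # stable softmax
    probs /= probs.sum()
    threshold = min_p * probs.max()
    return np.where(probs >= threshold, logits, -np.inf)

# With min_p=0.2, tokens far less likely than the top token are dropped:
logits = np.array([5.0, 4.5, 1.0, -2.0])
print(min_p_filter(logits, min_p=0.2))  # only 5.0 and 4.5 survive
```

Unlike top-k or top-p, the cutoff scales with the model's confidence: a sharp distribution keeps few candidates, a flat one keeps many.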

Enjoy!

Right-Structure-1619@alien.top · 2 years ago

Does anyone know if they expose all the good stuff that Guidance uses for its guided generation and speedup? This plus Guidance (KV cache reuse, grammar control, etc.) would be fast fast!
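The core of grammar-controlled generation is logit masking: at each step, only tokens that are legal under the grammar may be sampled. A toy sketch of that masking step (my own illustration of the general technique, not Guidance's or exllamav2's actual code):

```python
import numpy as np

def mask_to_allowed(logits, allowed_ids):
    """Grammar-constrained decoding step: set every token that is not
    currently legal under the grammar to -inf so it cannot be sampled."""
    idx = list(allowed_ids)
    masked = np.full_like(logits, -np.inf)
    masked[idx] = logits[idx]
    return masked

# Suppose the grammar only allows tokens 1 and 3 at this position:
logits = np.array([2.0, 0.5, 1.5, 3.0])
out = mask_to_allowed(logits, {1, 3})
print(int(out.argmax()))  # prints 3 - the constrained pick is a legal token
```

The speedup side comes separately, from reusing the KV cache across a shared prompt prefix, which is exactly the kind of control an API loader would need to expose.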