Found KoboldCpp in a guide somewhere and have only used that. I barely even know much about anything else. I just use GGUF and never worry about the so-called "VRAM".
Text Generation Web UI as the backend and SillyTavern as the frontend.
KoboldCPP where proper transformers/CUDA isn't supported.
Yep, pretty good combo! I also use ooba + Silly, and for internet queries and PDF ingestion I use LolLLMs. Great stuff!
Previously, when I was more VRAM-limited: koboldcpp. Now I mainly use a modified CLI exllamav2 chat.py and oobabooga 50/50. chat.py is about 8 tokens/s (roughly 45%) faster than oobabooga with the same model and exllamav2 loader for some reason, and I like having fast generation more than having a nice UI. You forgot to mention SillyTavern; I think it gets a lot of use among coomers.
Bettergpt with llama.cpp server and its OpenAI adapter: sleek, supports editing past messages without truncating the history, swapping roles at any time, etc.
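For anyone who hasn't tried this route: llama.cpp's server exposes an OpenAI-compatible endpoint, which is why frontends like Bettergpt can talk to it at all. A minimal sketch with the openai Python package, assuming the server is running locally on its default port 8080; the model name and API key below are placeholders the local server ignores:

```python
# Minimal sketch: point an OpenAI-style client at a local llama.cpp server.
# Assumes the server is listening on localhost:8080; the api_key is a dummy
# value because the local server does not check it.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="local-model",  # placeholder; the server uses whatever model it was started with
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Why might someone prefer local inference?"},
    ],
)
print(response.choices[0].message.content)
```

Any UI that lets you override the OpenAI base URL can be pointed at the same address.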
Text gen web UI. Lets me use all model formats depending on what I want to test at that moment.
Text Gen Web UI. Works great on Mac. I use GGUFs, since llama.cpp supports Metal.
ROCm version of KoboldCPP on my AMD + Linux setup.
KoboldCPP. Double-click the Kobold icon, Load, select preset, Launch. 10 or so seconds later you're good to go. Easy, quick, efficient.
Text Gen Web UI + Silly Tavern for me. Works like a charm.
I just switched to KoboldCpp from Text Gen UI 2 days ago.
The OpenAI extension wouldn't install for me and it was causing issues with SillyTavern which I use as a frontend.
I'm actually really happy now that I've switched.
KoboldCpp is so simple it's great. I've written a simple batch file to launch both KoboldCpp and SillyTavern. All I have to do when I want to try a new model is edit the part of the batch file pointing to the model name, and it just works.
On top of that I can load more layers onto my GPU with KoboldCpp than Text Gen UI so I'm getting faster speeds.
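(Not the commenter's actual script, just an illustration: the batch-file launcher described above boils down to starting two processes and keeping the model path in one editable spot. A rough, hypothetical Python equivalent, with every path as a placeholder to adapt:)

```python
# Hypothetical sketch of a launcher that starts KoboldCpp and SillyTavern together.
# The commenter uses a Windows batch file; this Python version is only illustrative.
# All paths are placeholders -- edit MODEL_PATH to switch models.
import subprocess

MODEL_PATH = r"C:\models\my-model.Q4_K_M.gguf"   # the one line to edit per model
KOBOLDCPP_EXE = r"C:\koboldcpp\koboldcpp.exe"
SILLYTAVERN_DIR = r"C:\SillyTavern"

# Start KoboldCpp with the chosen model (check koboldcpp's --help for your version's flags).
kobold = subprocess.Popen([KOBOLDCPP_EXE, "--model", MODEL_PATH])

# Start SillyTavern's Node server from its own folder.
tavern = subprocess.Popen(["node", "server.js"], cwd=SILLYTAVERN_DIR)

kobold.wait()
tavern.wait()
```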
Have you tried installing the missing packages it shows when you try installing the OpenAI extension?
I had the same issue, and installing those missing packages via cmd_windows in the same folder fixed it.
If you have coding skills: https://github.com/mzbac/LLM_Web . It can deploy to a local server or the cloud.
My own: https://github.com/knoopx/llm-workbench . Reasons: fast, private, lightweight, hackable.
You're kidding me. I recently surfaced my own UI with the same name. Damn it. -> https://github.com/sedwards2009/llm-workbench
I used to use Text Generation Web UI, but I changed to KoboldCpp because it's more lightweight. Besides, I realized I didn't use all the features of the textgen UI. KoboldCpp as the backend and SillyTavern as the frontend when I want to chat. KoboldCpp alone when I want to play with models by creating stories or something.
LM Studio - very clean UI and easy to use with gguf.
Text generation web UI. The install script has worked perfectly every time I've run it, and the miniconda environment it creates is useful both within the web interface and for running LLMs in Python scripts. The interface also makes installing and using new models a breeze.
damn llama.cpp has a monopoly indirectly 😂
It's just easier to run (and deploy!) cross-platform compiled code than to set up 10 different Python envs and cross your fingers that it might work this time.
I use sillytavern along with text-generation-webui in api mode. Best setup for roleplay imo.
LM Studio is the most straightforward llama.cpp UI.
Text Gen UI for general inference
llama.cpp server for multimodal
TavernAI, because it's simple and easy to use.
KoboldCpp for its ease of use, low memory and disk footprint, and the new context shift feature. Combined with SillyTavern, it gives the best open-source character.ai experience.
Don't forget exui: https://github.com/turboderp/exui
Once it implements notebook mode, I am probably going to switch to that, as all my reasons for staying on text gen ui (the better samplers, notebook mode) will be pretty much gone, and (as said below) text gen ui has some performance overhead.
Notebook mode is almost ready. Probably I'll release later today or early tomorrow.
BTW, one last thing on my wishlist (in addition to notebook mode) is prompt caching/scrolling.
I realized that the base exllamav2 backend in ooba (and not the HF hack) doesn't cache prompts, so prompt processing with 50K+ context takes well over a minute on my 3090. I don't know if that's also the case in exui, as I did not try a mega context prompt in my quick exui test.
I use Synology Chat.
KoboldCPP + SillyTavern. I would use the KoboldAI frontend instead of SillyTavern if it weren't for the fact that it's meant to create a dedicated system volume in order to work well. I personally find that creepy and unsettling, because I am uncomfortable with the technical aspects of computing. I can do intermediate stuff, but I still feel unhappy at the very idea of ever needing to troubleshoot.
Anyhow, I hope a commercial all-in-one LLM program gets made: one built for user privacy and roleplaying, approachable, open source, with content editors and an integrated marketplace for characters, rules, and other content. While the freeware efforts are neat, I am a boring person who wants things to Just Work, with only basic tinkering on my end.
At the moment, KoboldCPP + ST is probably the closest to being user-friendly without sacrificing privacy or requiring a subscription.
ST. By far the most customizability.
KoboldCPP because it's easier to use and packed with a lot of useful features.
I find running an OpenAI style API endpoint (using llama.cpp directly when I want fine control, or StudioLM when I need something quick and easy) is the best way to go in combination with a good chat UI designed to interface with OpenAI models.
To that end, I redirect Chatbox to my local LLM server, and I LOVE IT. Clean but powerful interface, support for markdown, ability to save different agents for quick recall, and more. Highly, HIGHLY recommend it.
It's open source and available on pretty much every platform -- and you can use it to interface with both local LLMs and OpenAI's LLMs.
Using Text Gen, but I also wanted to try PrivateGPT; a recent change in a dependent library has put PrivateGPT on pause. Text Gen was OK, but I had issues with a bigger model: it ran out of memory or ran really slow. I've got a 3090 with 24 GB and an i9-13900K with 64 GB of RAM. Any recommendations on a model and settings?
Koboldcpp because it's the only one that works for me right now.
Own web UI for experimenting.
ollama + ollama web ui
It's just a great experience overall.
No lovers of Ollama with LangChain?
Ollama is not cross platform (yet), so it's off the table for me. Looks neat, but I don't really see the point when there's already a bunch of cross platform solutions based on llama.cpp.
I do use it often. Using the endpoint, I can communicate with any UI.
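That endpoint is the main draw: Ollama serves a local REST API that any UI or script can hit. A minimal sketch, assuming Ollama is running on its default port 11434; the model name is just an example you've already pulled:

```python
# Minimal sketch: query a local Ollama server over its REST API.
# Assumes Ollama is running on the default port 11434 and that "llama2"
# (an example name) has been pulled with `ollama pull`.
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama2",
        "messages": [{"role": "user", "content": "Name one upside of local LLMs."}],
        "stream": False,  # return one JSON object instead of a token stream
    },
)
print(resp.json()["message"]["content"])
```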
LM Studio. The additional process of prepping for the link to be ready is a bit too much for me personally. It does concern me slightly that it's closed source, but I just block its internet access lol.
My own, because if I didn't want control I would use ChatGPT, and the ones I tried lack features I want: parameter randomization mid-inference; generating several responses in sequence (not all at once like Kobold); a good editing experience (no undo tree = not for me); manual limiting of which tokens are sent to the model (I don't want silent trimming where I have to guess the actual context).
I use various things, regularly testing whether one of them has become better, etc.
- Mainly llama.cpp backend and server as the UI - it has everything I need, it's lightweight, it's hackable.
- Ollama - simplifies many steps, has very convenient functions and an overall coherent and powerful ecosystem. Mostly in the terminal, but sometimes in a modified Ollama Webui.
- Sometimes Agnai and/or RisuAI - nice and powerful UIs with satisfying UX, though not as powerful as SillyTavern. But SillyTavern is too much if you are not an RP power user.
- My own custom Obsidian ChatGPT-MD + Canvas Chat addons with local endpoints.
In general I try to avoid everything that comes with Python code, and I prefer solutions with as few dependencies as possible, so it's easier to hack and customize to my needs.
SillyTavern hooked up to koboldcpp-ROCM
Is anyone working on a local UI interested in forming a community of builders? I think it would be great to share knowledge, learn from each other, and ultimately raise the bar for a better UI for everyone.
Anyone who wants to take the lead on this is more than welcome to. I'm just putting the idea out there.
A bit related. I think all the tools mentioned here are for using an existing UI.
But what if you want to easily roll your own, preferably in Python? I know of some options:
Gradio https://www.gradio.app/guides/creating-a-custom-chatbot-with-blocks
Panel https://www.anaconda.com/blog/how-to-build-your-own-panel-ai-chatbots
Reflex (formerly Pynecone) https://github.com/reflex-dev/reflex-chat https://news.ycombinator.com/item?id=35136827
Solara https://news.ycombinator.com/item?id=38196008 https://github.com/widgetti/wanderlust
I like Streamlit (simple but not very versatile), and Reflex seems to have a richer set of features.
My questions: which of these do people like to use the most? Or are the tools mentioned by OP also good for rolling your own UI on top of your own software?
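To give a concrete taste of the first option above: a Gradio chat UI is only a few lines. This is a toy sketch; the respond() body is a placeholder where you would call whatever local backend you run (llama.cpp server, Ollama, an OpenAI-compatible endpoint, etc.):

```python
# Toy sketch of rolling your own chat UI with Gradio.
# respond() just echoes the input; replace its body with a call to your backend.
import gradio as gr

def respond(message, history):
    # `history` holds the prior turns; ignored in this toy example.
    return f"(local model reply to: {message})"

gr.ChatInterface(respond, title="My local LLM chat").launch()
```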
Backend: 99% of the time KoboldCPP, 1% of the time (testing EXL2, etc.) Ooba.
Frontend: SillyTavern.
Why: GGUF is my preferred model type, even with a 3090. KoboldCPP is the best I have seen at running this model type. SillyTavern should be obvious: it's updated multiple times a day and is amazingly feature-rich and modular.