this post was submitted on 17 Nov 2023

LocalLLaMA


Community to discuss about Llama, the family of large language models created by Meta AI.

[–] cubestar362@alien.top 1 points 11 months ago

Found KoboldCpp in a guide somewhere and have only used that. I barely know much about anything else. I just use GGUF and never worry about the so-called "VRAM".

[–] a_beautiful_rhind@alien.top 1 points 11 months ago (1 children)

Text Generation Web UI as the backend and SillyTavern as the frontend.

KoboldCPP where proper transformers/CUDA support isn't available.

[–] iChrist@alien.top 1 points 11 months ago

Yep, pretty good combo! I also use ooba + Silly, and for internet queries and PDF ingestion I use LoLLMs. Great stuff!

[–] FullOf_Bad_Ideas@alien.top 1 points 11 months ago

Previously, when I was more VRAM-limited: koboldcpp. Now I mainly use a modified CLI exllamav2 chat.py and oobabooga, 50/50. chat.py is about 8 tokens/s (roughly 45%) faster than oobabooga with the same model and the exllamav2 loader, for some reason, and I value fast generation more than a nice UI. You forgot to mention SillyTavern; I think it gets a lot of use among coomers.

[–] LoSboccacc@alien.top 1 points 11 months ago

BetterGPT with the llama.cpp server and its OpenAI adapter. Sleek, supports editing past messages without truncating the history, swapping roles at any time, etc.
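
For anyone scripting against the same kind of setup, a minimal Python sketch of calling an OpenAI-compatible local endpoint; the port, key, and model name below are placeholders for whatever your server uses:

```python
# Point the official openai client at a local llama.cpp server running
# behind its OpenAI-compatible adapter. base_url and model name are
# placeholders; local servers usually ignore the API key.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="local-model",  # most local servers accept an arbitrary name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```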

[–] BangkokPadang@alien.top 1 points 11 months ago

Text gen web ui. Lets me use all model formats depending on what I want to test at that moment.

[–] SomeOddCodeGuy@alien.top 1 points 11 months ago

Text Gen Web UI. Works great on Mac. I use GGUFs, since llama.cpp supports Metal.

[–] Couler@alien.top 1 points 11 months ago

The ROCm version of KoboldCPP on my AMD + Linux setup.

[–] sebo3d@alien.top 1 points 11 months ago

KoboldCPP. Double-click the Kobold icon, load, select a preset, launch. Ten or so seconds later you're good to go. Easy, quick, efficient.

[–] sophosympatheia@alien.top 1 points 11 months ago

Text Gen Web UI + Silly Tavern for me. Works like a charm.

[–] CardAnarchist@alien.top 1 points 11 months ago (1 children)

I just switched to KoboldCpp from Text Gen UI 2 days ago.

The OpenAI extension wouldn't install for me, and it was causing issues with SillyTavern, which I use as a frontend.

I'm actually really happy now that I've switched.

KoboldCpp is so simple it's great. I've written a simple batch file to launch both KoboldCpp and SillyTavern. All I have to do if I want to try a new model is edit the part of the batch file pointing to the model's name, and it just works.
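
For reference, a rough Python equivalent of such a launcher script; every path and flag below is a placeholder for your own setup:

```python
# Start KoboldCpp with a chosen model, then start SillyTavern.
# Paths, executable names, and flags are placeholders.
import subprocess

MODEL = r"C:\models\my-model.Q4_K_M.gguf"  # edit this line to swap models

kobold = subprocess.Popen(["koboldcpp.exe", "--model", MODEL])
tavern = subprocess.Popen(["node", "server.js"], cwd=r"C:\SillyTavern")

kobold.wait()       # keep running until KoboldCpp exits...
tavern.terminate()  # ...then shut the frontend down too
```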

On top of that I can load more layers onto my GPU with KoboldCpp than Text Gen UI so I'm getting faster speeds.

[–] IamFuckinTomato@alien.top 1 points 11 months ago

Have you tried installing the missing packages it lists when you try to install the OpenAI extension? I had the same issue, and installing those missing packages via cmd_windows in the same folder fixed it.

[–] Tiny_Judge_2119@alien.top 1 points 11 months ago

If you have coding skills: https://github.com/mzbac/LLM_Web. It can be deployed to a local server or to the cloud.

[–] Unlucky-Message8866@alien.top 1 points 11 months ago (1 children)

My own: https://github.com/knoopx/llm-workbench. Reasons: fast, private, lightweight, hackable.

[–] sime@alien.top 1 points 11 months ago

You're kidding me. I recently published my own UI with the same name. Damn it. -> https://github.com/sedwards2009/llm-workbench

[–] OC2608@alien.top 1 points 11 months ago

I used to use Text Generation Web UI, but I changed to KoboldCpp because it's more lightweight. Besides, I realized I didn't use all the features of the textgen UI. KoboldCpp as the backend and SillyTavern as the frontend when I want to chat. KoboldCpp alone when I want to play with models by creating stories or something.

[–] TobyWonKenobi@alien.top 1 points 11 months ago

LM Studio - very clean UI and easy to use with GGUF.

[–] Demortus@alien.top 1 points 11 months ago

Text generation web UI. The install script has worked perfectly every time I've run it, and the miniconda environment it creates is useful both within the web interface and for running LLMs in Python scripts. The interface also makes installing and using new models a breeze.
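
As an example of that second use, a minimal sketch of calling a model from a plain Python script with transformers; "gpt2" is just a small stand-in model so the example runs anywhere:

```python
# Load a model in a plain Python script via Hugging Face transformers
# (the model name is only an illustrative placeholder).
from transformers import pipeline

generate = pipeline("text-generation", model="gpt2")
print(generate("Local LLM UIs are", max_new_tokens=20)[0]["generated_text"])
```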

[–] LyPreto@alien.top 1 points 11 months ago (1 children)

damn llama.cpp has a monopoly indirectly 😂

[–] BrainSlugs83@alien.top 1 points 11 months ago

It's just easier to run (and deploy!) cross-platform compiled code than to set up 10 different Python envs and cross your fingers that it might work this time.

[–] nsfw_throwitaway69@alien.top 1 points 11 months ago

I use SillyTavern along with text-generation-webui in API mode. Best setup for roleplay imo.

[–] nuno5645@alien.top 1 points 11 months ago

LM Studio is the most straightforward llama.cpp UI.

[–] durden111111@alien.top 1 points 11 months ago

Text Gen UI for general inference

llama.cpp server for multimodal

[–] sumrix@alien.top 1 points 11 months ago

TavernAI, because it's simple and easy to use.

[–] Robot1me@alien.top 1 points 11 months ago

KoboldCpp for its ease of use, low memory and disk footprint, and the new context shift feature. Combined with SillyTavern, it gives the best open-source character.ai experience.

[–] mcmoose1900@alien.top 1 points 11 months ago (2 children)

Don't forget exui: https://github.com/turboderp/exui

Once it implements notebook mode, I am probably going to switch to that, as all my reasons for staying on text gen ui (the better samplers, notebook mode) will be pretty much gone, and (as said below) text gen ui has some performance overhead.

[–] ReturningTarzan@alien.top 1 points 11 months ago (1 children)

Notebook mode is almost ready. I'll probably release it later today or early tomorrow.

[–] mcmoose1900@alien.top 1 points 11 months ago (2 children)

BTW, one last thing on my wishlist (in addition to notebook mode) is prompt caching/scrolling.

I realized that the base exllamav2 backend in ooba (as opposed to the HF hack) doesn't cache prompts, so prompt processing with 50K+ context takes well over a minute on my 3090. I don't know if that's also the case in exui, as I didn't try a mega-context prompt in my quick exui test.
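
To illustrate what caching saves, a toy Python sketch of the prefix bookkeeping; real backends cache the attention KV state, not just token IDs:

```python
# Only the part of the new prompt that differs from the previously
# processed one needs a forward pass.
cached_tokens: list[int] = []

def tokens_to_process(new_tokens: list[int]) -> list[int]:
    """Return only the suffix of new_tokens not covered by the cache."""
    global cached_tokens
    shared = 0  # length of the shared prefix between old and new prompt
    for old, new in zip(cached_tokens, new_tokens):
        if old != new:
            break
        shared += 1
    cached_tokens = new_tokens
    return new_tokens[shared:]  # only this tail needs processing

# With a cached 50K-token chat history, appending one message means
# processing just the few new tokens instead of the whole context.
```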

[–] ProfessionalGuitar32@alien.top 1 points 11 months ago

I use Synology Chat.

[–] Sabin_Stargem@alien.top 1 points 11 months ago

KoboldCPP + Silly Tavern. I would use the KoboldAI frontend instead of Silly Tavern, if it weren't for the fact that it is intended to create a dedicated system volume in order to work well. I personally find that creepy and unsettling, because I am uncomfortable with the technical aspects of computing. I can do intermediate stuff, but I still feel unhappy at the very idea of ever needing to troubleshoot.

Anyhow, I hope a commercial all-in-one LLM program gets made: one that is privacy-focused, approachable, open source, built for roleplaying, with content editors and an integrated marketplace for characters, rules, and other content. While the freeware efforts are neat, I'm a boring person who wants things to Just Work, with only basic tinkering on my end.

At the moment, KoboldCPP + ST is probably the closest to being user-friendly without sacrificing privacy or requiring a subscription.

[–] Monkey_1505@alien.top 1 points 11 months ago

ST. By far the most customizability.

[–] Only-Letterhead-3411@alien.top 1 points 11 months ago

KoboldCPP, because it's easier to use and packed with a lot of useful features.

[–] altoidsjedi@alien.top 1 points 11 months ago (1 children)

I find running an OpenAI-style API endpoint (using llama.cpp directly when I want fine control, or LM Studio when I need something quick and easy) is the best way to go, in combination with a good chat UI designed to interface with OpenAI models.

To that end, I redirect Chatbox to my local LLM server, and I LOVE IT. Clean but powerful interface, support for markdown, ability to save different agents for quick recall, and more. Highly, HIGHLY recommend it.

It's open source and available on pretty much every platform -- and you can use it to interface with both local LLMs and OpenAI's.

[–] wa-jonk@alien.top 1 points 11 months ago

Using Text Gen, but I also wanted to try PrivateGPT; a recent change in a dependent library has put that on pause. Text Gen was OK, but I had issues with a bigger model: it ran out of memory or ran really slowly. I've got a 3090 with 24 GB and an i9-13900K with 64 GB of RAM. Any recommendations on a model and settings?

[–] Merchant_Lawrence@alien.top 1 points 11 months ago

KoboldCpp, because it's the only one that works for me right now.

[–] shibe5@alien.top 1 points 11 months ago

My own web UI, for experimenting.

[–] acquire_a_living@alien.top 1 points 11 months ago

Ollama + Ollama Web UI

It's just a great experience overall.

[–] ding0ding0ding0@alien.top 1 points 11 months ago (2 children)

No lovers of Ollama with LangChain?

[–] BrainSlugs83@alien.top 1 points 11 months ago

Ollama is not cross platform (yet), so it's off the table for me. Looks neat, but I don't really see the point when there's already a bunch of cross platform solutions based on llama.cpp.

[–] sanjay303@alien.top 1 points 11 months ago

I do use it often. Using the endpoint, I can communicate with it from any UI.
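
For example, a minimal Python sketch of hitting Ollama's local REST endpoint directly; this assumes Ollama is running on its default port, and the model name is a placeholder for one you've pulled:

```python
# Call Ollama's /api/generate endpoint with streaming disabled,
# so a single JSON object comes back with the full completion.
import json
import urllib.request

payload = {"model": "llama2", "prompt": "Hello!", "stream": False}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```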

[–] Ok_League2590@alien.top 1 points 11 months ago

LM Studio. The additional process of prepping for an API link to be ready is a bit too much for me personally. It does concern me slightly that it's closed source, but I just block its internet access lol

[–] Maykey@alien.top 1 points 11 months ago

My own, because if I didn't want control I would use ChatGPT, and the ones I tried lack features I want: parameter randomization mid-inference; generating several responses in sequence (not all at once, as Kobold does); a good editing experience (no undo tree = not for me); manual limiting of which tokens are sent to the model (I don't want silent trimming where I have to guess the actual context).

[–] Evening_Ad6637@alien.top 1 points 11 months ago

I use various things, regularly testing if one of them has become better etc.

  • Mainly the llama.cpp backend with its server as the UI - it has everything I need, it's lightweight, it's hackable

  • Ollama - simplifies many steps, has very convenient functions, and an overall coherent and powerful ecosystem. Mostly in the terminal, but sometimes in a modified Ollama WebUI

  • Sometimes Agnai and/or RisuAI - nice and powerful UIs with a satisfying UX, though not as powerful as SillyTavern. But SillyTavern is too much if you are not an RP power user.

  • My own custom Obsidian ChatGPT-MD + Canvas Chat addons with local endpoints.

In general I try to avoid everything that comes with Python code, and I prefer solutions with as few dependencies as possible, so it's easier to hack and customize to my needs.

[–] benmaks@alien.top 1 points 11 months ago

SillyTavern hooked up to koboldcpp-ROCM

[–] itsuka_dev@alien.top 1 points 11 months ago

Is anyone working on local UI interested in forming a community of builders? I think it would be great to share knowledge, learn from each other, and ultimately raise the bar for a better UI for everyone.

Anyone who wants to take the lead on this is more than welcome to. I'm just putting the idea out there.

[–] SatoshiNotMe@alien.top 1 points 11 months ago

A bit related: I think all the tools mentioned here are for using an existing UI.

But what if you wanted to easily roll your own, preferably in Python? I know of some options:

  • Streamlit

  • Gradio https://www.gradio.app/guides/creating-a-custom-chatbot-with-blocks

  • Panel https://www.anaconda.com/blog/how-to-build-your-own-panel-ai-chatbots

  • Reflex (formerly Pynecone) https://github.com/reflex-dev/reflex-chat https://news.ycombinator.com/item?id=35136827

  • Solara https://news.ycombinator.com/item?id=38196008 https://github.com/widgetti/wanderlust

I like Streamlit (simple but not very versatile), and Reflex seems to have a richer feature set.

My questions: which of these do people like to use the most? And are the tools mentioned by OP also good for rolling your own UI on top of your own software?
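
To make the Streamlit option concrete, a minimal chat-UI sketch wired to a local OpenAI-compatible endpoint (llama.cpp server, LM Studio, etc.); the base_url and model name are placeholders:

```python
# chat_app.py -- run with: streamlit run chat_app.py
import streamlit as st
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

# Keep the conversation across Streamlit reruns.
if "history" not in st.session_state:
    st.session_state.history = []

# Replay the conversation so far.
for msg in st.session_state.history:
    st.chat_message(msg["role"]).write(msg["content"])

if prompt := st.chat_input("Say something"):
    st.session_state.history.append({"role": "user", "content": prompt})
    st.chat_message("user").write(prompt)
    reply = client.chat.completions.create(
        model="local-model", messages=st.session_state.history
    ).choices[0].message.content
    st.session_state.history.append({"role": "assistant", "content": reply})
    st.chat_message("assistant").write(reply)
```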

[–] USM-Valor@alien.top 1 points 11 months ago

Backend: 99% of the time KoboldCPP; 1% of the time (testing EXL2, etc.) Ooba.

Frontend: SillyTavern

Why: GGUF is my preferred model type, even with a 3090. KoboldCPP is the best I have seen at running this model type. SillyTavern should be obvious, but it's updated multiple times a day and is amazingly feature-rich and modular.
