this post was submitted on 17 Nov 2023

LocalLLaMA


Community to discuss about Llama, the family of large language models created by Meta AI.

[–] cubestar362@alien.top 1 points 11 months ago

Found KoboldCpp in a guide somewhere and have only used that. I barely know much about anything else. I just use GGUF and never worry about the so-called "VRAM".

[–] a_beautiful_rhind@alien.top 1 points 11 months ago (1 children)

Text Generation Web UI as the backend and SillyTavern as the frontend.

KoboldCPP where proper transformers/CUDA support isn't available.

[–] iChrist@alien.top 1 points 11 months ago

Yep, pretty good combo! I also use ooba + Silly, and for internet queries and PDF ingestion I use LoLLMs. Great stuff!

[–] FullOf_Bad_Ideas@alien.top 1 points 11 months ago

Previously, when I was more VRAM-limited: koboldcpp. Now I mainly use a modified CLI exllamav2 chat.py and oobabooga, 50/50. chat.py is about 8 tokens/s (roughly 45%) faster than oobabooga with the same model and the exllamav2 loader, for some reason, and I value fast generation more than a nice UI. You forgot to mention SillyTavern; I think it gets a lot of use among coomers.

[–] LoSboccacc@alien.top 1 points 11 months ago

BetterGPT with the llama.cpp server and its OpenAI adapter. Sleek, supports editing past messages without truncating the history, swapping roles at any time, etc.
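
For anyone scripting against the same kind of setup, a minimal Python sketch of calling an OpenAI-compatible local endpoint; the port, key, and model name below are placeholders for whatever your server uses:

```python
# Point the official openai client at a local llama.cpp server running
# behind its OpenAI-compatible adapter. base_url and model name are
# placeholders; local servers usually ignore the API key.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="local-model",  # most local servers accept an arbitrary name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```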

[–] BangkokPadang@alien.top 1 points 11 months ago

Text gen web ui. Lets me use all model formats depending on what I want to test at that moment.

[–] SomeOddCodeGuy@alien.top 1 points 11 months ago

Text Gen Web UI. Works great on Mac. I use GGUFs, since llama.cpp supports Metal.

[–] Couler@alien.top 1 points 11 months ago

The ROCm version of KoboldCPP on my AMD + Linux setup.

[–] sebo3d@alien.top 1 points 11 months ago

KoboldCPP. Double-click the Kobold icon, load, select a preset, launch. Ten or so seconds later you're good to go. Easy, quick, efficient.

[–] sophosympatheia@alien.top 1 points 11 months ago

Text Gen Web UI + Silly Tavern for me. Works like a charm.

[–] CardAnarchist@alien.top 1 points 11 months ago (1 children)

I just switched to KoboldCpp from Text Gen UI 2 days ago.

The OpenAI extension wouldn't install for me, and it was causing issues with SillyTavern, which I use as a frontend.

I'm actually really happy now that I've switched.

KoboldCpp is so simple it's great. I've written a simple batch file to launch both KoboldCpp and SillyTavern. All I have to do if I want to try a new model is edit the part of the batch file pointing to the model's name, and it just works.
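
For reference, a rough Python equivalent of such a launcher script; every path and flag below is a placeholder for your own setup:

```python
# Start KoboldCpp with a chosen model, then start SillyTavern.
# Paths, executable names, and flags are placeholders.
import subprocess

MODEL = r"C:\models\my-model.Q4_K_M.gguf"  # edit this line to swap models

kobold = subprocess.Popen(["koboldcpp.exe", "--model", MODEL])
tavern = subprocess.Popen(["node", "server.js"], cwd=r"C:\SillyTavern")

kobold.wait()       # keep running until KoboldCpp exits...
tavern.terminate()  # ...then shut the frontend down too
```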

On top of that I can load more layers onto my GPU with KoboldCpp than Text Gen UI so I'm getting faster speeds.

[–] IamFuckinTomato@alien.top 1 points 11 months ago

Have you tried installing the missing packages it lists when you try to install the OpenAI extension? I had the same issue, and installing those missing packages via cmd_windows in the same folder fixed it.

[–] Tiny_Judge_2119@alien.top 1 points 11 months ago

If you have coding skills: https://github.com/mzbac/LLM_Web. It can be deployed to a local server or to the cloud.

[–] Unlucky-Message8866@alien.top 1 points 11 months ago (1 children)

My own: https://github.com/knoopx/llm-workbench. Reasons: fast, private, lightweight, hackable.

[–] sime@alien.top 1 points 11 months ago

You're kidding me. I recently published my own UI with the same name. Damn it. -> https://github.com/sedwards2009/llm-workbench

[–] OC2608@alien.top 1 points 11 months ago

I used to use Text Generation Web UI, but I changed to KoboldCpp because it's more lightweight. Besides, I realized I didn't use all the features of the textgen UI. KoboldCpp as the backend and SillyTavern as the frontend when I want to chat. KoboldCpp alone when I want to play with models by creating stories or something.

[–] TobyWonKenobi@alien.top 1 points 11 months ago

LM Studio - very clean UI and easy to use with GGUF.

[–] Demortus@alien.top 1 points 11 months ago

Text generation web UI. The install script has worked perfectly every time I've run it, and the miniconda environment it creates is useful both within the web interface and for running LLMs in Python scripts. The interface also makes installing and using new models a breeze.
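
As an example of that second use, a minimal sketch of calling a model from a plain Python script with transformers; "gpt2" is just a small stand-in model so the example runs anywhere:

```python
# Load a model in a plain Python script via Hugging Face transformers
# (the model name is only an illustrative placeholder).
from transformers import pipeline

generate = pipeline("text-generation", model="gpt2")
print(generate("Local LLM UIs are", max_new_tokens=20)[0]["generated_text"])
```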

[–] LyPreto@alien.top 1 points 11 months ago (1 children)

damn llama.cpp has a monopoly indirectly 😂

[–] BrainSlugs83@alien.top 1 points 11 months ago

It's just easier to run (and deploy!) cross-platform compiled code than to set up 10 different Python envs and cross your fingers that it might work this time.

[–] nsfw_throwitaway69@alien.top 1 points 11 months ago

I use SillyTavern along with text-generation-webui in API mode. Best setup for roleplay imo.

[–] nuno5645@alien.top 1 points 11 months ago

LM Studio is the most straightforward llama.cpp UI.

[–] durden111111@alien.top 1 points 11 months ago

Text Gen UI for general inference

llama.cpp server for multimodal

[–] sumrix@alien.top 1 points 11 months ago

TavernAI, because it's simple and easy to use.

[–] Robot1me@alien.top 1 points 11 months ago

KoboldCpp for its ease of use, low memory and disk footprint, and the new context shift feature. Combined with SillyTavern, it gives the best open-source character.ai experience.

[–] mcmoose1900@alien.top 1 points 11 months ago (2 children)

Don't forget exui: https://github.com/turboderp/exui

Once it implements notebook mode, I am probably going to switch to that, as all my reasons for staying on text gen ui (the better samplers, notebook mode) will be pretty much gone, and (as said below) text gen ui has some performance overhead.

[–] ReturningTarzan@alien.top 1 points 11 months ago (1 children)

Notebook mode is almost ready. I'll probably release it later today or early tomorrow.

[–] mcmoose1900@alien.top 1 points 11 months ago (2 children)

BTW, one last thing on my wishlist (in addition to notebook mode) is prompt caching/scrolling.

I realized that the base exllamav2 backend in ooba (as opposed to the HF hack) doesn't cache prompts, so prompt processing with 50K+ context takes well over a minute on my 3090. I don't know if that's also the case in exui, as I didn't try a mega-context prompt in my quick exui test.
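
To illustrate what caching saves, a toy Python sketch of the prefix bookkeeping; real backends cache the attention KV state, not just token IDs:

```python
# Only the part of the new prompt that differs from the previously
# processed one needs a forward pass.
cached_tokens: list[int] = []

def tokens_to_process(new_tokens: list[int]) -> list[int]:
    """Return only the suffix of new_tokens not covered by the cache."""
    global cached_tokens
    shared = 0  # length of the shared prefix between old and new prompt
    for old, new in zip(cached_tokens, new_tokens):
        if old != new:
            break
        shared += 1
    cached_tokens = new_tokens
    return new_tokens[shared:]  # only this tail needs processing

# With a cached 50K-token chat history, appending one message means
# processing just the few new tokens instead of the whole context.
```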

[–] ProfessionalGuitar32@alien.top 1 points 11 months ago

I use Synology Chat.

[–] Sabin_Stargem@alien.top 1 points 11 months ago

KoboldCPP + Silly Tavern. I would use the KoboldAI frontend instead of Silly Tavern, if it weren't for the fact that it is intended to create a dedicated system volume in order to work well. I personally find that creepy and unsettling, because I am uncomfortable with the technical aspects of computing. I can do intermediate stuff, but I still feel unhappy at the very idea of ever needing to troubleshoot.

Anyhow, I hope a commercial all-in-one LLM program gets made: one that is privacy-focused, approachable, open source, built for roleplaying, with content editors and an integrated marketplace for characters, rules, and other content. While the freeware efforts are neat, I'm a boring person who wants things to Just Work, with only basic tinkering on my end.

At the moment, KoboldCPP + ST is probably the closest to being user-friendly without sacrificing privacy or requiring a subscription.

[–] Monkey_1505@alien.top 1 points 11 months ago

ST. By far the most customizability.

[–] Only-Letterhead-3411@alien.top 1 points 11 months ago

KoboldCPP, because it's easier to use and packed with a lot of useful features.

[–] altoidsjedi@alien.top 1 points 11 months ago (1 children)

I find running an OpenAI-style API endpoint (using llama.cpp directly when I want fine control, or LM Studio when I need something quick and easy) is the best way to go, in combination with a good chat UI designed to interface with OpenAI models.

To that end, I redirect Chatbox to my local LLM server, and I LOVE IT. Clean but powerful interface, support for markdown, ability to save different agents for quick recall, and more. Highly, HIGHLY recommend it.

It's open source and available on pretty much every platform -- and you can use it to interface with both local LLMs and OpenAI's.

[–] wa-jonk@alien.top 1 points 11 months ago

Using Text Gen, but I also wanted to try PrivateGPT; a recent change in a dependent library has put that on pause. Text Gen was OK, but I had issues with a bigger model: it ran out of memory or ran really slowly. I've got a 3090 with 24 GB and an i9-13900K with 64 GB of RAM. Any recommendations on a model and settings?

[–] Merchant_Lawrence@alien.top 1 points 11 months ago

KoboldCpp, because it's the only one that works for me right now.

[–] shibe5@alien.top 1 points 11 months ago

My own web UI, for experimenting.

[–] acquire_a_living@alien.top 1 points 11 months ago

Ollama + Ollama Web UI

It's just a great experience overall.

[–] ding0ding0ding0@alien.top 1 points 11 months ago (2 children)

No lovers of Ollama with LangChain?

[–] BrainSlugs83@alien.top 1 points 11 months ago

Ollama is not cross platform (yet), so it's off the table for me. Looks neat, but I don't really see the point when there's already a bunch of cross platform solutions based on llama.cpp.

[–] sanjay303@alien.top 1 points 11 months ago

I do use it often. Using the endpoint, I can communicate with it from any UI.
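
For example, a minimal Python sketch of hitting Ollama's local REST endpoint directly; this assumes Ollama is running on its default port, and the model name is a placeholder for one you've pulled:

```python
# Call Ollama's /api/generate endpoint with streaming disabled,
# so a single JSON object comes back with the full completion.
import json
import urllib.request

payload = {"model": "llama2", "prompt": "Hello!", "stream": False}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```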

[–] Ok_League2590@alien.top 1 points 11 months ago

LM Studio. The additional process of prepping for an API link to be ready is a bit too much for me personally. It does concern me slightly that it's closed source, but I just block its internet access lol

[–] Maykey@alien.top 1 points 11 months ago

My own, because if I didn't want control I would use ChatGPT, and the ones I tried lack features I want: parameter randomization mid-inference; generating several responses in sequence (not all at once, as Kobold does); a good editing experience (no undo tree = not for me); manual limiting of which tokens are sent to the model (I don't want silent trimming where I have to guess the actual context).

[–] Evening_Ad6637@alien.top 1 points 11 months ago

I use various things, regularly testing if one of them has become better etc.

  • Mainly the llama.cpp backend with its server as the UI - it has everything I need, it's lightweight, it's hackable

  • Ollama - simplifies many steps, has very convenient functions, and an overall coherent and powerful ecosystem. Mostly in the terminal, but sometimes in a modified Ollama WebUI

  • Sometimes Agnai and/or RisuAI - nice and powerful UIs with a satisfying UX, though not as powerful as SillyTavern. But SillyTavern is too much if you are not an RP power user.

  • My own custom Obsidian ChatGPT-MD + Canvas Chat addons with local endpoints.

In general I try to avoid everything that comes with Python code, and I prefer solutions with as few dependencies as possible, so it's easier to hack and customize to my needs.

[–] benmaks@alien.top 1 points 11 months ago

SillyTavern hooked up to koboldcpp-ROCM

[–] itsuka_dev@alien.top 1 points 11 months ago

Is anyone working on local UI interested in forming a community of builders? I think it would be great to share knowledge, learn from each other, and ultimately raise the bar for a better UI for everyone.

Anyone who wants to take the lead on this is more than welcome to. I'm just putting the idea out there.

[–] SatoshiNotMe@alien.top 1 points 11 months ago

A bit related: I think all the tools mentioned here are for using an existing UI.

But what if you wanted to easily roll your own, preferably in Python? I know of some options:

  • Streamlit

  • Gradio https://www.gradio.app/guides/creating-a-custom-chatbot-with-blocks

  • Panel https://www.anaconda.com/blog/how-to-build-your-own-panel-ai-chatbots

  • Reflex (formerly Pynecone) https://github.com/reflex-dev/reflex-chat https://news.ycombinator.com/item?id=35136827

  • Solara https://news.ycombinator.com/item?id=38196008 https://github.com/widgetti/wanderlust

I like Streamlit (simple but not very versatile), and Reflex seems to have a richer feature set.

My questions: which of these do people like to use the most? And are the tools mentioned by OP also good for rolling your own UI on top of your own software?
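
To make the Streamlit option concrete, a minimal chat-UI sketch wired to a local OpenAI-compatible endpoint (llama.cpp server, LM Studio, etc.); the base_url and model name are placeholders:

```python
# chat_app.py -- run with: streamlit run chat_app.py
import streamlit as st
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

# Keep the conversation across Streamlit reruns.
if "history" not in st.session_state:
    st.session_state.history = []

# Replay the conversation so far.
for msg in st.session_state.history:
    st.chat_message(msg["role"]).write(msg["content"])

if prompt := st.chat_input("Say something"):
    st.session_state.history.append({"role": "user", "content": prompt})
    st.chat_message("user").write(prompt)
    reply = client.chat.completions.create(
        model="local-model", messages=st.session_state.history
    ).choices[0].message.content
    st.session_state.history.append({"role": "assistant", "content": reply})
    st.chat_message("assistant").write(reply)
```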

[–] USM-Valor@alien.top 1 points 11 months ago

Backend: 99% of the time KoboldCPP; 1% of the time (testing EXL2, etc.) Ooba.

Frontend: SillyTavern

Why: GGUF is my preferred model type, even with a 3090. KoboldCPP is the best I have seen at running this model type. SillyTavern should be obvious, but it's updated multiple times a day and is amazingly feature-rich and modular.
