this post was submitted on 17 Nov 2023

LocalLLaMA


Community to discuss about Llama, the family of large language models created by Meta AI.

[–] ab2377@alien.top 1 points 1 year ago

llama.cpp mostly, just on the console with main.exe. I wrote a simple Python file to talk to the llama.cpp server, which also works great. LM Studio is good and I have it installed, but I don't use it: I have an 8 GB VRAM laptop GPU at the office and a 6 GB VRAM laptop GPU at home, so I make myself stick to the console to save memory wherever I can. My experience with text-generation-webui has not been great; it takes far too long to update, and sometimes it gets the torch installation right and sometimes torch is installed without CUDA. I really don't want to waste my time on that. I like to install everything manually and just want some really lightweight web UI for the server hosted with llama.cpp.
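A "simple python file to talk to the llama.cpp server" might look like the sketch below. It targets the `/completion` endpoint of llama.cpp's built-in HTTP server; the host, port, and field values here are assumptions about a default local setup, not the commenter's actual script.

```python
# Minimal sketch of a client for llama.cpp's built-in HTTP server.
# Assumes the server was started locally on the default port 8080;
# adjust the URL for your own setup.
import json
import urllib.request

def build_payload(prompt: str, n_predict: int = 128) -> bytes:
    # The /completion endpoint accepts a JSON body with the prompt
    # and the number of tokens to predict.
    return json.dumps({"prompt": prompt, "n_predict": n_predict}).encode("utf-8")

def complete(prompt: str, url: str = "http://127.0.0.1:8080/completion") -> str:
    # POST the prompt and return the generated text from the "content" field.
    req = urllib.request.Request(
        url,
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]
```

With a server running (`./server -m model.gguf`), `complete("Hello")` returns the model's continuation as a plain string, which keeps the whole client dependency-free.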

[–] Love_Cat2023@alien.top 1 points 1 year ago

The text-generation-webui API with a Next.js frontend; it's more customizable.

[–] Flashy_Squirrel4745@alien.top 1 points 1 year ago

Text-generation-webui for general chatting, and vLLM for processing large amounts of data with an LLM.

On an RTX 3090, vLLM is 10~20x faster than llama.cpp for 13B AWQ models.
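The bulk-processing workflow described here can be sketched with vLLM's offline inference API, which batches all prompts internally (continuous batching) rather than generating one stream at a time. The model name, prompt template, and sampling values below are illustrative assumptions, not the commenter's actual setup.

```python
# Rough sketch of batch inference with vLLM on an AWQ-quantized 13B model.
# Model name and prompt template are hypothetical placeholders.
def make_prompts(records):
    # Wrap each raw record in a simple instruction template (assumed format).
    return [f"Summarize the following record:\n{r}\nSummary:" for r in records]

def batch_generate(records, model="TheBloke/Llama-2-13B-AWQ"):
    # Imported lazily so the sketch can be read without vLLM/GPU available.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model, quantization="awq")
    params = SamplingParams(temperature=0.0, max_tokens=128)
    # vLLM schedules and batches all prompts in one call; this internal
    # batching is where the throughput gain over single-stream
    # llama.cpp decoding comes from.
    outputs = llm.generate(make_prompts(records), params)
    return [out.outputs[0].text for out in outputs]
```

Submitting thousands of prompts in a single `generate` call lets the engine keep the GPU saturated, which is why the speedup only shows up for bulk workloads, not interactive chat.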
