Text Generation WebUI for general chatting, and vLLM for processing large amounts of data with an LLM.
On an RTX 3090, vLLM is roughly 10-20x faster than llama.cpp for 13B AWQ models.
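For the batch-processing use case, vLLM's offline API makes this straightforward. A minimal sketch is below; the model ID is just an example of an AWQ-quantized 13B checkpoint, substitute whichever one you actually use:

```python
from vllm import LLM, SamplingParams

prompts = [
    "Summarize the plot of Hamlet in one sentence.",
    "Explain what AWQ quantization does.",
]
sampling_params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=256)

# Load the model once; generate() runs all prompts as a single batched
# job on the GPU, which is where the throughput advantage comes from.
llm = LLM(model="TheBloke/Llama-2-13B-chat-AWQ", quantization="awq")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.outputs[0].text)
```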