Flashy_Squirrel4745

Text Generation WebUI for general chatting, and vLLM for processing large amounts of data with an LLM.

On an RTX 3090, vLLM is 10~20x faster than llama.cpp for 13B AWQ models.
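
For anyone curious, here's a minimal sketch of vLLM's offline batch API with an AWQ model. The model ID and prompts are just placeholders; swap in whichever 13B AWQ checkpoint you actually use:

```python
from vllm import LLM, SamplingParams

# A batch of prompts; vLLM schedules them together with continuous batching,
# which is where most of the throughput win over llama.cpp comes from.
prompts = [
    "Summarize the following review: ...",
    "Extract the product name from: ...",
]

sampling_params = SamplingParams(temperature=0.7, max_tokens=256)

# Load an AWQ-quantized model (placeholder model ID).
llm = LLM(model="TheBloke/LLaMA2-13B-AWQ", quantization="awq")

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, output.outputs[0].text)
```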