Text Generation WebUI for general chatting, and vLLM for processing large amounts of data with an LLM.
On an RTX 3090, vLLM is roughly 10-20x faster than llama.cpp for 13B AWQ models.
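For the batch-processing use case, vLLM's offline API makes this straightforward. A minimal sketch is below; the model ID is just an example of an AWQ-quantized 13B checkpoint, substitute whichever one you actually use:

```python
from vllm import LLM, SamplingParams

prompts = [
    "Summarize the plot of Hamlet in one sentence.",
    "Explain what AWQ quantization does.",
]
sampling_params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=256)

# Load the model once; generate() runs all prompts as a single batched
# job on the GPU, which is where the throughput advantage comes from.
llm = LLM(model="TheBloke/Llama-2-13B-chat-AWQ", quantization="awq")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.outputs[0].text)
```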