I'd do the 4060 Ti and add a 16 GB P100 to the mix to avoid doing any CPU inference. Use exl2. Otherwise I'd go 3090. CPU is slow.
tntdeez
exl2 processes most things in FP16, which the 1080 Ti, being a consumer Pascal card, is very slow at. GGUF/llama.cpp, on the other hand, can use an FP32 pathway when required for older cards, which is why it's quicker on them.
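If it helps, here's a rough sketch of that rule of thumb keyed on CUDA compute capability. The function names and the backend labels are made up for illustration; they are not part of exl2 or llama.cpp. The list of capabilities with fast FP16 is my assumption based on NVIDIA's published throughput figures (6.0 for the P100, 7.0 and up for Volta and newer, plus the Tegra parts), so treat it as a heuristic. On a real machine you could get the `(major, minor)` pair from something like `torch.cuda.get_device_capability()`.

```python
# Hypothetical helpers: names and backend labels are illustrative only.

def has_fast_fp16(major: int, minor: int) -> bool:
    """Heuristic: True if this compute capability runs FP16 at a useful rate."""
    cc = (major, minor)
    if cc >= (7, 0):  # Volta and newer
        return True
    # Assumed exceptions below 7.0: Tegra X1 (5.3), P100 (6.0), Tegra X2 (6.2).
    return cc in {(5, 3), (6, 0), (6, 2)}


def suggest_backend(major: int, minor: int) -> str:
    """Pick an FP16-heavy loader only when the card handles FP16 well."""
    return "exl2" if has_fast_fp16(major, minor) else "gguf/llama.cpp"


# Compute capabilities for the cards mentioned in this thread.
cards = {
    "GTX 1080 Ti": (6, 1),
    "Tesla P100": (6, 0),
    "RTX 3090": (8, 6),
    "RTX 4060 Ti": (8, 9),
}

for name, cc in cards.items():
    print(f"{name}: {suggest_backend(*cc)}")
```

Note the P100 comes out on the exl2 side even though it is also Pascal, which is why pairing it with the 4060 Ti works.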