tntdeez

joined 1 year ago
tntdeez@alien.top · 11 months ago

exl2 does most of its math in FP16, which the 1080 Ti, being a Pascal-era card, is very slow at (its FP16 throughput is roughly 1/64 of its FP32 rate). GGUF/llama.cpp, on the other hand, can use an FP32 pathway on older cards when needed, which is why it's faster on those cards.
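As a rough illustration of the idea (not code taken from exl2 or llama.cpp), the choice between an FP16 and an FP32 pathway can be keyed on the GPU's CUDA compute capability. The helper `pick_compute_dtype` and its lookup table are hypothetical; the two ratios are NVIDIA's published FP16:FP32 throughput figures for the GP100 (Tesla P100) and GP102/GP104 (GTX 10-series) chips, and any card not in the table is simply assumed to handle FP16 well.

```python
# FP16:FP32 arithmetic throughput ratios for two Pascal chips (regular CUDA cores).
# Only these two entries are included; other architectures are deliberately left out.
FP16_TO_FP32_RATIO = {
    (6, 0): 2.0,     # GP100, e.g. Tesla P100: FP16 runs at twice the FP32 rate
    (6, 1): 1 / 64,  # GP102/GP104, e.g. GTX 1080 Ti: FP16 runs at 1/64 of the FP32 rate
}


def pick_compute_dtype(capability: tuple[int, int]) -> str:
    """Hypothetical helper: return "fp32" when FP16 math is known to be slow on this GPU.

    Assumption: cards missing from the table are treated as FP16-friendly.
    """
    ratio = FP16_TO_FP32_RATIO.get(capability)
    if ratio is not None and ratio < 1.0:
        return "fp32"
    return "fp16"
```

On a machine with PyTorch, the tuple could come from `torch.cuda.get_device_capability()`; for a 1080 Ti that is `(6, 1)`, so the sketch picks `"fp32"`.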

tntdeez@alien.top · 1 year ago

I'd get the 4060 Ti and add a 16GB P100 to the mix so nothing has to run on the CPU. Use exl2. Otherwise I'd go with a 3090. CPU inference is slow.