this post was submitted on 30 Nov 2023
LocalLLaMA
Community to discuss Llama, the family of large language models created by Meta AI.
Check out turbo's project https://github.com/turboderp/exui
He put it up not long ago and has speculative decoding working in it. I tried it with Goliath 120B 4.85bpw exl2 and was getting 11-13 t/s vs. 6-8 t/s without it. It's barebones, but it works.
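For anyone unfamiliar with why that speeds things up: a small draft model guesses several tokens ahead, and the big target model checks the whole guess in one batched forward pass instead of one pass per token. With greedy decoding the output is guaranteed identical to running the target model alone. Here's a rough toy sketch of that loop (the "models" are made-up stand-in functions, not exui's actual implementation):

```python
# Toy sketch of greedy speculative decoding. Both "models" are
# placeholder functions over integer tokens, NOT real LLMs:
# the target is ground truth, the draft is a cheap approximation
# that is usually (but not always) right.

def target_model(context):
    # Expensive model: next token = sum of context mod 7 (made-up rule)
    return sum(context) % 7

def draft_model(context):
    # Cheap draft: agrees with the target except when the last token is 3
    t = sum(context) % 7
    return (t + 1) % 7 if context[-1] == 3 else t

def speculative_decode(context, n_new, k=4):
    """Generate n_new tokens greedily, drafting k tokens at a time."""
    out = list(context)
    while len(out) < len(context) + n_new:
        # 1) Draft k tokens cheaply with the small model.
        draft, ctx = [], list(out)
        for _ in range(k):
            t = draft_model(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) Verify. In a real engine all k draft tokens are checked
        #    in ONE batched target-model pass (that's the speedup);
        #    this toy loop just accepts until the first mismatch.
        for t in draft:
            expected = target_model(out)
            if t == expected:
                out.append(t)          # draft token accepted "for free"
            else:
                out.append(expected)   # target's token replaces bad draft
                break
            if len(out) == len(context) + n_new:
                break
    return out[len(context):]
```

The key property: whenever the draft model happens to agree with the target, those tokens cost only a fraction of a full target pass each, so a good draft model (like a small model from the same family) gets you the kind of 1.5-2x gain described above without changing the output.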