this post was submitted on 01 Dec 2023

LocalLLaMA

Community to discuss Llama, the family of large language models created by Meta AI.

Let's say hypothetically that I'm GPU poor and a simpleton who has never gone beyond oobabooga-ing and koboldcpp-ing, and I want to run models larger than Mistral at more than 2 tokens per second. Speculative decoding is my only option, right? What's the easiest way to do it? Do any UIs support it out of the box?
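
From poking around, the Hugging Face transformers route seems to be "assisted generation": a small draft model proposes a few tokens cheaply and the big target model verifies them in one pass. A minimal sketch of what I think that looks like is below; the model names are placeholders I haven't actually benchmarked, not recommendations.

```python
# Rough sketch of speculative decoding ("assisted generation") with Hugging Face
# transformers. Model names are placeholders; as far as I understand, the draft
# model has to share the target model's tokenizer/vocab for this to work.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

target_name = "meta-llama/Llama-2-13b-hf"          # big, slow target (placeholder)
draft_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # small, fast draft (placeholder)

tokenizer = AutoTokenizer.from_pretrained(target_name)
target = AutoModelForCausalLM.from_pretrained(
    target_name, torch_dtype=torch.float16, device_map="auto"
)
draft = AutoModelForCausalLM.from_pretrained(
    draft_name, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Speculative decoding works by", return_tensors="pt").to(target.device)

# The draft model drafts a short chunk of tokens; the target model checks the
# whole chunk in a single forward pass and keeps the prefix it agrees with.
output = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The catch, as far as I understand it, is that the speedup depends on how often the draft's guesses get accepted, and you still have to fit both models in memory at once, which is awkward when you're GPU poor to begin with.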

no comments (yet)