this post was submitted on 01 Dec 2023

LocalLLaMA

Community to discuss Llama, the family of large language models created by Meta AI.

Let's say hypothetically that I'm GPU poor and a simpleton who has never gone beyond oobabooga-ing and koboldcpp-ing, and I want to run models larger than Mistral at more than 2 tokens per second. Speculative decoding is my only option, right? What's the easiest way to do it? Do any UIs support it out of the box?
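
From poking around, the Hugging Face transformers route seems to be "assisted generation": a small draft model proposes a few tokens cheaply and the big target model verifies them in one pass. A minimal sketch of what I think that looks like is below; the model names are placeholders I haven't actually benchmarked, not recommendations.

```python
# Rough sketch of speculative decoding ("assisted generation") with Hugging Face
# transformers. Model names are placeholders; as far as I understand, the draft
# model has to share the target model's tokenizer/vocab for this to work.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

target_name = "meta-llama/Llama-2-13b-hf"          # big, slow target (placeholder)
draft_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # small, fast draft (placeholder)

tokenizer = AutoTokenizer.from_pretrained(target_name)
target = AutoModelForCausalLM.from_pretrained(
    target_name, torch_dtype=torch.float16, device_map="auto"
)
draft = AutoModelForCausalLM.from_pretrained(
    draft_name, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Speculative decoding works by", return_tensors="pt").to(target.device)

# The draft model drafts a short chunk of tokens; the target model checks the
# whole chunk in a single forward pass and keeps the prefix it agrees with.
output = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The catch, as far as I understand it, is that the speedup depends on how often the draft's guesses get accepted, and you still have to fit both models in memory at once, which is awkward when you're GPU poor to begin with.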

no comments (yet)