This post was submitted on 17 Nov 2023

LocalLLaMA

Community to discuss Llama, the family of large language models created by Meta AI.

What UI do you use and why? (alien.top)

submitted 1 year ago by Deadlibor@alien.top to c/localllama@poweruser.forum

From the wiki:

Text generation web UI

Text Generation Inference

mcmoose1900@alien.top 1 points 1 year ago

> I don't know of a model that fits in a 3090 and takes that much time to run inference on

Yi-34B-200K is the base model I'm using, specifically the Capybara and Tess fine-tunes.

I can squeeze 63K of context into it at 3.5 bpw (bits per weight). It's actually surprisingly good at continuing a story that fills the whole context, referencing details from throughout it.
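A rough back-of-the-envelope sketch of why 63K context at 3.5 bpw is tight on a 24 GB RTX 3090. The architecture numbers (about 34 billion parameters, 60 layers, 8 key/value heads via grouped-query attention, head dimension 128) are my assumptions about Yi-34B, not something stated in the comment, "63K" is taken as 63 × 1024 tokens, and the estimate ignores activations and loader overhead.

```python
# Rough VRAM estimate: quantized weights plus the KV cache.
# ASSUMPTION: the architecture numbers below approximate Yi-34B; they are
# not taken from the comment, and activations/overhead are ignored.

GIB = 1024 ** 3


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Bytes for quantized weights: params * bits / 8."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_tokens: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: float) -> float:
    """Keys and values (factor 2) for every layer, KV head and token."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_tokens


if __name__ == "__main__":
    # Assumed: ~34e9 params, 60 layers, 8 KV heads (GQA), head_dim 128.
    weights = weight_bytes(34e9, 3.5)
    for label, bpe in [("FP16 cache", 2), ("8-bit cache", 1)]:
        cache = kv_cache_bytes(63 * 1024, 60, 8, 128, bpe)
        print(f"{label}: weights {weights / GIB:.1f} GiB + "
              f"cache {cache / GIB:.1f} GiB = {(weights + cache) / GIB:.1f} GiB")
```

Under these assumptions the weights come to about 13.9 GiB and an FP16 cache for 64,512 tokens to about 14.8 GiB (roughly 28.6 GiB total), while the 8-bit cache row comes to about 21.2 GiB. So fitting 63K on a 3090 plausibly depends on a smaller-footprint cache or similar savings; treat this as an estimate, not a measurement.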

Anyway, I'm on Linux, so there's no GPU memory swapping into system RAM like on Windows. I am indeed using it for chat/novel-style sessions, so the context does scroll and gets cached in ooba (text-generation-webui).
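A hypothetical illustration of why a scrolling context interacts badly with prompt caching. It assumes the backend reuses only the longest common token prefix between the cached prompt and the new one; that is a simplification, and real loaders may be smarter (for example by shifting the cache). The names `reusable_prefix` and `build_window` are made up for this sketch.

```python
# Hypothetical sketch: prefix-only cache reuse with a scrolling window.
# ASSUMPTION: the backend can only reuse tokens that match the start of
# the previously cached prompt.


def reusable_prefix(cached: list[int], new_prompt: list[int]) -> int:
    """Number of leading tokens shared by the cached and new prompts."""
    n = 0
    for a, b in zip(cached, new_prompt):
        if a != b:
            break
        n += 1
    return n


def build_window(history: list[int], max_ctx: int) -> list[int]:
    """Keep only the most recent max_ctx tokens (a simple scrolling window)."""
    return history[-max_ctx:]


if __name__ == "__main__":
    max_ctx = 8                     # tiny window for illustration
    history = list(range(6))        # tokens 0..5 fit without truncation
    cached = build_window(history, max_ctx)

    history += [6, 7]               # still fits: old prompt is a prefix
    prompt = build_window(history, max_ctx)
    print(reusable_prefix(cached, prompt), "of", len(prompt))  # 6 of 8

    cached = prompt
    history += [8, 9]               # window scrolls: oldest tokens drop off
    prompt = build_window(history, max_ctx)
    print(reusable_prefix(cached, prompt), "of", len(prompt))  # 0 of 8
```

Once the window starts scrolling, the first tokens change on every turn, so under this prefix-only assumption almost nothing can be reused and the whole long prompt has to be processed again, which is where the time goes at 63K context.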