m18coppola

joined 10 months ago
[–] m18coppola@alien.top 1 point 9 months ago

I have run 7B models with Q2_K on my Raspberry Pi with 4 GB of RAM lol. It's kinda slow (still faster than I bargained for), but Q2_K models tend to be pretty stupid at the 7B size, no matter the speed. You can theoretically run a bigger model using swap space (kind of like using your storage drive as RAM), but then token generation slows to a crawl.
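For anyone curious what this looks like in practice, here's a minimal sketch using the llama-cpp-python bindings; the model filename, context size, and generation parameters are illustrative assumptions, not the exact setup from the comment:

```python
# Minimal sketch: loading a heavily quantized 7B GGUF model on a low-RAM box
# with llama-cpp-python. Filename and parameters are assumptions for
# illustration. Swap space is an OS-level setting, not part of this code.
from llama_cpp import Llama

llm = Llama(
    model_path="./mistral-7b.Q2_K.gguf",  # hypothetical Q2_K quantized file
    n_ctx=512,     # small context window keeps memory usage down
    n_threads=4,   # one thread per core on a Raspberry Pi 4
)

out = llm("Q: What is the capital of France? A:", max_tokens=32)
print(out["choices"][0]["text"])
```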

[–] m18coppola@alien.top 1 point 10 months ago

I had so much success with text embeddings and retrieval that I didn't end up needing to deploy an LLM at work. I do, however, have a secret Mistral-Trismegistus-7B@Q4_K hosted on a retired cheapo Dell OptiPlex with a tarot-card-reader system prompt that I share with my teammates 😁
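A minimal sketch of the embed-and-retrieve pattern mentioned here, assuming sentence-transformers for the embedding model and plain cosine similarity; the model name and example documents are made up for illustration:

```python
# Minimal sketch of embedding-based retrieval: embed documents once, then
# return the most similar ones for a query. Model name and corpus are
# illustrative assumptions, not the commenter's actual setup.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose embedder

docs = [
    "How to file an expense report",
    "Resetting your VPN credentials",
    "Office coffee machine maintenance schedule",
]
doc_vecs = model.encode(docs, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents whose embeddings are closest to the query."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q  # dot product == cosine similarity after normalization
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

print(retrieve("I forgot my VPN password"))
```

Since the embeddings are normalized at encode time, the dot product already gives cosine similarity, so no extra normalization is needed per query.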