m18coppola

joined 10 months ago
[–] m18coppola@alien.top 1 point 9 months ago

I have run 7B models with Q2_K on my Raspberry Pi with 4 GB of RAM lol. It's kinda slow (still faster than I bargained for), but Q2_K models tend to be pretty stupid at the 7B size, no matter the speed. You can theoretically run a bigger model using swap space (kind of like using your storage drive as RAM), but then token generation slows to a crawl.
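For anyone curious what this looks like in practice, here's a minimal sketch using the llama-cpp-python bindings; the model filename, context size, and generation parameters are illustrative assumptions, not the exact setup from the comment:

```python
# Minimal sketch: loading a heavily quantized 7B GGUF model on a low-RAM box
# with llama-cpp-python. Filename and parameters are assumptions for
# illustration. Swap space is an OS-level setting, not part of this code.
from llama_cpp import Llama

llm = Llama(
    model_path="./mistral-7b.Q2_K.gguf",  # hypothetical Q2_K quantized file
    n_ctx=512,     # small context window keeps memory usage down
    n_threads=4,   # one thread per core on a Raspberry Pi 4
)

out = llm("Q: What is the capital of France? A:", max_tokens=32)
print(out["choices"][0]["text"])
```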

[–] m18coppola@alien.top 1 point 10 months ago

I had so much success with text embeddings and retrieval that I didn't end up needing to deploy an LLM at work. I do, however, have a secret Mistral-Trismegistus-7B@Q4_K hosted on a retired cheapo Dell OptiPlex with a tarot-card-reader system prompt that I share with my teammates 😁
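A minimal sketch of the embed-and-retrieve pattern mentioned here, assuming sentence-transformers for the embedding model and plain cosine similarity; the model name and example documents are made up for illustration:

```python
# Minimal sketch of embedding-based retrieval: embed documents once, then
# return the most similar ones for a query. Model name and corpus are
# illustrative assumptions, not the commenter's actual setup.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose embedder

docs = [
    "How to file an expense report",
    "Resetting your VPN credentials",
    "Office coffee machine maintenance schedule",
]
doc_vecs = model.encode(docs, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents whose embeddings are closest to the query."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q  # dot product == cosine similarity after normalization
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

print(retrieve("I forgot my VPN password"))
```

Since the embeddings are normalized at encode time, the dot product already gives cosine similarity, so no extra normalization is needed per query.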