this post was submitted on 01 Nov 2023
LocalLLaMA
Community to discuss about Llama, the family of large language models created by Meta AI.
I currently have mistral-7b-openorca.Q5_K_M.gguf running in a Proxmox Debian container with 8 CPU cores and 8 GB of RAM, using the llama.cpp Python bindings. Speed is slightly slower than what you get on Bing Chat, but it's absolutely usable for a personal, local assistant. I wrote a small app on top of the llama.cpp Python binding and exposed a chat UI at a local URL using the Gradio Python library. This has been very useful so far as an AI assistant for big and small random requests from my phone, PC, and laptops at home. I also use it from outside the house via Cloudflare Tunnels (on a separate network that I use for exposing services).
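For anyone wanting to try the same idea, here is a minimal sketch of that kind of setup, assuming the llama-cpp-python bindings and Gradio are installed (`pip install llama-cpp-python gradio`); the model path and parameters below are placeholders, not the author's exact configuration.

```python
# Minimal sketch: serve a local GGUF model through a Gradio chat UI.
from llama_cpp import Llama
import gradio as gr

llm = Llama(
    model_path="/models/mistral-7b-openorca.Q5_K_M.gguf",  # placeholder path
    n_ctx=4096,   # context window
    n_threads=8,  # match the container's 8 CPU cores
)

def chat(message, history):
    # Rebuild the conversation in the OpenAI-style message format.
    messages = []
    for user_msg, bot_msg in history:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": bot_msg})
    messages.append({"role": "user", "content": message})
    result = llm.create_chat_completion(messages=messages, max_tokens=512)
    return result["choices"][0]["message"]["content"]

# Expose the chat UI on the local network; a Cloudflare Tunnel can then
# publish this URL to the outside.
gr.ChatInterface(chat).launch(server_name="0.0.0.0", server_port=7860)
```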
I also have a similar setup using llama.cpp (compiled for an AMD GPU) on a slightly more powerful Linux system, where I wrote a small launcher script for a different model. I call it through a shell alias, "summon-{modelname}", and the model is ready to answer my questions directly from the command line. A sketch of that launcher idea follows below.
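For illustration, here is a rough sketch of such a launcher written in Python (the original is a shell script, and the binary path, model paths, and flags here are assumptions rather than the author's actual setup). It simply wraps the llama.cpp CLI built with GPU offload support.

```python
#!/usr/bin/env python3
# Hypothetical launcher sketch: start an interactive llama.cpp chat session
# for a named model. Paths and flags are placeholders; adjust to your build.
import subprocess
import sys

MODELS = {
    # hypothetical model-name -> GGUF path mapping
    "openorca": "/models/mistral-7b-openorca.Q5_K_M.gguf",
}

def main() -> None:
    name = sys.argv[1] if len(sys.argv) > 1 else "openorca"
    subprocess.run(
        [
            "/opt/llama.cpp/main",      # llama.cpp CLI binary (assumed location)
            "-m", MODELS[name],
            "-ngl", "35",               # offload layers to the GPU
            "--interactive-first",      # drop straight into an interactive chat
        ],
        check=True,
    )

if __name__ == "__main__":
    main()
```

A shell alias along the lines of `alias summon-openorca='python3 ~/bin/summon.py openorca'` would reproduce the "summon-{modelname}" pattern described above.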