
Hey all,

So I am trying to run various local models, partly to learn more and partly to use them for some specific research purposes. I have my trusty 16-core Threadripper (Gen 1) with 64GB of RAM, an SSD, and an AMD 6700 XT GPU.

I installed Ubuntu Server with no GUI/desktop, hoping to leave as much of the hardware as possible free for AI work. It runs Docker on boot and auto-starts Portainer, which I access via the web from another machine, and I've deployed a couple of containers: the ollama container and the ollama-webui container.
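
For reference, the ollama deployment is roughly the stock docker run from its README, as far as I can tell (the webui container is set up per its own docs):

```
# Current setup (roughly): the stock ollama container, CPU only, with models
# persisted in a named volume and the default API port exposed.
docker run -d \
  --name ollama \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama
```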

Those work, and I am able to load and run a model, but it is insanely slow. My Windows machine with an 8-core Ryzen 5800 CPU and 32GB of RAM (but a 6900 XT GPU), running LM Studio, loads the same model and responds much faster (though still somewhat slow).

I understand now, after some responses and digging, that a GPU is obviously much faster than a CPU. Still, I would have hoped that a 16-core CPU with 64GB of RAM would offer decent performance on the DeepSeek Coder 30B model or the latest Meta Code Llama model (30B). But both take 4+ minutes to even start responding to a simple "show me a hello world app in ..." prompt, and then the output crawls along at 2 or 3 characters per second.
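
Is there a sane way to confirm whether the GPU is being touched at all while it generates? My guess (and these are just guesses, assuming rocm-smi is installed on the host) is something like:

```
# On the host: watch GPU utilization while a prompt is generating
watch -n 1 rocm-smi

# Container side: CPU/RAM usage of the ollama container
docker stats ollama

# ollama's startup logs should say which backend it loaded
docker logs -f ollama
```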

So first, I would have thought it would run much faster on a 16-core machine with 64GB of RAM. But also, is it even using my 6700 XT GPU with its 12GB of VRAM? Is there some way I need to configure Docker for the ollama container to give it more RAM, more CPUs, and access to the GPU?
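
My guess from skimming the ROCm and ollama docs is that the AMD devices need to be passed into the container explicitly, plus an override because the 6700 XT isn't an officially supported gfx target. Something like the following, where the :rocm image tag, the device paths, the HSA_OVERRIDE_GFX_VERSION value, and the resource limits are all assumptions on my part, not tested:

```
# Untested guess: expose the AMD KFD/DRI devices so ROCm inside the container
# can see the card, give the container an explicit CPU/RAM budget, and use a
# ROCm-enabled ollama image. HSA_OVERRIDE_GFX_VERSION=10.3.0 is the workaround
# I've seen suggested for RDNA2 cards like the 6700 XT.
docker run -d \
  --name ollama \
  --device /dev/kfd \
  --device /dev/dri \
  -e HSA_OVERRIDE_GFX_VERSION=10.3.0 \
  --cpus 16 \
  --memory 48g \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama:rocm
```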

Or is there a better option to run on Ubuntu Server that mimics the OpenAI API, so that the web UI still works with it? Or perhaps a better overall solution that would load and run models much faster on this hardware?
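
One alternative I've been eyeing (purely from reading, not tried) is building llama.cpp with ROCm/hipBLAS support and running its built-in HTTP server, offloading layers to the 12GB of VRAM. The build flag, server flags, and model filename below are my assumptions from its README, so treat this as a sketch:

```
# Untested sketch: build llama.cpp with hipBLAS (ROCm) and serve a GGUF model.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make LLAMA_HIPBLAS=1

# -ngl = number of layers to offload to the GPU (tune until VRAM is full);
# the model path/quant here is just a placeholder.
./server -m ./models/deepseek-coder-33b-instruct.Q4_K_M.gguf \
  -ngl 40 -c 4096 --host 0.0.0.0 --port 8080
```

No idea whether the ollama-webui frontend can talk to that directly or whether it needs an OpenAI-compatible shim in front of it, which is part of why I'm asking.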

Thank you.
