this post was submitted on 29 Nov 2023

LocalLLaMA


I'm using an A100 PCIe 80GB with the CUDA 11.8 toolkit and driver 525.x.

But when I run inference on CodeLlama 13B with oobabooga (text-generation-webui), it only generates 5 tokens/s.

That's really slow.

Is there a config setting or something else I need to change for the A100?
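For scale, here is a rough back-of-envelope check. It is a sketch under stated assumptions, not a benchmark. It assumes single-stream decoding is memory-bandwidth bound: for each new token, every weight is read from GPU memory once, at batch size 1 with KV-cache traffic ignored. It also assumes unquantized fp16 weights (2 bytes per parameter) and NVIDIA's published 1,935 GB/s bandwidth for the A100 80GB PCIe. The function names are made up for illustration.

```python
# Back-of-envelope ceiling for single-stream decoding speed.
# Assumption: each generated token streams every weight from GPU memory once,
# so memory bandwidth, not compute, sets the upper bound (batch size 1, KV cache ignored).


def weights_gb(n_params: float, bytes_per_param: float) -> float:
    """Size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


def decode_ceiling_tok_s(n_params: float, bytes_per_param: float, bandwidth_gb_s: float) -> float:
    """Rough upper bound on tokens/s: bandwidth divided by bytes read per token."""
    return bandwidth_gb_s / weights_gb(n_params, bytes_per_param)


if __name__ == "__main__":
    N_PARAMS = 13e9           # CodeLlama 13B
    FP16_BYTES = 2            # assumption: unquantized fp16 weights
    A100_PCIE_80GB_BW = 1935  # GB/s, published spec for the A100 80GB PCIe
    print(f"weights: {weights_gb(N_PARAMS, FP16_BYTES):.0f} GB")
    print(f"ceiling: {decode_ceiling_tok_s(N_PARAMS, FP16_BYTES, A100_PCIE_80GB_BW):.0f} tok/s")
```

This prints a ceiling of roughly 74 tok/s for 26 GB of weights. Real throughput lands below that, but 5 tok/s is an order of magnitude lower, which suggests the GPU may not be doing the work at all.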

nuvalab@alien.top (11 months ago):

That sounds like CPU speed. What do you see from `watch -d -n 0.1 nvidia-smi` while you're running inference?
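If you'd rather log numbers than eyeball `watch`, the sketch below samples `nvidia-smi` in query mode while the web UI is generating. It relies on the real `--query-gpu` and `--format=csv,noheader,nounits` options. The helper names and the 10% "idle" threshold are hypothetical choices, not anything from oobabooga or NVIDIA. Near-zero utilization and little memory in use during generation would fit the idea that the model is running on the CPU.

```python
import shutil
import subprocess
import time

# Fields accepted by `nvidia-smi --query-gpu=...`; with `nounits`,
# utilization.gpu is a percentage and memory.* values are in MiB.
QUERY = "utilization.gpu,memory.used,memory.total"


def parse_gpu_csv(text: str) -> list[dict]:
    """Parse `nvidia-smi --format=csv,noheader,nounits` output, one row per GPU."""
    rows = []
    for line in text.strip().splitlines():
        util, used, total = (int(v.strip()) for v in line.split(","))
        rows.append({"util_pct": util, "mem_used_mib": used, "mem_total_mib": total})
    return rows


def sample_gpus() -> list[dict]:
    """Run nvidia-smi once and return the parsed per-GPU readings."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_gpu_csv(out)


def gpu_looks_idle(rows: list[dict], util_threshold: int = 10) -> bool:
    """Heuristic (threshold is an assumption): every GPU below util_threshold %."""
    return all(r["util_pct"] < util_threshold for r in rows)


if __name__ == "__main__":
    # Run this in a second terminal while the web UI is generating tokens.
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH")
    else:
        for _ in range(10):
            rows = sample_gpus()
            for i, gpu in enumerate(rows):
                print(f"GPU{i}: {gpu['util_pct']}% util, "
                      f"{gpu['mem_used_mib']}/{gpu['mem_total_mib']} MiB")
            if gpu_looks_idle(rows):
                print("  GPU mostly idle: generation may be running on the CPU")
            time.sleep(0.5)
```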