this post was submitted on 29 Nov 2023

LocalLLaMA


Community to discuss Llama, the family of large language models created by Meta AI.


I'm using an A100 PCIe 80GB, CUDA 11.8 toolkit, driver 525.x.

But when I run inference on CodeLlama 13B with oobabooga (web UI), it only generates 5 tokens/s.

That is so slow.

Is there any config or something else needed for the A100?
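For what it's worth, here is a minimal way to measure raw generation speed with plain transformers on the same card, to rule out the web UI itself (a rough sketch; the model id, prompt, and token count are assumptions, not my exact setup):

```python
# Rough tokens/s baseline with plain transformers, outside the web UI.
# Assumes PyTorch with CUDA and ~26 GB free VRAM for fp16 13B weights.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-13b-hf"  # assumed HF id; substitute your checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("def fibonacci(n):", return_tensors="pt").to(model.device)
torch.cuda.synchronize()
start = time.time()
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
torch.cuda.synchronize()
new_tokens = output.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / (time.time() - start):.1f} tokens/s")
```

If this comes out much faster than 5 tokens/s, the problem would be in the loader settings rather than the card.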

[–] hudimudi@alien.top 1 points 9 months ago

Uhmm, where did you buy that A100? Was it a good deal? lol. Just kidding, you probably set something up wrong, or the drivers are messing up. Is the card working fine otherwise in benchmarks?
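Something like a bare fp16 matmul loop should finish in a fraction of a second on a healthy A100; if it doesn't, suspect the driver or the card itself (just a sketch, the sizes are arbitrary):

```python
# Quick fp16 throughput check: ~11 TFLOPs of matmuls, which a healthy A100
# (~312 TFLOPS fp16 peak) should chew through in well under a second.
import time
import torch

assert torch.cuda.is_available(), "PyTorch does not see the GPU"
print("Device:", torch.cuda.get_device_name(0))

x = torch.randn(8192, 8192, device="cuda", dtype=torch.float16)
torch.cuda.synchronize()
start = time.time()
for _ in range(10):
    y = x @ x  # result discarded; we only care about the timing
torch.cuda.synchronize()
print(f"10 matmuls of 8192x8192 in {time.time() - start:.3f}s")
```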