this post was submitted on 09 Nov 2023

LocalLLaMA

Community to discuss Llama, the family of large language models created by Meta AI.

I have a Ryzen 5 (6 cores, 12 threads). When I monitor the CPU during inference, many of the threads spike to 100%, but many do not. It looks like there is a lot more juice in this processor than is being squeezed out of it. Do any gurus have insight into how the models or the underlying libraries decide how to allocate CPU resources? I'd like 10 threads pegged at 100% the whole time, with the other 2 handling the minimal system overhead. You know?
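(For reference: most local backends let you set the thread count explicitly rather than deciding it for you. A minimal sketch, assuming a llama.cpp-based backend through the llama-cpp-python bindings; the model path and the choice of 10 threads are placeholder assumptions:)

```python
from llama_cpp import Llama

# llama.cpp runs inference on a fixed pool of worker threads. If you
# don't pass n_threads, the bindings pick a default based on your CPU;
# setting it explicitly lets you try e.g. 10 threads and leave 2 free
# for the system, as described above.
llm = Llama(
    model_path="./models/model.Q4_K_M.gguf",  # placeholder path
    n_threads=10,                             # CPU threads for inference
)

out = llm("Q: Why is the sky blue? A:", max_tokens=64)
print(out["choices"][0]["text"])
```

With the plain llama.cpp CLI, the equivalent knob is the `-t`/`--threads` flag.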

top 2 comments
[–] Tacx79@alien.top 1 point 1 year ago

As the owner of an R7 1700 and an R5 4600H, I tested this: you don't get any speed benefit from using more than ~5 threads. Even if you use all 12+ threads, they will all spike to 100%, but the speed will be the same as with 5 threads, because memory bandwidth is the bottleneck here.
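The bandwidth ceiling is easy to sanity-check: generating one token streams essentially all of the model's weights through the CPU once, so tokens/s is capped by memory bandwidth divided by model size. A back-of-envelope sketch with illustrative numbers (dual-channel DDR4-3200 is roughly 51 GB/s theoretical peak; a 7B model at 4-bit quantization is roughly 4 GB):

```python
# Rough upper bound on CPU token generation speed.
# Both numbers are illustrative assumptions, not measurements.
mem_bandwidth_gb_s = 51.0  # dual-channel DDR4-3200, theoretical peak
model_size_gb = 4.0        # ~7B parameters at 4-bit quantization

# Each generated token reads roughly the whole weight set once, so
# memory traffic, not arithmetic, caps throughput past a few threads.
print(f"Upper bound: ~{mem_bandwidth_gb_s / model_size_gb:.1f} tokens/s")
```

A handful of cores is enough to saturate that bandwidth, which is why adding more threads pins more cores at 100% without producing any more tokens.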

[–] nmkd@alien.top 1 point 1 year ago

Which backend are you talking about?