this post was submitted on 09 Nov 2023

LocalLLaMA

Community to discuss Llama, the family of large language models created by Meta AI.

I have a Ryzen 5 (6 cores, 12 threads). When I monitor the CPU during inference, many of the threads spike to 100%, but many do not. It looks like there is a lot more juice in this processor than is being squeezed out of it. Do any gurus have insight into how the models or the underlying libraries decide how to allocate CPU resources? I'd like 10 threads pegged at 100% the whole time, with the other 2 handling the minimal system overhead. You know?
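(For reference: most local backends let you set the thread count explicitly rather than deciding it for you. A minimal sketch, assuming a llama.cpp-based backend through the llama-cpp-python bindings; the model path and the choice of 10 threads are placeholder assumptions:)

```python
from llama_cpp import Llama

# llama.cpp runs inference on a fixed pool of worker threads. If you
# don't pass n_threads, the bindings pick a default based on your CPU;
# setting it explicitly lets you try e.g. 10 threads and leave 2 free
# for the system, as described above.
llm = Llama(
    model_path="./models/model.Q4_K_M.gguf",  # placeholder path
    n_threads=10,                             # CPU threads for inference
)

out = llm("Q: Why is the sky blue? A:", max_tokens=64)
print(out["choices"][0]["text"])
```

With the plain llama.cpp CLI, the equivalent knob is the `-t`/`--threads` flag.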

top 2 comments
[–] Tacx79@alien.top 1 point 1 year ago

As the owner of an R7 1700 and an R5 4600H, I tested this: you don't get any speed benefit from using more than ~5 threads. Even if you use all 12+ threads, they will all spike to 100%, but the speed will be the same as with 5 threads, because memory bandwidth is the bottleneck here.
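The bandwidth ceiling is easy to sanity-check: generating one token streams essentially all of the model's weights through the CPU once, so tokens/s is capped by memory bandwidth divided by model size. A back-of-envelope sketch with illustrative numbers (dual-channel DDR4-3200 is roughly 51 GB/s theoretical peak; a 7B model at 4-bit quantization is roughly 4 GB):

```python
# Rough upper bound on CPU token generation speed.
# Both numbers are illustrative assumptions, not measurements.
mem_bandwidth_gb_s = 51.0  # dual-channel DDR4-3200, theoretical peak
model_size_gb = 4.0        # ~7B parameters at 4-bit quantization

# Each generated token reads roughly the whole weight set once, so
# memory traffic, not arithmetic, caps throughput past a few threads.
print(f"Upper bound: ~{mem_bandwidth_gb_s / model_size_gb:.1f} tokens/s")
```

A handful of cores is enough to saturate that bandwidth, which is why adding more threads pins more cores at 100% without producing any more tokens.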

[–] nmkd@alien.top 1 point 1 year ago

Which backend are you talking about?