LocalLLaMA

Community for discussing Llama, the family of large language models created by Meta AI.

Just wondering if anyone with more knowledge of server hardware could point me in the direction of getting an 8-channel DDR4 server up and running (estimated memory bandwidth is around 200 GB/s), which I would think is plenty for LLM inference.
I would prefer used server hardware due to price; compared with buying a bunch of P40s for the same amount of memory, the power consumption is drastically lower. I'm just not sure how fast a slightly older server CPU can handle inference.
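
For reference, a rough sketch of where the ~200 GB/s figure comes from, assuming DDR4-3200 populated across all eight channels (the exact number depends on the CPU and DIMM speed, so treat this as an estimate):

```python
# Rough theoretical peak memory bandwidth for an 8-channel DDR4 server.
# Assumes DDR4-3200; older platforms may top out at 2933 or 2666 MT/s.
channels = 8
transfer_rate_mt_s = 3200   # mega-transfers per second, per channel
bus_width_bytes = 8         # 64-bit channel = 8 bytes per transfer

bandwidth_gb_s = channels * transfer_rate_mt_s * bus_width_bytes / 1000
print(f"Theoretical peak: {bandwidth_gb_s:.1f} GB/s")  # ~204.8 GB/s
```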

If I were looking to run 80-120 GB models, would 200 GB/s and dual 24-core CPUs get me 3-5 tokens per second?
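
As a sanity check on that target, here is a back-of-the-envelope sketch assuming decoding is memory-bandwidth bound (each generated token has to stream the full set of weights from RAM; the 70% efficiency factor is an assumption, and real throughput will sit at or below this ceiling):

```python
# Upper bound on tokens/sec for bandwidth-bound decoding:
#   tokens_per_s <= usable_bandwidth / bytes_of_weights_read_per_token
usable_bandwidth_gb_s = 200 * 0.7   # assume ~70% of peak is achievable in practice

for model_size_gb in (80, 100, 120):
    tps = usable_bandwidth_gb_s / model_size_gb
    print(f"{model_size_gb} GB model: <= {tps:.1f} tokens/s")

# Roughly 1.8 tok/s at 80 GB down to about 1.2 tok/s at 120 GB, so hitting
# 3-5 tok/s at those sizes would need either quantization (fewer bytes read
# per token) or more memory bandwidth.
```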

[–] Aaaaaaaaaeeeee@alien.top 1 points 1 year ago (1 children)

No way, you're that one guy I uploaded the f16 airoboros for! I was hoping you'd get the model, and I think you did it :)

[–] FaustBargain@alien.top 1 points 1 year ago

sounds like me ;) Thanks!