this post was submitted on 09 Nov 2023
LocalLLaMA
Community to discuss Llama, the family of large language models created by Meta AI.
my setup
EPYC Milan-X 7473X 24-Core 2.8GHz 768MB L3
512GB of HMAA8GR7AJR4N-XN HYNIX 64GB (1X64GB) 2RX4 PC4-3200AA DDR4-3200MHz ECC RDIMMs
MZ32-AR0 Rev 3.0 motherboard
6x 20TB WD Red Pros on ZFS with zstd compression
SABRENT Gaming SSD Rocket 4 Plus-G with Heatsink 2TB PCIe Gen 4 NVMe M.2 2280
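For anyone wanting to replicate the ZFS part of the setup, zstd compression is just a dataset property. A minimal sketch — the pool/dataset name `tank/models` is a placeholder, not from the post:

```shell
# Enable zstd compression on an existing dataset (placeholder name).
zfs set compression=zstd tank/models

# Confirm the property took effect:
zfs get compression tank/models

# After writing data, check how well it actually compressed:
zfs get compressratio tank/models
```

Compression only applies to blocks written after the property is set; existing data stays as-is until rewritten.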
you can probably get away with a non-X without any real performance difference. it might matter for very tiny models, but that's not the point of getting such a beastly machine.
I got the Milan-X because I also use it for CAD, circuit board development, gaming, and video editing, so it's an all-in-one for me.
also my electric bill went from $40 a month to $228 a month, but some of that is because I haven't set up the suspend states yet and the machine isn't sleeping the way I want it to. I just haven't gotten around to it. I imagine that would cut the bill in half, and then choosing the right fan manager and CPU governors might save me another $30 a month.
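As a back-of-envelope check, the $188/month jump corresponds to a fairly high average draw. The $0.30/kWh rate below is an assumption; plug in your own utility rate:

```python
# Estimate the average power draw implied by a monthly bill increase.
# RATE_PER_KWH is an assumed electricity rate, not from the post.

HOURS_PER_MONTH = 24 * 30            # ~720 hours
RATE_PER_KWH = 0.30                  # assumed rate, $/kWh

bill_increase = 228 - 40             # $/month, figures from the post
kwh_per_month = bill_increase / RATE_PER_KWH
avg_watts = kwh_per_month / HOURS_PER_MONTH * 1000
print(round(avg_watts))              # ~870 W average, at this assumed rate
```

At a cheaper rate the implied draw is even higher, which is why suspend states and governor tuning can move the bill so much.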
I can run Falcon-180B unquantized and still have tons of RAM left over.
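The arithmetic checks out for 16-bit weights: parameters times bytes per weight gives the rough footprint, ignoring activation and KV-cache overhead, so treat it as a lower bound:

```python
# Rough RAM footprint of an unquantized model: params x bytes per weight.
# Ignores runtime overhead (activations, KV cache), so this is a floor.

def model_ram_gb(n_params_billion, bytes_per_weight):
    return n_params_billion * 1e9 * bytes_per_weight / 1e9  # GB

fp16 = model_ram_gb(180, 2)   # Falcon-180B, 16-bit weights
fp32 = model_ram_gb(180, 4)   # full 32-bit weights
print(fp16, fp32)             # 360.0 GB vs 720.0 GB
```

So f16 weights leave roughly 150GB of the 512GB free, while full fp32 would not fit at all.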
No way, you're that one guy I uploaded the f16 airoboros for! I was hoping you'd get the model and I think you did it :)
sounds like me ;) Thanks!