I haven't got round to trying the Xwin coder models, but the precursor 70B chat model was extremely impressive when compared against both ChatGPT 3.5 and 4.
Caffeine_Monster
Don't forget to include memory costs: 128GB+ of ECC DDR5 is not cheap.
DDR5 is also not fast enough to make much difference
The real issue is that consumer CPUs / motherboards have very few memory channels. DDR5 itself is plenty fast, but on a dual-channel consumer platform you are probably maxing out the board's memory bandwidth with just two sticks.
It would not surprise me at all if server CPU inference is somewhere between 3x and 5x faster.
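A minimal back-of-envelope sketch of where that 3x-5x guess comes from, assuming token generation is memory-bandwidth-bound (each token streams all the weights from RAM once) and using illustrative, theoretical bandwidth and model-size figures, not measurements:

```python
# Rough upper bound on CPU inference speed: tokens/s ~= memory bandwidth / model size.
# All numbers below are assumptions for illustration only.

def tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on tokens/s if every generated token reads all weights once."""
    return bandwidth_gb_s / model_size_gb

model_gb = 40.0  # e.g. a ~70B model quantized to ~4-5 bits per weight (assumed)

# Dual-channel DDR5-5600 consumer platform: 2 x 44.8 GB/s ~= 90 GB/s theoretical.
consumer = tokens_per_second(90, model_gb)

# 12-channel DDR5-4800 EPYC server: 12 x 38.4 GB/s ~= 460 GB/s theoretical.
server = tokens_per_second(460, model_gb)

print(f"consumer: ~{consumer:.1f} tok/s, server: ~{server:.1f} tok/s, "
      f"ratio: ~{server / consumer:.1f}x")
```

The ~5x theoretical ratio is in line with the guess above; real-world numbers will be lower on both platforms, but the ratio is driven almost entirely by channel count.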
You could build an InfiniBand cluster. The 3090 would give you the most bang for your buck, though it's a lot more work than just paying up for A100s, and the extra hardware will cost you. You can get 9 GPUs on a single EPYC server motherboard and still have good bandwidth, so we are talking about manually sourcing and building 10 boxes.
But unless you are training models and have cheap electricity, a cluster probably doesn't make sense. No idea why you would need ~1800GB of VRAM.
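For what it's worth, the arithmetic behind that figure, a quick sketch assuming 24GB RTX 3090s and 9 GPUs per EPYC box as suggested above:

```python
import math

# Cluster sizing for the ~1800 GB VRAM target mentioned above.
# Assumptions: RTX 3090s (24 GB each), 9 GPUs per EPYC box.
TARGET_VRAM_GB = 1800
GPU_VRAM_GB = 24
GPUS_PER_BOX = 9

gpus = math.ceil(TARGET_VRAM_GB / GPU_VRAM_GB)  # 75 GPUs
boxes = math.ceil(gpus / GPUS_PER_BOX)          # 9 boxes

print(f"{gpus} GPUs across {boxes} boxes "
      f"({boxes * GPUS_PER_BOX * GPU_VRAM_GB} GB if fully populated)")
```

Nine fully populated boxes already clears 1800GB (1944GB); a tenth, as suggested above, buys headroom (2160GB) at the cost of another chassis and InfiniBand port.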