this post was submitted on 31 Oct 2023

LocalLLaMA


Hi all, I need some help from all of you. I am going to buy H100s for training LLMs. For now that means fine-tuning 70B models, but later we may consider pre-training larger models too. H100s look more promising than A100s because of their FP8 support, so I asked multiple vendors for quotes. And then I realized there are too many options!

  1. DGX - 8x H100, much more expensive than the other options, but they say its performance is worth it.

  2. Buy PCI-E H100 cards and a Supermicro machine - from 2x up to 8x, looks cost-effective.

2.a. Some vendors offered a combination with NVLinks. Some say 1 link is needed for 2 cards and some say 3 links are needed for 2 cards.

  3. H100 NVL - no idea what the difference is compared to the PCI-E cards with NVLinks, but it looks like these were newly introduced.

  4. Some other options, like a custom build made by the vendors.

Are there any best practices I can look at to help make a decision? Any advice from experts here who have already been through a similar situation? Thanks in advance 🙏
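For sizing the options above, a back-of-the-envelope memory estimate can help. The sketch below assumes full fine-tuning with Adam in mixed precision, using the commonly cited figure of about 16 bytes per parameter (FP16 weights and gradients plus FP32 master weights and two Adam states). It ignores activations, buffers, and fragmentation, so real usage is higher. The function names and the 80 GB per-GPU default are assumptions for illustration, not something from the vendors' quotes.

```python
# Rough memory needed for weights, gradients, and optimizer state only.
# Assumption: mixed-precision Adam, ~16 bytes per parameter in total.
BYTES_PER_PARAM = {
    "fp16_weights": 2,
    "fp16_grads": 2,
    "fp32_master_weights": 4,
    "fp32_adam_momentum": 4,
    "fp32_adam_variance": 4,
}


def training_state_gb(n_params: float) -> float:
    """Return the training-state footprint in GB (1 GB = 1e9 bytes)."""
    return n_params * sum(BYTES_PER_PARAM.values()) / 1e9


def state_fits(n_params: float, n_gpus: int, gpu_mem_gb: float = 80.0) -> bool:
    """Check whether the training state alone fits in aggregate GPU memory.

    Optimistic by design: activations and framework overhead are not counted.
    """
    return training_state_gb(n_params) <= n_gpus * gpu_mem_gb


if __name__ == "__main__":
    need = training_state_gb(70e9)
    print(f"70B full fine-tune state: {need:.0f} GB")
    print(f"fits on 8 x 80 GB: {state_fits(70e9, 8)}")
```

Under these assumptions a 70B model needs roughly 1,120 GB for training state alone, more than the 640 GB on a single 8x 80 GB node. Parameter-efficient methods such as LoRA, or sharding with CPU offload, change this picture considerably, so treat the number as a starting point rather than a verdict.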

[–] qrios@alien.top 1 points 1 year ago (3 children)

I realize this is totally unhelpful but, the DGX - 8x H100 costs just slightly more than the median price of a new house in the US . . .

I'm not saying this is a poor decision but . . . man that is one hell of a decision.

[–] HaywireVRV@alien.top 1 points 1 year ago

My friend's company has had a bunch of DGXs idling for months. Ain't that something.

[–] Herr_Drosselmeyer@alien.top 1 points 1 year ago (3 children)

OP isn't buying them for his personal setup, though that would be a baller move.

[–] OldPin8654@alien.top 1 points 1 year ago (1 children)

Yeah, it is not my money, but it's still stressful.

[–] Acceptable_Can5509@alien.top 1 points 1 year ago (2 children)

Wait, whose money is it? Can't you just rent as well?

[–] tvetus@alien.top 1 points 1 year ago

Can be hard to rent if all the capacity is bought out. But if it's just 1 DGX then they might be better off renting.
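One rough way to sanity-check rent versus buy is a break-even estimate. The sketch below is only that: the function name and parameters are hypothetical, the inputs in the example are toy numbers rather than real quotes, and it ignores power, hosting, staffing, resale value, and whether rental capacity is actually available.

```python
def breakeven_months(
    purchase_cost: float,
    rent_per_hour: float,
    utilization: float = 1.0,
    hours_per_month: float = 730.0,
) -> float:
    """Months of renting after which cumulative rent equals the purchase cost.

    utilization is the fraction of each month the hardware would really be
    in use (0 < utilization <= 1). Idle hardware pushes the break-even out.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    if rent_per_hour <= 0.0:
        raise ValueError("rent_per_hour must be positive")
    monthly_rent = rent_per_hour * hours_per_month * utilization
    return purchase_cost / monthly_rent


if __name__ == "__main__":
    # Toy inputs only: plug in your own vendor quote and rental rate.
    print(breakeven_months(7300.0, 1.0))        # fully utilized
    print(breakeven_months(7300.0, 1.0, 0.5))   # used half the time
```

Halving utilization doubles the break-even time, which is why a machine that sits idle for months makes buying look much worse than the sticker comparison suggests.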

[–] Slimxshadyx@alien.top 1 points 1 year ago

I tried to rent from LambdaLabs yesterday, but there was no availability for any GPU.

[–] donotdrugs@alien.top 1 points 1 year ago

> OP isn't buying them for his personal setup

Tbh I don't really see how this explains anything. Sure, OP won't go bankrupt buying it for the company, but I'm 99% certain that it's still a bad financial decision.

[–] nero10578@alien.top 1 points 1 year ago

Definitely thought this was for his homelab

[–] FaustBargain@alien.top 1 points 1 year ago (1 children)

If it's a company, that could be a drop in the bucket.

[–] OldPin8654@alien.top 1 points 1 year ago

Yes! Put more money in it, the company!!!