this post was submitted on 26 Nov 2023

LocalLLaMA


Community to discuss Llama, the family of large language models created by Meta AI.


https://www.amazon.se/-/en/NVIDIA-Tesla-V100-16GB-Express/dp/B076P84525 price in my country: 81,000 SEK (about 7,758.17 USD)

My current setup:
NVIDIA GeForce RTX 4050 Laptop GPU
CUDA cores: 2560
Memory data rate: 16.00 Gbps

My laptop GPU works fine for most ML and DL tasks. I am currently finetuning a GPT-2 model with some data that I scraped, and it works surprisingly well on my current setup. So it's not like I'm complaining.
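
For a sense of what that finetuning actually involves, here is a rough sketch of the kind of run I mean, using the Hugging Face transformers Trainer; the data file, batch size, and other hyperparameters are placeholders rather than my exact setup:

```python
# Rough sketch of a GPT-2 finetune on scraped text with Hugging Face
# transformers + datasets. Paths and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Hypothetical scraped dataset: one text sample per line.
dataset = load_dataset("text", data_files={"train": "scraped_data.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="gpt2-finetuned",
        per_device_train_batch_size=4,   # small batch so it fits in laptop VRAM
        gradient_accumulation_steps=8,   # keeps the effective batch size reasonable
        num_train_epochs=3,
        fp16=True,                       # mixed precision on the RTX 4050
    ),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```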

I do, however, own a desktop PC with an old GTX 980, and I was thinking of replacing that card with the V100.

So my question to this community is: for those of you who have bought your own super-duper GPU, was it worth it? And what were your experiences and realizations when you started tinkering with it?

Note: Please refrain from giving me snarky comments about using cloud GPUs. I am not interested in that (and I am in fact already using one for another ML task that doesn't involve finetuning). I am interested in hearing some hardware hobbyists' opinions on this matter.

top 22 comments
[–] Flying_Madlad@alien.top 1 points 9 months ago

If you want 16 GB, check out the A4000. They're usually not that expensive and have better cores.

[–] ambient_temp_xeno@alien.top 1 points 9 months ago (5 children)
[–] freecodeio@alien.top 1 points 9 months ago (1 children)

So basically either 4090 or H100

[–] holistic-engine@alien.top 1 points 9 months ago

Yeah, perhaps if I'm crazy enough I could just buy 3 of those and call it a day.

[–] FullOf_Bad_Ideas@alien.top 1 points 9 months ago (1 children)

I can't corroborate those results for Pascal cards. They had very limited FP16 performance, usually 1:64 of FP32 performance. Switching over to an RTX 3090 Ti from a GTX 1080 got me around 10-20x gains in QLoRA training, keeping the exact same batch size and context length and changing only the calculations from fp16 to bf16.
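
For reference, the only thing that changed between those runs was the compute dtype; roughly this kind of QLoRA setup with transformers + peft + bitsandbytes (the model name and LoRA hyperparameters are just illustrative, not my exact config):

```python
# Sketch of the fp16 vs bf16 switch in a QLoRA run (transformers + peft +
# bitsandbytes). Model name and LoRA hyperparameters are illustrative only.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model

use_bf16 = torch.cuda.is_bf16_supported()  # True on Ampere (3090 Ti), False on Pascal (GTX 1080)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16 if use_bf16 else torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder model
    quantization_config=bnb_config,
    device_map="auto",
)
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))

# Same batch size and context length either way; only the compute dtype differs.
args = TrainingArguments(
    output_dir="qlora-out",
    per_device_train_batch_size=4,
    bf16=use_bf16,
    fp16=not use_bf16,
)
```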

[–] ambient_temp_xeno@alien.top 1 points 9 months ago

I'm not sure where this chart is from, but I remember it was made before QLoRA even existed.

[–] az226@alien.top 1 points 9 months ago

A6000 being worse than 3090 doesn’t make any sense.

[–] Mescallan@alien.top 1 points 9 months ago

Man, those H100s really are on another level. I shudder to think where we'll be in 5 years.

[–] aikitoria@alien.top 1 points 9 months ago

Is there any such benchmark that includes both the 4090/A100 and a Mac with an M2 Ultra / M3 Max? I've searched quite a bit but didn't find anyone comparing them on similar setups; it seems very interesting due to the large unified memory.
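
Even a crude like-for-like number would help, e.g. timing a fixed prompt and generation length through llama-cpp-python, which runs on CUDA and Metal alike; just a sketch, and the model path is a placeholder:

```python
# Sketch of a tokens/sec measurement that runs unchanged on a CUDA box
# (4090/A100) and on Apple Silicon via llama-cpp-python (CUDA or Metal build).
# The model path is a placeholder.
import time
from llama_cpp import Llama

llm = Llama(model_path="model.gguf", n_gpu_layers=-1, n_ctx=4096)

prompt = "Explain the difference between fp16 and bf16 in one paragraph."
start = time.perf_counter()
out = llm(prompt, max_tokens=256)
elapsed = time.perf_counter() - start

generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tok/s")
```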

[–] a_beautiful_rhind@alien.top 1 points 9 months ago (1 children)

I'd love a V100, but they go for stupid prices where 3090s and a whole host of other cards make more sense. I think even the RTX 8000 is cheaper, and it has more RAM and is newer.

[–] Mission_Revolution94@alien.top 1 points 9 months ago

Yeah, I'm with you on that. Multiple 3090s are the way to go unless you're working with massive models, I think.

[–] nero10578@alien.top 1 points 9 months ago (1 children)

A V100 16GB is like $700 on eBay. An RTX 3090 24GB can be had for a similar amount.

[–] alchemist1e9@alien.top 1 points 9 months ago

Exactly, which has me wondering why the 3090 24GB isn't mentioned more on this sub. Isn't that actually the best option? Multiples of those.

[–] Wooden-Potential2226@alien.top 1 points 9 months ago

Don’t buy the V100 at amazon.se - that price is crazy high.

[–] ThisGonBHard@alien.top 1 points 9 months ago (1 children)

Why the hell would you get a two-generations-old 16 GB GPU for 7.7K when you can get 3-4 4090s? Each one will roflstomp it in ANY use case, let alone running 3 of them.

Get either an A6000 (Ampere 48 GB card), an A6000 Ada, or 3x 4090s with an AMD Threadripper system, or something like that. It will still run laps around the V100 and be cheaper.

[–] caphohotain@alien.top 1 points 9 months ago

This. I was so confused when I saw OP's post: why on earth buy an old 16 GB VRAM card for the price of multiple newer cards with more VRAM?

[–] fireteller@alien.top 1 points 9 months ago

I say first use services like Lambda when you need the extra processing power. Then only buy the hardware when it would genuinely be cheaper to own it and train locally.

Also, consumer GPU VRAM and memory bandwidth are quickly exceeded as you move to larger and larger models. If you buy early, you may quickly find the hardware inadequate for your needs.

[–] Fun_Tangerine_1086@alien.top 1 points 9 months ago
  • You want VRAM, like lots of folks have mentioned; there are some non-obvious things here - you can make smaller VRAM work with a reduced batch size or non-AdamW optimizers, but you trade off both speed and quality to do so (rough sketch of those knobs after this list).

  • You can split training across multiple GPUs; I use 2x 3060 12gb, though a real 24gb card would be better.

  • I don't recommend a V100 - you'd miss out on the bfloat16 datatype.
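
Rough sketch of those knobs, assuming a Hugging Face Trainer setup; the values are illustrative, and adafactor stands in for "non-AdamW optimizer":

```python
# Sketch of the VRAM trade-offs above: shrink the batch, swap the optimizer,
# and only enable bf16 where the hardware has it (Ampere+, not V100).
# Values are illustrative.
import torch
from transformers import TrainingArguments

bf16_ok = torch.cuda.is_bf16_supported()  # False on a V100 (Volta), True on a 3060/3090 (Ampere)

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,    # shrink to fit in VRAM...
    gradient_accumulation_steps=16,   # ...and accumulate to keep the effective batch size up
    optim="adafactor",                # non-AdamW optimizer with a much smaller state
    bf16=bf16_ok,
    fp16=not bf16_ok,
)
```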

[–] synn89@alien.top 1 points 9 months ago (1 children)

I dug into this a lot back when I was building 2 AI servers for home use, for both inference and training. Dual 4090s are the best you can get for speed at a reasonable price. But for the best "bang for your buck" you can't beat used 3090s. You can pick them up reliably for $750-800 each off of eBay.

I went with dual 3090's using this build: https://pcpartpicker.com/list/V276JM

I also went with NVLink, which was a waste of money. It doesn't really speed things up, as the board can already do x8 PCIe to both cards.

But a single 3090 is a great card you can do a lot with. If that's too much money, go with a 3060 12gb card. The server oriented stuff is a waste for home use. Nvidia 30xx and 40xx series consumer cards will just blow them away in a home environment.
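
For inference, splitting a model across the two cards is basically one argument these days; rough sketch with transformers + accelerate, where the model ID is just a placeholder:

```python
# Rough sketch of sharding a model across two 3090s for inference with
# transformers + accelerate. device_map="auto" spreads the layers over both
# 24 GB cards; no NVLink needed. The model ID is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # placeholder; swap in whatever you actually run
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # layers land on cuda:0 and cuda:1 automatically
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```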

[–] holistic-engine@alien.top 1 points 9 months ago (1 children)
[–] synn89@alien.top 1 points 9 months ago

Be careful with your motherboard choices if you're running 2 video cards. Many boards are only really designed to support one video card at x8 or x16 PCIe speeds.
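
If you want to check what your board is actually giving each card, something like this with the nvidia-ml-py (pynvml) bindings will print the negotiated PCIe link width per GPU; just a quick sketch:

```python
# Quick sketch: report the current vs. maximum PCIe link width per GPU using
# the nvidia-ml-py (pynvml) bindings. Widths are lane counts, e.g. 8 = x8.
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    cur = pynvml.nvmlDeviceGetCurrPcieLinkWidth(handle)
    top = pynvml.nvmlDeviceGetMaxPcieLinkWidth(handle)
    print(f"GPU {i}: running at x{cur} (card supports up to x{top})")
pynvml.nvmlShutdown()
```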

[–] Ion_GPT@alien.top 1 points 9 months ago

No. The V100 is not Ampere architecture, and for that price it's simply not worth it. A 3090 is cheaper and has 24 GB.