this post was submitted on 27 Oct 2023

Machine Learning

What are the benefits of using an H100 over an A100 (both at 80 GB and both using FP16) for LLM inference?

Looking at the datasheets for both GPUs, the H100 has twice the max FLOPS, but they have almost the same memory bandwidth (~2000 GB/s). Since memory bandwidth dominates LLM inference, I wonder what benefits the H100 actually brings. One benefit could, of course, be the ability to use FP8 (which is extremely useful), but in this question I'm interested in the difference in hardware specs.
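
As a rough illustration of why bandwidth is the constraint: at batch size 1, each generated token has to stream every weight through memory once, so bandwidth alone caps decode speed. Here is a back-of-the-envelope sketch (the 13B model size is an assumed example; the ~2 TB/s figure is the one from the datasheets):

```python
# Back-of-the-envelope: bandwidth-bound decode speed at batch size 1.
# The model size is an assumed example; 2 TB/s is the datasheet bandwidth.

model_params = 13e9                 # hypothetical 13B-parameter model
bytes_per_param = 2                 # FP16
model_bytes = model_params * bytes_per_param   # 26 GB of weights

bandwidth = 2.0e12                  # ~2000 GB/s on both cards

# Each decoded token streams every weight through memory once,
# so bandwidth sets a hard ceiling on single-stream decode speed.
tokens_per_sec = bandwidth / model_bytes
print(f"Upper bound: ~{tokens_per_sec:.0f} tokens/s")   # ~77 tokens/s
```

By that ceiling the two cards look identical; the extra FLOPS mostly pay off at larger batch sizes, or with FP8, which halves the bytes moved per parameter.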

top 14 comments
[–] 3DHydroPrints@alien.top 1 points 1 year ago (1 children)

The H100 was additionally specialized for higher performance on transformer models. I think it is about 8x faster than an A100 for transformers, but don't quote me on that.

[–] Gurrako@alien.top 1 points 1 year ago (2 children)

At first I thought that number was almost unbelievably high. It appears it can be 8x faster when using FlashAttention and a multi-GPU setup. Without multi-GPU and FlashAttention, it is a bit more than 2x faster.

Source: https://lambdalabs.com/blog/flashattention-2-lambda-cloud-h100-vs-a100
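
For context, those benchmarks exercise attention kernels like the one below. This is a minimal sketch using PyTorch's scaled_dot_product_attention, which dispatches to a FlashAttention kernel on supported GPUs; the shapes are arbitrary, made up for illustration.

```python
import torch
import torch.nn.functional as F

# Minimal sketch: F.scaled_dot_product_attention dispatches to a
# FlashAttention kernel on supported GPUs. Shapes are arbitrary.
batch, heads, seq, dim = 4, 16, 2048, 64

q = torch.randn(batch, heads, seq, dim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# is_causal=True applies the decoder-style causal mask.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([4, 16, 2048, 64])
```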

[–] 3DHydroPrints@alien.top 1 points 1 year ago

Thanks for clarifying :)

[–] TwistedBrother@alien.top 1 points 1 year ago

Sure, but isn’t it the case that the H100 is what can sustain such a high-throughput system, whereas A100s generally run independently?

[–] norcalnatv@alien.top 1 points 1 year ago

There was quite a detailed technical blog published when the H100 was announced, with plenty of comparisons to the A100.

[–] I_will_delete_myself@alien.top 1 points 1 year ago (1 children)

The A100 is like a 3070 Ti with 80 GB of VRAM. The H100 is like a 4090 with 80 GB of VRAM and hardware optimized for transformers.

[–] Annual-Minute-9391@alien.top 1 points 1 year ago (1 children)

Why a 3070 Ti? I would have guessed a 3090? Something with clocks?

[–] I_will_delete_myself@alien.top 1 points 1 year ago (1 children)

They have around the same number of CUDA cores. Normally, the more CUDA cores, the higher the inference speed.
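
If you want to check whatever card you are on, here is a quick sketch using PyTorch's device-properties API. Note that it reports SMs rather than CUDA cores; cores per SM is a per-architecture figure, not something the API returns.

```python
import torch

# Sketch: inspect the GPU you are running on. CUDA-core count is not
# reported directly; it is SMs x cores-per-SM, and cores-per-SM depends
# on the architecture (64 FP32 cores/SM on A100, 128 on H100).
props = torch.cuda.get_device_properties(0)
print(props.name)
print(f"SMs: {props.multi_processor_count}")
print(f"Memory: {props.total_memory / 1e9:.0f} GB")
print(f"Compute capability: {props.major}.{props.minor}")
```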

[–] RobbinDeBank@alien.top 1 points 1 year ago

More Tensor and CUDA cores mean higher inference and training speed, right? Do inference and training benefit equally from those cores?

[–] redditfriendguy@alien.top 1 points 1 year ago

I don't think fp8 is a real thing

[–] Substantial-Job1405@alien.top 1 points 1 year ago

From my personal experience, I think the H100 provides better performance when it comes to low-level machine learning. The data processing speed is significantly faster compared to the A100, which can make a big difference for projects that take time to complete.

A100s and H100s are great for training, but a bit of a waste for inference.

[–] SnooHesitations8849@alien.top 1 points 1 year ago

The H100 and A100 are best for training. The H100 is optimized for lower precision (8/16-bit) and for transformers. The A100 is still very good, but not to the same degree: it is still a general-purpose GPU, while the H100 is more of a transformer accelerator.

Using them for inference is not the most economical choice, though.
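
As a concrete example of that lower-precision path: on Hopper, FP8 matmuls are exposed through NVIDIA's Transformer Engine. The sketch below follows its documented fp8_autocast pattern; the layer sizes are arbitrary, and it only runs on FP8-capable hardware.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Sketch: FP8 matmuls via NVIDIA Transformer Engine (Hopper-only path).
# Layer sizes are arbitrary; DelayedScaling is TE's documented recipe.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.E4M3)

layer = te.Linear(4096, 4096, bias=True).cuda()
x = torch.randn(8, 4096, device="cuda", dtype=torch.float16)

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)
print(y.shape)  # torch.Size([8, 4096])
```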

[–] cyril1991@alien.top 1 points 1 year ago

The H100 is more recent and beefier. It is also more interesting for the multi-instance GPU (MIG) feature, where you “split” it across different workloads, so you could run multiple LLMs in parallel. The A100 has the same feature, but less memory/compute to split.
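
For reference, the partitioning is driven through nvidia-smi. Below is a rough sketch, wrapped in Python; it requires root and MIG mode already enabled, and the profile IDs vary by card, so the 9,9 split is an assumption to verify against the listing first.

```python
import subprocess

# Sketch: partition an 80 GB card with MIG. Requires root, and MIG mode
# enabled first with `nvidia-smi -i 0 -mig 1` (followed by a GPU reset).

# List the GPU-instance profiles this card supports; IDs vary by model.
print(subprocess.run(["nvidia-smi", "mig", "-lgip"],
                     capture_output=True, text=True).stdout)

# Assumed example: two half-card instances (-C also creates the compute
# instances). Profile 9 is 3g.40gb on an 80 GB A100; verify with -lgip.
subprocess.run(["nvidia-smi", "mig", "-cgi", "9,9", "-C"], check=True)
```

Each MIG instance then shows up as its own CUDA device, so a separate inference server can be pointed at each one.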