this post was submitted on 28 Oct 2023

Machine Learning


I wanted to share some exciting news from the GPU world that could potentially change the game for LLM inference. AMD has been making significant strides in LLM inference, thanks to the porting of vLLM to ROCm 5.6. You can find the code implementation on GitHub.

The result? AMD's MI210 now almost matches Nvidia's A100 in LLM inference performance. This is a significant development, as it could make AMD a more viable option for LLM inference tasks, which traditionally have been dominated by Nvidia.

For those interested in the technical details, I recommend checking out this EmbeddedLLM Blog Post.

I'm curious to hear your thoughts on this. Has anyone managed to run it on an RX 7900 XTX?
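For anyone who wants to try it, the rough flow is to build the ROCm fork from source and then smoke-test it. The commands below are only a sketch: the exact repository URL comes from the GitHub link in the post, and the model name is illustrative.

```shell
# Sketch: build the ROCm port of vLLM from source on a machine with
# ROCm 5.6 already installed. <vllm-rocm-repo> stands in for the
# GitHub link referenced in the post.
git clone <vllm-rocm-repo> vllm-rocm
cd vllm-rocm
pip install -e .

# Quick smoke test via vLLM's API server entrypoint
# (model name is illustrative; any supported HF model works):
python -m vllm.entrypoints.api_server --model meta-llama/Llama-2-7b-hf
```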


top 7 comments
[–] killver@alien.top 1 points 10 months ago

Is there some good cloud host for getting AMD GPUs?

[–] Booonishment@alien.top 1 points 10 months ago

Well now I know what I’m doing with my weekend. Thanks for sharing! Hopefully I can report back some xtx performance numbers.

[–] diamond_jackie07@alien.top 1 points 10 months ago (1 children)

I tried it on this config: Ryzen 9 7950X + MI210. I got Throughput: 129 requests/min, 1028.89 tokens/s on llama2-7b, which is even better than the performance they cite in the post.
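For anyone wanting to reproduce numbers like these: throughput figures of this kind typically come from the benchmark script bundled in the vLLM repo. A hedged sketch of the invocation (the ShareGPT dataset file is an assumption and has to be downloaded separately; flags follow the upstream repo's benchmarks):

```shell
# Sketch: measure offline throughput with vLLM's bundled benchmark
# script, run from the repo root. The dataset file below is an
# assumption; prompt count is illustrative.
python benchmarks/benchmark_throughput.py \
    --model meta-llama/Llama-2-7b-hf \
    --dataset ShareGPT_V3_unfiltered_cleaned_split.json \
    --num-prompts 1000
# Prints a summary line of the form:
#   Throughput: <N> requests/s, <M> tokens/s
```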

[–] diamond_jackie07@alien.top 1 points 10 months ago

Will report back on 13b performance ASAP

[–] mrpoops@alien.top 1 points 10 months ago (1 children)

Will this run on a Ryzen with Radeon graphics built in?

If so, couldn't you build a Ryzen machine with something like 128 GB of RAM and dedicate nearly all of it to video memory?
