ccbadd

joined 1 year ago
[–] ccbadd@alien.top 1 points 11 months ago

Maybe, but it's a lot faster than what we can do right now, and it's only the start.

[–] ccbadd@alien.top 1 points 11 months ago

In the article they said that's what was expected, but the gains applied to the entire RAM drive, and the concept has now been proven. The test used a 500 MB+ block, so bigger than the cache alone.

https://www.tomshardware.com/news/amd-3d-v-cache-ram-disk-182-gbs-12x-faster-pcie-5-ssd

[–] ccbadd@alien.top 1 points 11 months ago (2 children)

I didn't think so either about the 3D V-Cache until the article came out a few days ago about getting roughly 10x the performance from a RAM drive. If it works for RAM drives, then surely we can figure out a way to use that performance for inferencing.

[–] ccbadd@alien.top 1 points 11 months ago (4 children)

If AMD would put out an APU with 3D V-Cache and quad-channel memory that lets you run all four slots at full speed (6000 MT/s or better), and not cripple it in the BIOS, they could be kicking Apple's tail.
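For rough numbers (my own back-of-envelope math, not anything from a spec sheet): four DDR5 channels at 6000 MT/s works out to about 192 GB/s of theoretical bandwidth, which is at least in Apple M-series territory. A quick sketch:

```python
# Back-of-envelope DDR5 bandwidth estimate (theoretical peak, not measured).
# Assumes four independent 64-bit channels at 6000 MT/s; real-world numbers
# will be lower, and actual channel layouts vary by platform.

channels = 4                  # hypothetical quad-channel APU
transfers_per_sec = 6000e6    # 6000 MT/s
bytes_per_transfer = 8        # 64-bit channel

peak_gb_s = channels * transfers_per_sec * bytes_per_transfer / 1e9
print(f"Theoretical peak: {peak_gb_s:.0f} GB/s")  # ~192 GB/s
```

For comparison, Apple specs the M2 Pro at around 200 GB/s and the M2 Max at around 400 GB/s, so quad-channel DDR5-6000 would at least put an x86 APU in the same conversation.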

 

With the proof of concept done and users able to get over 180 GB/s on a PC with AMD's 3D V-Cache, it sure would be nice if we could figure out a way to use that bandwidth for CPU-based inferencing. I think it only worked on Windows, but if that's the case we should be able to come up with a way to do it under Linux too.
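A minimal Linux sketch of the same idea: drop a file on a RAM-backed filesystem and time repeated reads of it. The mount point, file size, and setup commands below are my own assumptions, not how the Tom's Hardware test was run, and Python overhead will cap the number well below a native benchmark; it's only a sanity check that the data stays RAM/cache resident.

```python
# Setup (assumed, run beforehand):
#   sudo mount -t tmpfs -o size=1G tmpfs /mnt/ramdisk
#   dd if=/dev/urandom of=/mnt/ramdisk/block.bin bs=1M count=64
import time

PATH = "/mnt/ramdisk/block.bin"   # hypothetical tmpfs-backed file
REPEATS = 200

with open(PATH, "rb") as f:
    f.read()                      # first read pulls the block into memory

start = time.perf_counter()
total = 0
for _ in range(REPEATS):
    with open(PATH, "rb") as f:
        total += len(f.read())    # re-read the whole block each pass
elapsed = time.perf_counter() - start

print(f"Read {total / 1e9:.1f} GB in {elapsed:.2f} s "
      f"-> {total / elapsed / 1e9:.1f} GB/s")
```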

[–] ccbadd@alien.top 1 points 11 months ago

For me it's just censorship and privacy. Maybe API costs will become an issue too once we get more apps.

[–] ccbadd@alien.top 1 points 11 months ago

I set one up for a while and it was a royal PITA! I have since switched to a managed email account using my own domain. So much less trouble. It's just not worth it in my opinion.

[–] ccbadd@alien.top 0 points 11 months ago (2 children)

I would replace the DDR5 RAM rather than add to it, or your memory will run a lot slower, and you just don't need that much if you're going to use GPUs for inferencing. Also, a P40 is probably money better spent with this config than the P2200.

[–] ccbadd@alien.top 1 points 11 months ago

I'd just be worried they will drop support for them in ROCm 6.0. They already dropped the MI50s. Technically you can still run them, and the older MI25 too, but ROCm is kernel-specific, so before long you might have to maintain a system with an old kernel to keep them working. I have a pair of MI100s, and while they work fine, they are slower than Nvidia 3090s with llama.cpp, ExLlama, and KoboldCpp for some reason. It looks like the new release of FlashAttention-2 lists the MI210 as the oldest card it supports, which I find very frustrating. I also have a couple of W6800s; they are actually as fast or faster than the MI100s with the same software, cost about the same, and have built-in cooling.
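For what it's worth, this is roughly how I compare cards: time a fixed generation with the llama-cpp-python bindings (built against CUDA for the 3090s and ROCm/HIP for the MI100/W6800). The model path and prompt below are placeholders of my own, not anything specific to this thread.

```python
# Rough tokens/sec comparison between cards using llama-cpp-python.
import time
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-2-13b.Q4_K_M.gguf",  # hypothetical GGUF file
    n_gpu_layers=-1,   # offload all layers to the GPU
    verbose=False,
)

prompt = "Explain the difference between L2 and L3 cache in one paragraph."
start = time.perf_counter()
out = llm(prompt, max_tokens=256)
elapsed = time.perf_counter() - start

n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens} tokens in {elapsed:.1f} s -> {n_tokens / elapsed:.1f} tok/s")
```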

 

Looking at mlc-llm, vllm, nomic, etc. they all seem focused on inferencing with a vulkin backend and all have made statements about multi gpu support either on their roadmaps or being worked on over the past few months. Every time I see one say they added multi gpu support it turns out they just incorporated llama.cpp's CUDA and HIP support rather than implementing it on vulkan. Are there any projects that actually do multi gpu with vulkin and is there some technical reason it doesn't work? I only ask because vulkan is available on multiple platforms with default installs and would surely make things easier for end users.
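To see what a Vulkan backend would even have to work with, the Vulkan tools ship `vulkaninfo`; here's a small sketch that just counts the physical devices it reports. The substring parsing is my assumption about the `--summary` output format, so treat it as illustrative.

```python
# List the GPUs visible to Vulkan by parsing `vulkaninfo --summary`.
# Any multi-GPU Vulkan backend would start from this same device list.
import subprocess

out = subprocess.run(
    ["vulkaninfo", "--summary"],
    capture_output=True, text=True, check=True,
).stdout

# The summary prints one "deviceName" line per physical device; counting
# those is a rough way to see how many GPUs a backend could split a
# model across. (Matching on the substring is an assumption.)
gpus = [line.strip() for line in out.splitlines() if "deviceName" in line]
print(f"Vulkan sees {len(gpus)} device(s):")
for g in gpus:
    print(" ", g)
```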