ccbadd

joined 1 year ago
[–] ccbadd@alien.top 1 points 11 months ago

Maybe, but it's a lot faster than what we can do right now, and it's only the start.

[–] ccbadd@alien.top 1 points 11 months ago

In the article they said that's what was expected, but the gains applied to the entire RAM drive, and the concept has now been proven. The test used a 500 MB+ block, so bigger than the cache alone.

https://www.tomshardware.com/news/amd-3d-v-cache-ram-disk-182-gbs-12x-faster-pcie-5-ssd

[–] ccbadd@alien.top 1 points 11 months ago (2 children)

I didn't think so either about the 3D V-Cache until the article came out a few days ago about getting roughly 10x the performance from a RAM drive. If it works for RAM drives, then surely we can figure out a way to use that performance for inferencing.

[–] ccbadd@alien.top 1 points 11 months ago (4 children)

If AMD would put out an APU with 3D V-Cache and quad-channel memory that lets you run all four slots at full speed (6000 MT/s or better), and not cripple it in the BIOS, they could be kicking Apple's tail.
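For rough numbers (my own back-of-envelope math, not anything from a spec sheet): four DDR5 channels at 6000 MT/s works out to about 192 GB/s of theoretical bandwidth, which is at least in Apple M-series territory. A quick sketch:

```python
# Back-of-envelope DDR5 bandwidth estimate (theoretical peak, not measured).
# Assumes four independent 64-bit channels at 6000 MT/s; real-world numbers
# will be lower, and actual channel layouts vary by platform.

channels = 4                  # hypothetical quad-channel APU
transfers_per_sec = 6000e6    # 6000 MT/s
bytes_per_transfer = 8        # 64-bit channel

peak_gb_s = channels * transfers_per_sec * bytes_per_transfer / 1e9
print(f"Theoretical peak: {peak_gb_s:.0f} GB/s")  # ~192 GB/s
```

For comparison, Apple specs the M2 Pro at around 200 GB/s and the M2 Max at around 400 GB/s, so quad-channel DDR5-6000 would at least put an x86 APU in the same conversation.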

 

With the proof of concept done and users able to get over 180 GB/s on a PC with AMD's 3D V-Cache, it sure would be nice if we could figure out a way to use that bandwidth for CPU-based inferencing. I think it only worked on Windows, but if that's the case we should be able to come up with a way to do it under Linux too.
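A minimal Linux sketch of the same idea: drop a file on a RAM-backed filesystem and time repeated reads of it. The mount point, file size, and setup commands below are my own assumptions, not how the Tom's Hardware test was run, and Python overhead will cap the number well below a native benchmark; it's only a sanity check that the data stays RAM/cache resident.

```python
# Setup (assumed, run beforehand):
#   sudo mount -t tmpfs -o size=1G tmpfs /mnt/ramdisk
#   dd if=/dev/urandom of=/mnt/ramdisk/block.bin bs=1M count=64
import time

PATH = "/mnt/ramdisk/block.bin"   # hypothetical tmpfs-backed file
REPEATS = 200

with open(PATH, "rb") as f:
    f.read()                      # first read pulls the block into memory

start = time.perf_counter()
total = 0
for _ in range(REPEATS):
    with open(PATH, "rb") as f:
        total += len(f.read())    # re-read the whole block each pass
elapsed = time.perf_counter() - start

print(f"Read {total / 1e9:.1f} GB in {elapsed:.2f} s "
      f"-> {total / elapsed / 1e9:.1f} GB/s")
```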

[–] ccbadd@alien.top 1 points 11 months ago

For me it's just censorship and privacy. Maybe API costs will become an issue too once we get more apps.

[–] ccbadd@alien.top 1 points 11 months ago

I set one up for a while and it was a royal PITA! I have since switched to a managed email account using my own domain. So much less trouble. It's just not worth it in my opinion.

[–] ccbadd@alien.top 0 points 11 months ago (2 children)

I would replace the DDR5 RAM rather than add to it, or your memory will run a lot slower, and you just don't need that much if you're going to use GPUs for inferencing. Also, a P40 is probably money better spent with this config than the P2200.

[–] ccbadd@alien.top 1 points 11 months ago

I'd just be worried they will drop support for them in ROCm 6.0. They already dropped the MI50s. Technically you can still run them, and the older MI25 too, but ROCm is kernel-specific, so before long you might have to maintain a system with an old kernel to keep them working. I have a pair of MI100s, and while they work fine, they are slower than Nvidia 3090s with llama.cpp, ExLlama, and KoboldCpp for some reason. It looks like the new release of FlashAttention-2 lists the MI210 as the oldest card it supports, which I find very frustrating. I also have a couple of W6800s; they are actually as fast or faster than the MI100s with the same software, cost about the same, and have built-in cooling.
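For what it's worth, this is roughly how I compare cards: time a fixed generation with the llama-cpp-python bindings (built against CUDA for the 3090s and ROCm/HIP for the MI100/W6800). The model path and prompt below are placeholders of my own, not anything specific to this thread.

```python
# Rough tokens/sec comparison between cards using llama-cpp-python.
import time
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-2-13b.Q4_K_M.gguf",  # hypothetical GGUF file
    n_gpu_layers=-1,   # offload all layers to the GPU
    verbose=False,
)

prompt = "Explain the difference between L2 and L3 cache in one paragraph."
start = time.perf_counter()
out = llm(prompt, max_tokens=256)
elapsed = time.perf_counter() - start

n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens} tokens in {elapsed:.1f} s -> {n_tokens / elapsed:.1f} tok/s")
```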

 

Looking at mlc-llm, vllm, nomic, etc. they all seem focused on inferencing with a vulkin backend and all have made statements about multi gpu support either on their roadmaps or being worked on over the past few months. Every time I see one say they added multi gpu support it turns out they just incorporated llama.cpp's CUDA and HIP support rather than implementing it on vulkan. Are there any projects that actually do multi gpu with vulkin and is there some technical reason it doesn't work? I only ask because vulkan is available on multiple platforms with default installs and would surely make things easier for end users.
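To see what a Vulkan backend would even have to work with, the Vulkan tools ship `vulkaninfo`; here's a small sketch that just counts the physical devices it reports. The substring parsing is my assumption about the `--summary` output format, so treat it as illustrative.

```python
# List the GPUs visible to Vulkan by parsing `vulkaninfo --summary`.
# Any multi-GPU Vulkan backend would start from this same device list.
import subprocess

out = subprocess.run(
    ["vulkaninfo", "--summary"],
    capture_output=True, text=True, check=True,
).stdout

# The summary prints one "deviceName" line per physical device; counting
# those is a rough way to see how many GPUs a backend could split a
# model across. (Matching on the substring is an assumption.)
gpus = [line.strip() for line in out.splitlines() if "deviceName" in line]
print(f"Vulkan sees {len(gpus)} device(s):")
for g in gpus:
    print(" ", g)
```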