LocalLLaMA

Community for discussing Llama, the family of large language models created by Meta AI.

New APUs: close to GPU processing, but with unlimited memory? (alien.top)

submitted 2 years ago by bkm_s@alien.top to c/localllama@poweruser.forum

If CPU processing is slow af and GPUs take $$$$ to get enough memory for larger models, I'm wondering if an APU could deliver some of that GPU speed while using cheaper RAM to fit the larger models in memory. With 128 GB of RAM, that's the memory of five or six 3090/4090s (24 GB each), before allowing for overhead at least!
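As a quick sanity check on the 128 GB comparison, here is a minimal sketch of the arithmetic. It assumes 24 GB of VRAM per card (the figure for a 3090 or 4090) and ignores overhead; the helper name `cards_to_match` is made up for illustration.

```python
import math


def cards_to_match(ram_gb: float, vram_per_card_gb: float = 24.0) -> tuple[float, int]:
    """Return (capacity ratio, whole cards needed to reach ram_gb of VRAM).

    Assumption: capacity only. Bandwidth, overhead, and multi-GPU
    splitting costs are ignored.
    """
    ratio = ram_gb / vram_per_card_gb
    return ratio, math.ceil(ratio)


ratio, cards = cards_to_match(128)
print(f"128 GB is {ratio:.2f}x one 24 GB card; {cards} cards would be needed to match it")
```

This only compares capacity. It says nothing about how fast the memory can be read, which is what the replies in this thread focus on.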

Has anyone got current APU benchmarks vs. CPU/GPU? Do you know if the GPU side of an APU can be used to get a speedup over traditional CPU results?

I’ve been seeing a lot of claims that the Ryzen 8000 series is going to compete with low-end GPUs; some people think all the way up to a 3060.

If it’s possible to do, it might be the new best way to get large models working for cheap?

[–] he29@alien.top 1 points 2 years ago (1 children)

I'm not sure if 3D cache would help in this case, since there isn't a particular small part of the model that could be reused over and over: you have to read _all_ the weights when inferring the next word, right?

But I'm definitely looking forward to the 8000 series, since AM5 boards should get even cheaper by the time it comes out, and support for faster DDR5 should get better as well. And I really need to move on from my 10-year-old Xeon haha.

[–] ccbadd@alien.top 1 points 2 years ago (1 children)

I didn't think the 3D V-Cache would help either, until the article that came out a few days ago about getting 10x the performance from a RAM drive. If it works for RAM drives, then surely we can figure out a way to use that performance for inference.

[–] FlishFlashman@alien.top 1 points 2 years ago (1 children)

It's not going to help because the model data is much larger than the cache and the access pattern is basically long sequential reads.
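A rough back-of-the-envelope sketch of that argument: if every weight has to be read once per generated token, memory bandwidth caps the token rate, and a cache far smaller than the model can hold only a sliver of it. All the numbers are illustrative assumptions, not benchmarks: a hypothetical 35 GB model (roughly a 70B model at about 4 bits per weight), dual-channel DDR5-5600, and a 96 MB L3 cache.

```python
def ddr_bandwidth_gb_s(mt_per_s: int, channels: int = 2, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (64-bit channels assumed)."""
    return mt_per_s * channels * bytes_per_transfer / 1000


def max_tokens_per_second(model_gb: float, bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/s if all weights are streamed once per token."""
    return bandwidth_gb_s / model_gb


def cache_fraction(cache_mb: float, model_gb: float) -> float:
    """Fraction of the model that fits in the cache at any one time."""
    return (cache_mb / 1000) / model_gb


MODEL_GB = 35.0   # assumption: ~70B parameters at ~4 bits each
L3_CACHE_MB = 96  # assumption: L3 size of a 3D V-Cache desktop part

ddr5 = ddr_bandwidth_gb_s(5600)  # assumption: dual-channel DDR5-5600
print(f"DDR5 peak bandwidth: {ddr5:.1f} GB/s")
print(f"Token rate ceiling:  {max_tokens_per_second(MODEL_GB, ddr5):.2f} tokens/s")
print(f"Model held in L3:    {cache_fraction(L3_CACHE_MB, MODEL_GB):.2%}")
```

With these assumptions the cache holds well under 1% of the weights, so nearly every read still goes out to RAM. The ceiling is also a theoretical peak; real throughput would be lower.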

[–] rarted_tarp@alien.top 1 points 2 years ago

It might help for LLMs since a lot of values are cached after each loop, but still highly unlikely to make a difference.