this post was submitted on 24 Jun 2024
742 points (95.6% liked)
Technology
59295 readers
4099 users here now
This is a most excellent place for technology news and articles.
Our Rules
- Follow the lemmy.world rules.
- Only tech related content.
- Be excellent to each another!
- Mod approved content bots can post up to 10 articles per day.
- Threads asking for personal tech support may be deleted.
- Politics threads may be removed.
- No memes allowed as posts, OK to post as comments.
- Only approved bots from the list below, to ask if your bot can be added please contact us.
- Check for duplicates before posting, duplicates may be removed
Approved Bots
founded 1 year ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
There are real world performance benefits to ram being as close as possible to the CPU, so it's not entirely without merit. But that's what CAMM modules are for.
But do those benefits outweigh doubling or tripling the amount of RAM by simply inserting another stick that you can buy for dozens of dollars?
Yes, there are massive advantages. It’s basically what makes unified memory possible on modern Macs. Especially with all the interest in AI nowadays, you really don’t want a machine with a discrete GPU/VRAM, a discrete NPU, etc.
Take for example a modern high-end PC with an RTX 4090. Those only have 24GB VRAM and that VRAM is only accessible through the (relatively slow) PCIe bus. AI models can get really big, and 24GB can be too little for the bigger models. You can spec an M2 Ultra with 192GB RAM and almost all of it is accessible by the GPU directly. Even better, the GPU can access that without any need for copying data back and forth over the PCIe bus, so literally 0 overhead.
The advantages of this multiply when you have more dedicated silicon. For example: if you have an NPU, that can use the same memory pool and access the same shared data as the CPU and GPU with no overhead. The M series also have dedicated video encoder/decoder hardware, which again can access the unified memory with zero overhead.
For example: you could have an application that replaces the background on a video using AI. It takes a video, decompresses it using the video decoder , the decompressed video frames are immediately available to all other components. The GPU can then be used to pre-process the frames, the NPU can use the processed frames as input to some AI model and generate a new frame and the video encoder can immediately access that result and compress it into a new video file.
The overhead of just copying data for such an operation on a system with non-unified memory would be huge. That’s why I think that the AI revolution is going to be one of the driving factors in killing systems with non-unified memory architectures, at least for end-user devices.
That’s a fantastic explanation! Thank you!