this post was submitted on 25 Feb 2025
562 points (98.3% liked)
Technology
63277 readers
5494 users here now
This is a most excellent place for technology news and articles.
Our Rules
- Follow the lemmy.world rules.
- Only tech related content.
- Be excellent to each other!
- Mod approved content bots can post up to 10 articles per day.
- Threads asking for personal tech support may be deleted.
- Politics threads may be removed.
- No memes allowed as posts, OK to post as comments.
- Only approved bots from the list below, to ask if your bot can be added please contact us.
- Check for duplicates before posting, duplicates may be removed
- Accounts 7 days and younger will have their posts automatically removed.
Approved Bots
founded 2 years ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
Couldn't you just treat the socketed ram like another layer of memory effectively meaning that L1-3 are on the CPU "L4" would be soldered RAM and then L5 would be extra socketed RAM? Alternatively couldn't you just treat it like really fast swap?
Wrote a longer reply to someone else, but briefly, yes, you are correct. Kinda.
Caches won’t help with bandwidth-bound compute (read: ”AI”) it the streamed dataset is significantly larger than the cache. A cache will only speed up repeated access to a limited set of data.
Could it work?
Yes, but it would require:
Right now, the easiest solution for fast, high-bandwidth RAM is just to solder all of it.
Using it as cache would reduce total capacity as cache implies coherence, and treating it as ordinary swap would mean copying to main memory before you access it which is silly when you can access it directly. That is you'd want to write a couple of lines of kernel code to use it effectively but it's nowhere close to rocket science. Nowhere near as complicated as making proper use of NUMA architectures.