this post was submitted on 25 Feb 2025
562 points (98.3% liked)

Technology
[–] fiddlesticks@lemmy.dbzer0.com 4 points 3 hours ago (3 children)

Couldn't you just treat the socketed RAM like another layer of memory, effectively meaning that L1-L3 are on the CPU, "L4" would be the soldered RAM, and "L5" would be the extra socketed RAM? Alternatively, couldn't you just treat it like really fast swap?

[–] enumerator4829@sh.itjust.works 3 points 2 hours ago

Wrote a longer reply to someone else, but briefly, yes, you are correct. Kinda.

Caches won’t help with bandwidth-bound compute (read: “AI”) if the streamed dataset is significantly larger than the cache. A cache only speeds up repeated access to a limited set of data.
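A toy LRU simulation (purely illustrative, not any real cache's replacement policy) shows why: streaming through a dataset larger than the cache evicts every entry before it can be reused, while a small hot working set gets nearly all hits.

```python
from collections import OrderedDict

def hit_rate(accesses, cache_size):
    """Simulate an LRU cache and return the fraction of accesses that hit."""
    cache = OrderedDict()
    hits = 0
    for addr in accesses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)  # mark as most recently used
        else:
            cache[addr] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(accesses)

# Streaming twice over a dataset 10x the cache size: each entry is
# evicted long before the second pass revisits it.
print(hit_rate(list(range(1000)) * 2, cache_size=100))   # 0.0

# Repeated access to a working set that fits: only the first pass misses.
print(hit_rate(list(range(100)) * 20, cache_size=100))   # 0.95
```

That 0% hit rate is the whole problem for streaming workloads: every byte still comes over the slow link, cache or no cache.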

[–] balder1991@lemmy.world 1 points 2 hours ago* (last edited 2 hours ago)

Could it work?

Yes, but it would require:

  • A redesigned memory controller capable of tiering RAM (which would be more complex).
  • OS-level support for dynamically assigning memory usage based on speed (today, operating systems and applications assume all RAM operates at the same speed).
  • Applications/libraries optimized to take advantage of this tiering.

Right now, the easiest solution for fast, high-bandwidth RAM is just to solder all of it.
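The tiering policy the list above asks for can be sketched in a few lines. This is a hypothetical illustration, not a real OS API: rank pages by access frequency, keep the hottest ones in the fast (soldered) tier, and spill the rest to the slow (socketed) tier.

```python
# Illustrative tiering policy sketch; all names and numbers are made up.
FAST_TIER_CAPACITY = 4  # pages that fit in the fast (soldered) tier

def assign_tiers(pages):
    """pages: dict of page_id -> access count. Returns page_id -> tier name."""
    ranked = sorted(pages, key=pages.get, reverse=True)  # hottest first
    return {
        page: ("fast" if rank < FAST_TIER_CAPACITY else "slow")
        for rank, page in enumerate(ranked)
    }

counts = {"a": 90, "b": 75, "c": 60, "d": 50, "e": 10, "f": 2}
print(assign_tiers(counts))
# {'a': 'fast', 'b': 'fast', 'c': 'fast', 'd': 'fast', 'e': 'slow', 'f': 'slow'}
```

The hard part isn't this ranking, it's that the OS would need cheap, continuous access statistics and a migration mechanism to act on them.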

[–] barsoap@lemm.ee 2 points 2 hours ago

Using it as cache would reduce total capacity, since cache implies coherence, and treating it as ordinary swap would mean copying to main memory before you access it, which is silly when you can access it directly. That is, you'd want to write a couple of lines of kernel code to use it effectively, but it's nowhere close to rocket science. Nowhere near as complicated as making proper use of NUMA architectures.
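A back-of-the-envelope comparison of the two schemes makes the swap overhead concrete. The bandwidth figures below are assumed round numbers for illustration, not benchmarks of any real system.

```python
# Assumed bandwidths (GB/s); made up for illustration only.
SOLDERED_GBPS = 200   # fast soldered RAM
SOCKETED_GBPS = 60    # slower socketed RAM

def swap_time(gb):
    """Swap scheme: copy the data into soldered RAM, then read it there."""
    return gb / SOCKETED_GBPS + gb / SOLDERED_GBPS

def direct_time(gb):
    """Direct mapping: read straight from the socketed tier, no copy."""
    return gb / SOCKETED_GBPS

# The copy step is pure overhead: swap is always slower than direct access.
print(swap_time(12) > direct_time(12))  # True
```

Under any positive bandwidth numbers the copy step only adds time, which is the "silly" part of using swap when the memory is directly addressable.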