this post was submitted on 25 Feb 2025

[–] enumerator4829@sh.itjust.works 1 points 5 hours ago (1 children)

Yeah, the cache hierarchy has been behaving kinda wonky lately. Many AI workloads (and that’s what’s driving development right now) are constrained by memory bandwidth, and cache only helps with part of that. Cache helps with repeated access, not so much with streaming access to datasets much larger than the cache (i.e. many current AI models).
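
A toy sketch of the repeated-versus-streaming point, assuming a simple LRU cache model (real caches are set-associative and have hardware prefetchers, so this only shows the trend). The names `hit_rate` and `CACHE_LINES` and all sizes are made up for illustration:

```python
from collections import OrderedDict

def hit_rate(accesses, cache_lines):
    """Fraction of accesses served by a toy LRU cache with `cache_lines` entries."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for addr in accesses:
        total += 1
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            cache[addr] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / total if total else 0.0

CACHE_LINES = 64  # hypothetical cache size, in lines

# Repeated access: a 32-line working set that fits in the cache, swept 100 times.
repeated = [i % 32 for i in range(3200)]
# Streaming access: 10 sweeps over 1024 lines, far more than the cache holds.
streaming = [i % 1024 for i in range(10240)]

if __name__ == "__main__":
    print("repeated :", hit_rate(repeated, CACHE_LINES))
    print("streaming:", hit_rate(streaming, CACHE_LINES))
```

In this model the small looping working set hits almost every time, while the stream evicts each line before it is touched again and never hits.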

Intel already tried selling CPUs with both on-package HBM and slotted DDR RAM. No one wanted them, as the performance gains of the expensive HBM evaporated completely as soon as you touched out-of-package memory. (Assuming workloads bound by memory bandwidth, which currently dominate the compute market.)
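
A back-of-the-envelope sketch of why a mix of fast and slow memory disappoints, assuming the two kinds of traffic are serialized rather than overlapped. The bandwidth figures are hypothetical, not measurements of any real part, and `effective_bandwidth` is just an illustrative helper:

```python
def effective_bandwidth(fast_gbps, slow_gbps, slow_fraction):
    """Average bandwidth when `slow_fraction` of the bytes come from the slow tier.

    Assumes transfers from the two tiers do not overlap, so the time per
    byte is the weighted sum of the per-tier times (a harmonic mix).
    """
    time_per_gb = (1.0 - slow_fraction) / fast_gbps + slow_fraction / slow_gbps
    return 1.0 / time_per_gb

if __name__ == "__main__":
    # Hypothetical tiers: 800 GB/s in-package, 200 GB/s out-of-package.
    for frac in (0.0, 0.1, 0.5, 1.0):
        print(frac, round(effective_bandwidth(800, 200, frac), 1))
```

Under these assumptions, sending half the traffic to the slow tier already drops the average to 320 GB/s, much closer to the slow tier than to the fast one.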

To get good performance out of that, you may need to explicitly code the memory transfers to enable prefetch (preferably asynchronous) from the slower memory into the faster, à la classic GPU programming. YMMV.
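
A minimal sketch of that double-buffering pattern, assuming a background thread stands in for an asynchronous copy engine. `load_chunk` and `process_with_prefetch` are hypothetical names, and in pure Python the copy will not truly overlap with compute, so this only shows the control flow:

```python
from concurrent.futures import ThreadPoolExecutor

def load_chunk(slow_memory, start, size):
    """Stand-in for copying one chunk from slow memory into a fast buffer."""
    return list(slow_memory[start:start + size])

def process_with_prefetch(slow_memory, chunk_size, compute):
    """Run `compute` on each chunk while the next chunk is already being fetched."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Kick off the first transfer before entering the loop.
        pending = pool.submit(load_chunk, slow_memory, 0, chunk_size)
        for start in range(0, len(slow_memory), chunk_size):
            current = pending.result()  # wait for the chunk requested earlier
            next_start = start + chunk_size
            if next_start < len(slow_memory):
                # Request the next chunk before computing on the current one.
                pending = pool.submit(load_chunk, slow_memory, next_start, chunk_size)
            results.append(compute(current))
    return results

if __name__ == "__main__":
    print(process_with_prefetch(list(range(10)), 4, sum))
```

The important part is the ordering: the request for chunk N+1 goes out before the work on chunk N starts, which is the same shape as stream-based copies in GPU code.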

[–] barsoap@lemm.ee 1 points 4 hours ago

I wasn't really thinking of HPC but of my next gaming rig, TBH. The OS can move frequently accessed pages into faster RAM just as it can move busy threads to faster cores, gaining you some fps a second or two after alt-tabbing back to the game after messing around with Firefox. If it wasn't for memory controllers generally driving all channels at the same speed, that could already be a thing right now. It definitely already was a thing back in the days of swapping out to spinning platters.
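
A toy sketch of that kind of page promotion, assuming the OS simply counts accesses over a recent window and keeps the hottest pages in the fast tier. `pick_fast_pages` and the page names are invented for illustration; real kernels use far more involved heuristics:

```python
from collections import Counter

def pick_fast_pages(access_window, fast_capacity):
    """Return the set of pages to keep in fast RAM: the most accessed ones."""
    counts = Counter(access_window)
    return {page for page, _ in counts.most_common(fast_capacity)}

if __name__ == "__main__":
    # Window while browsing: browser pages are hot.
    browsing = ["browser_a"] * 6 + ["browser_b"] * 4 + ["game_a"] * 1
    # Window shortly after alt-tabbing back: game pages are hot.
    gaming = ["game_a"] * 7 + ["game_b"] * 5 + ["browser_a"] * 2

    print(pick_fast_pages(browsing, 2))
    print(pick_fast_pages(gaming, 2))
```

The lag between the two windows is the "second or two" in question: the fast tier only catches up once the new access pattern has shown up in the counters.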

Not sure about HBM in CPUs in general, but with packaging advancements any in-package stuff is only going to become cheaper: HBM, pedestrian bandwidth, doesn't matter.