[–] enumerator4829@sh.itjust.works 5 points 3 hours ago (2 children)

All your RAM needs to be the same speed unless you want to open up a rabbit hole. Every attempt at mixing speeds so far has kinda flopped. You can make very good use of such systems, but I've only seen it succeed with software specifically tailored to the use case (say, databases or simulations).

The way I see it, RAM in the future will be on-package and non-expandable. CXL might get some traction, but naah.

[–] fiddlesticks@lemmy.dbzer0.com 4 points 3 hours ago (3 children)

Couldn't you just treat the socketed RAM like another layer of memory? Effectively, L1–L3 are on the CPU, "L4" would be the soldered RAM, and "L5" would be the extra socketed RAM. Alternatively, couldn't you just treat it like really fast swap?

[–] enumerator4829@sh.itjust.works 3 points 2 hours ago

Wrote a longer reply to someone else, but briefly, yes, you are correct. Kinda.

Caches won't help with bandwidth-bound compute (read: "AI") if the streamed dataset is significantly larger than the cache. A cache only speeds up repeated access to a limited set of data.
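
As a rough illustration, here's a minimal C microbenchmark sketch (sizes and pass counts are made up, and exact numbers will vary by machine): both runs read the same total number of bytes, but the small buffer stays cache-resident while the large one streams from DRAM on every pass.

```c
/* Minimal sketch: both runs touch 1 GiB worth of reads, but the small
 * buffer fits in cache while the large one streams from DRAM.
 * Sizes are illustrative; build with e.g. `cc -O2 bench.c`. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

static double sum_passes(const double *buf, size_t n, int passes) {
    double s = 0.0;
    for (int p = 0; p < passes; p++)
        for (size_t i = 0; i < n; i++)
            s += buf[i];
    return s;
}

static double elapsed(struct timespec a, struct timespec b) {
    return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

int main(void) {
    size_t small_n = (1UL << 20) / sizeof(double);  /* 1 MiB: cache-resident */
    size_t large_n = (1UL << 30) / sizeof(double);  /* 1 GiB: streams */
    double *small = malloc(small_n * sizeof(double));
    double *large = malloc(large_n * sizeof(double));
    if (!small || !large) return EXIT_FAILURE;

    /* Fault all pages in so both runs measure memory access, not paging. */
    memset(small, 1, small_n * sizeof(double));
    memset(large, 1, large_n * sizeof(double));

    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    double a = sum_passes(small, small_n, 1024);  /* repeated access */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("cache-resident: %.3f s (sum %g)\n", elapsed(t0, t1), a);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    double b = sum_passes(large, large_n, 1);     /* streaming access */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("streaming:      %.3f s (sum %g)\n", elapsed(t0, t1), b);

    free(small);
    free(large);
    return EXIT_SUCCESS;
}
```

On typical hardware the streaming run comes out markedly slower per byte touched, which is exactly the gap a cache can't close.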

[–] balder1991@lemmy.world 1 points 2 hours ago* (last edited 2 hours ago)

Could it work?

Yes, but it would require:

  • A redesigned memory controller capable of tiering RAM (added complexity).
  • OS-level support for dynamically placing data by speed (today, operating systems and applications assume all RAM runs at the same speed).
  • Applications/libraries optimized to take advantage of the tiering (see the sketch below).

Right now, the easiest solution for fast, high-bandwidth RAM is just to solder all of it.
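
To make the last two points a bit more concrete: Linux already models tiered memory as NUMA nodes, so if the two RAM tiers showed up as separate nodes, an application could do the placement itself with existing libnuma calls. A minimal sketch, where the node numbers are pure assumptions:

```c
/* A minimal sketch, assuming the fast on-package RAM shows up as NUMA
 * node 0 and the slower socketed RAM as node 1 (both node numbers are
 * assumptions). Build with -lnuma. */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not supported here\n");
        return EXIT_FAILURE;
    }

    size_t hot_bytes  = 64UL << 20;   /* 64 MiB of frequently touched data */
    size_t cold_bytes = 512UL << 20;  /* 512 MiB of rarely touched data */

    void *hot  = numa_alloc_onnode(hot_bytes, 0);   /* fast tier */
    void *cold = numa_alloc_onnode(cold_bytes, 1);  /* slow tier */
    if (!hot || !cold) {
        fprintf(stderr, "allocation failed\n");
        return EXIT_FAILURE;
    }

    /* ... touch the buffers; hot data enjoys the fast tier's bandwidth ... */

    numa_free(hot, hot_bytes);
    numa_free(cold, cold_bytes);
    return EXIT_SUCCESS;
}
```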

[–] barsoap@lemm.ee 2 points 3 hours ago

Using it as cache would reduce total capacity, since a cache implies coherence: everything cached also has to exist in the backing tier. And treating it as ordinary swap would mean copying pages into main memory before you access them, which is silly when you can access the memory directly. That is, you'd want to write a couple of lines of kernel code to use it effectively, but it's nowhere close to rocket science. Nowhere near as complicated as making proper use of NUMA architectures.
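
For flavour, the "access it directly" option could look roughly like this from user space, mapping the slow memory straight into the address space instead of staging it through swap. The device node /dev/slowmem0 is hypothetical; the real couple of lines would live in the kernel's memory management:

```c
/* Hypothetical sketch: map slow memory into the address space and
 * load/store it in place, with no copy into main RAM first. The device
 * path is made up for illustration. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    int fd = open("/dev/slowmem0", O_RDWR);   /* hypothetical device node */
    if (fd < 0) { perror("open"); return EXIT_FAILURE; }

    size_t len = 256UL << 20;                 /* 256 MiB of slow memory */
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return EXIT_FAILURE; }

    /* Loads and stores go straight to the slow memory; nothing is staged
     * through ordinary RAM the way a swap-in would be. */
    p[0] = 42;
    printf("read back: %u\n", p[0]);

    munmap(p, len);
    close(fd);
    return EXIT_SUCCESS;
}
```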

[–] barsoap@lemm.ee 1 points 3 hours ago* (last edited 2 hours ago) (2 children)

The cache hierarchy has flopped? People aren't using swap?

NUMA also hasn't flopped; it's just that most systems aren't multi-socket or clusters. Different memory speeds connected to the same CPU aren't ideal, and you wouldn't build a system like that on purpose, but among upgraded systems it's not rare at all, and software-wise the worst that'll happen is you get the lower memory speed. Which you'd get anyway if you only had socketed RAM.

[–] enumerator4829@sh.itjust.works 1 points 2 hours ago (1 children)

Yeah, the cache hierarchy is behaving kinda wonky lately. Many AI workloads (and that's what's driving development lately) are constrained by bandwidth, and cache only helps with part of that: it speeds up repeated access, not streaming access to datasets much larger than the cache (i.e. many current AI models).

Intel already tried selling CPUs with both on-package HBM and socketed DDR (the Xeon Max line). No one wanted it, as the performance gains of the expensive HBM evaporated completely as soon as you touched out-of-package memory (assuming workloads bound by memory bandwidth, which currently dominate the compute market).

To get good performance out of that, you may need to explicitly code the memory transfers to enable prefetch (preferably asynchronous) from the slower memory into the faster, à la classic GPU programming. YMMV.
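
A rough sketch of that double-buffering pattern, using POSIX threads in place of a GPU copy engine; the chunk size, tier buffers, and process() callback are all stand-ins rather than any real API:

```c
/* Rough sketch of GPU-style double buffering: a worker thread copies the
 * next chunk from the slow tier into fast scratch while the main thread
 * works on the current chunk. Tiers are plain malloc'd buffers here.
 * Build with -lpthread. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK (8UL << 20)           /* 8 MiB per chunk, illustrative */

struct job { const char *src; char *dst; size_t len; };

static void *prefetch(void *arg) {  /* the "async transfer" */
    struct job *j = arg;
    memcpy(j->dst, j->src, j->len);
    return NULL;
}

static void process(const char *chunk, size_t len) {
    (void)chunk; (void)len;         /* placeholder for real compute */
}

int main(void) {
    size_t total = 64UL << 20;      /* 64 MiB of "slow tier" data */
    char *slow = malloc(total);
    char *fast[2] = { malloc(CHUNK), malloc(CHUNK) };  /* fast scratch */
    if (!slow || !fast[0] || !fast[1]) return EXIT_FAILURE;
    memset(slow, 7, total);

    int cur = 0;
    size_t first = total < CHUNK ? total : CHUNK;
    memcpy(fast[cur], slow, first); /* prime the first buffer */

    for (size_t off = 0; off < total; ) {
        size_t len = total - off < CHUNK ? total - off : CHUNK;
        size_t next = off + len;
        pthread_t th;
        struct job j;
        int started = 0;

        if (next < total) {         /* kick off the next copy early */
            j.src = slow + next;
            j.dst = fast[1 - cur];
            j.len = total - next < CHUNK ? total - next : CHUNK;
            pthread_create(&th, NULL, prefetch, &j);
            started = 1;
        }

        process(fast[cur], len);    /* overlap compute and copy */

        if (started)
            pthread_join(th, NULL);
        cur = 1 - cur;
        off = next;
    }

    printf("processed %zu bytes in %zu-byte chunks\n", total, (size_t)CHUNK);
    free(slow); free(fast[0]); free(fast[1]);
    return EXIT_SUCCESS;
}
```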

[–] barsoap@lemm.ee 1 points 2 hours ago

I wasn't really thinking of HPC but my next gaming rig, TBH. The OS can move often-accessed pages into faster RAM, just as it can move busy threads to faster cores, gaining you some fps a second or two after alt-tabbing back to the game from messing around in Firefox. If it weren't for memory controllers generally driving all channels at the same speed, that could already be a thing right now. It definitely already was a thing back in the days of swapping out to spinning platters.
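
For what it's worth, Linux already has the syscall plumbing for the page-promotion half of this. A small sketch using libnuma's numa_move_pages, assuming node 1 is the slow tier and node 0 the fast one (both assumptions):

```c
/* Sketch of page promotion: migrate a hot page to the faster node, the
 * same mechanism the kernel's own NUMA balancing uses. Assumes libnuma
 * (build with -lnuma); node numbers are assumptions. */
#include <numa.h>
#include <numaif.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support\n");
        return EXIT_FAILURE;
    }

    long pagesize = sysconf(_SC_PAGESIZE);
    void *buf = numa_alloc_onnode(pagesize, 1);   /* start on slow node 1 */
    if (!buf) return EXIT_FAILURE;
    ((char *)buf)[0] = 1;                         /* fault the page in */

    void *pages[1] = { buf };
    int nodes[1] = { 0 };                         /* promote to fast node 0 */
    int status[1];

    /* pid 0 = this process; status reports where the page ended up. */
    if (numa_move_pages(0, 1, pages, nodes, status, MPOL_MF_MOVE) == 0)
        printf("page now on node %d\n", status[0]);

    numa_free(buf, pagesize);
    return EXIT_SUCCESS;
}
```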

Not sure about HBM in CPUs in general, but with packaging advancements, anything in-package is only going to get cheaper, whether it's HBM or pedestrian-bandwidth memory.

[–] Jyek@sh.itjust.works 1 points 2 hours ago (1 children)

In systems where memory speeds are mismatched, the whole system runs at the slowest module's speed, literally making the faster soldered memory slower. Why even have soldered memory at that point?

[–] barsoap@lemm.ee 0 points 2 hours ago* (last edited 2 hours ago)

I'd assume the soldered memory has a dedicated memory controller. There's also no hard requirement that a single controller can't drive different channels at different speeds; the only hard requirement is that one channel runs at one speed.

...and the whole thing becomes completely irrelevant when we're talking about PCIe expansion cards: the memory controller doesn't care.