this post was submitted on 31 May 2024
58 points (96.8% liked)

Technology

59472 readers
3532 users here now

This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related content.
  3. Be excellent to each another!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below, to ask if your bot can be added please contact us.
  9. Check for duplicates before posting, duplicates may be removed

Approved Bots


founded 1 year ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
[–] j4k3@lemmy.world 13 points 5 months ago

The only real choke point for present CPU's is the on chip cache bus width. Increase the size of all three, L1-L3, and add a few instructions to load some bigger words across a wider bus. Suddenly the CPU can handle it just fine, not max optimization, but like 80% fine. Hardware just moves slow. Drawing board to consumer for the bleeding edge is 10 years. It is the most expensive commercial venture in all of human history.

I think the future is not going to be in the giant additional math coprocessor paradigm. It is kinda sad to see Intel pursuing this route again, but maybe I still lack context for understanding UALink's intended scope. In the long term, integrating the changes necessary to run matrix math efficiently on the CPU will win on the consumer front and I imagine such flexibility would win in the data center too. Why have dedicated hardware when that same hardware could be flexibly used in any application space.