I'm not sure if 3D cache would help in this case, since there isn't a particular small part of the model that could be reused over and over: you have to read _all_ the weights when inferring the next word, right?
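Here's a rough back-of-the-envelope sketch of that reasoning. All the numbers are my own assumptions for illustration and don't describe any particular setup: a 7B-parameter model at 4-bit quantization, 96 MB of L3 (3D V-Cache class), and dual-channel DDR5-6000. It assumes every generated token streams all weights from RAM once, so throughput is capped at bandwidth divided by model size.

```python
# Hypothetical figures for illustration only; none come from a benchmark.
PARAMS = 7e9                 # assumed 7B-parameter model
BYTES_PER_PARAM = 0.5        # assumed 4-bit quantization
L3_CACHE_BYTES = 96 * 2**20  # assumed 96 MB of L3 (3D V-Cache class)
MEM_BANDWIDTH = 96e9         # assumed dual-channel DDR5-6000: 6000 MT/s * 8 B * 2 channels


def model_bytes(params: float, bytes_per_param: float) -> float:
    """Total size of the weights that have to be read for each generated token."""
    return params * bytes_per_param


def max_tokens_per_second(weights_bytes: float, bandwidth: float) -> float:
    """Upper bound on tokens/s if every token reads all weights once from RAM."""
    return bandwidth / weights_bytes


weights = model_bytes(PARAMS, BYTES_PER_PARAM)
print(f"weights: {weights / 1e9:.1f} GB")
print(f"fraction that fits in L3: {L3_CACHE_BYTES / weights:.1%}")
print(f"bandwidth-bound ceiling: {max_tokens_per_second(weights, MEM_BANDWIDTH):.1f} tokens/s")
```

Only a few percent of the weights would fit in cache, so each token still ends up limited by RAM bandwidth. Under this simple model, faster DDR5 raises the ceiling roughly in proportion.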
But I'm definitely looking forward to the 8000 series, since AM5 boards should get even cheaper by the time it comes out, and support for faster DDR5 should get better as well. And I really need to move on from my 10-year-old Xeon, haha.