With the 512 GB of RAM on the above setup we can fit a 512B-parameter model. That would run at 5*7/512 = 0.068 words per second with the current architecture. If this new architecture actually works and gives a 78x speedup, it would be 5.3 words per second. The average person's reading speed is around 4 words per second, and the average person's speaking speed is around 2 words per second.
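For anyone who wants to check the arithmetic, here is a rough sketch in Python. It assumes the "5*7" in the formula means roughly 5 words per second on a 7B model, and that throughput scales inversely with parameter count; both are my reading of the numbers, not measured results. The function name `scaled_words_per_second` is made up for illustration.

```python
def scaled_words_per_second(base_wps, base_params_b, target_params_b):
    """Estimate throughput for a larger model, assuming speed scales
    inversely with parameter count (a simplifying assumption)."""
    return base_wps * base_params_b / target_params_b

# Assumed baseline: ~5 words/s on a 7B model, scaled to a 512B model.
baseline = scaled_words_per_second(5.0, 7.0, 512.0)

# Claimed 78x speedup applied on top (taken at face value).
with_speedup = baseline * 78

print(f"current architecture: {baseline:.3f} words/s")   # 0.068
print(f"with 78x speedup:     {with_speedup:.1f} words/s")  # 5.3
```

If the speedup shrinks for bigger models, swap the 78 for a smaller factor and see where it drops below reading speed (about 4 words per second).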

Fingers crossed this can put a small dent on Nvidia's stock price.

[–] fallingdowndizzyvr@alien.top 1 points 2 years ago

Fingers crossed this can put a small dent on Nvidia's stock price.

If it works that way, it will only be short term. The only reason it doesn't run on a GPU is the conditional matrix ops, so the GPU makers will just add them. Then they'll be back on top with the same margins again.

Also, they say the speedup decreases with more layers, so the bigger the model, the smaller the benefit. A 512B model is much bigger than a 7B model, so the speedup will be much smaller. Possibly none.

[–] MoffKalast@alien.top 1 points 2 years ago

I doubt it. Most of their leverage is in being the only supplier of the hardware required for pretraining foundation models. This doesn't really change that.