More recently, tensor cores have started supporting 8-bit float and int formats as well, which gives them a huge advantage in inference throughput. The memory bandwidth limitation can be mitigated by increasing the batch size, since each weight loaded from memory is then reused for every item in the batch.
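To put rough numbers on the batch size point, here is a back-of-the-envelope sketch. It assumes a plain matrix multiply where every weight is read from memory once per batch and activation traffic is ignored; the function name `flops_per_weight_byte` is made up for illustration.

```python
def flops_per_weight_byte(batch_size: int, bytes_per_weight: int) -> float:
    """Rough arithmetic intensity of a weight matrix applied to a batch.

    Assumption: each weight is loaded from memory once per batch and then
    used for one multiply and one add (2 FLOPs) per batch item. Traffic for
    activations and outputs is ignored.
    """
    return 2 * batch_size / bytes_per_weight


# 8-bit weights (1 byte each) at a few batch sizes.
for batch_size in (1, 8, 64):
    print(batch_size, flops_per_weight_byte(batch_size, bytes_per_weight=1))
```

Under these assumptions the work done per byte fetched grows linearly with batch size, and halving the weight width from 16 to 8 bits doubles it again.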
wen_mars
No I lost all my XMR in a tragic boating accident, as is the custom
If you multiply two 16-bit numbers, the result can overflow the range that 16 bits can represent.
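A quick way to see this, as a minimal sketch using only the Python standard library. It assumes signed 16-bit integers and IEEE half-precision floats; the helper names `mul_int16_wrapped` and `fits_fp16` are made up for illustration.

```python
import ctypes
import struct

INT16_MAX = 2**15 - 1  # 32767, largest signed 16-bit integer


def mul_int16_wrapped(a: int, b: int) -> int:
    """Multiply, then keep only the low 16 bits as a signed value.

    ctypes does no overflow checking, so the product is silently truncated,
    which mimics storing the result back into a 16-bit register.
    """
    return ctypes.c_int16(a * b).value


def fits_fp16(x: float) -> bool:
    """Return True if x can be packed as an IEEE half-precision float."""
    try:
        struct.pack("<e", x)  # "e" is the half-precision format code
        return True
    except OverflowError:
        return False


a, b = 300, 300                 # both fit comfortably in 16 bits
exact = a * b                   # the true product does not
wrapped = mul_int16_wrapped(a, b)
print(exact, exact > INT16_MAX, wrapped)

# Same story for 16-bit floats: 256.0 fits, 256.0 * 256.0 does not.
print(fits_fp16(256.0), fits_fp16(256.0 * 256.0))
```

Both inputs are valid 16-bit values in each case, but the product needs more range than 16 bits provide.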