[–] ron_krugman@alien.top 1 points 9 months ago

That doesn't make much of a difference. You still have to transfer the whole model to the GPU for every single inference step. The GPU only saves you time if you can load the model (or parts of it) once and then run many inference steps against it.
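
A rough back-of-envelope sketch of the point (all numbers here are illustrative assumptions, not measurements): if the weights get re-copied over PCIe on every step, the transfer alone can eat a large chunk of the per-step time before the GPU does any useful work.

```python
# Hypothetical numbers: a ~13 GB set of model weights over a PCIe 4.0 x16
# link with roughly 25 GB/s of practical host-to-device bandwidth.
MODEL_SIZE_GB = 13.0       # assumed model size on disk/RAM
PCIE_BANDWIDTH_GBS = 25.0  # assumed effective transfer rate

# Time spent just moving weights to the GPU for ONE inference step.
transfer_s = MODEL_SIZE_GB / PCIE_BANDWIDTH_GBS
print(f"Transfer per step: {transfer_s:.2f} s")  # 0.52 s before any compute

# If the weights stay resident, that cost is paid once and amortized
# over every subsequent step instead of being paid per step.
steps = 100
per_step_amortized = transfer_s / steps
print(f"Amortized over {steps} steps: {per_step_amortized * 1000:.1f} ms/step")
```

With those assumptions, re-uploading the weights adds about half a second to every step, whereas loading once and reusing them amortizes that same cost down to milliseconds per step.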