jorgemf@alien.top 1 points 11 months ago

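Think about it this way: 9 women take 9 months to have 9 babies, which averages out to one baby per month, so 1 woman should be able to have a baby in 1 month. That is basically what you are saying. You are mixing up throughput (how many requests finish per unit of time) with latency (how long a single request takes).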

What is happening is that a single request takes more than 20 seconds while your GPU utilization stays far below 100%. One request on its own cannot use the GPU's full parallelism, because of how the model works: much of the work inside a single request has to happen in sequence. There is very little you can do about that. When you send 6 requests at the same time, though, the GPU can process them in parallel and use close to all of its resources, so the 6 requests together take about as long as one.
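
To put rough numbers on it, here is a toy model in Python. It is only a sketch: the 20 second latency comes from your description, but the capacity of 6 concurrent requests and the assumption that latency stays flat until the GPU is full are simplifications I am making for illustration, not measurements. The function names are made up for this example.

```python
import math

def batch_time(num_requests, latency_s=20.0, capacity=6):
    """Wall-clock seconds to finish num_requests in the toy model.

    Assumption: up to `capacity` requests run in parallel, and each
    group ("wave") takes `latency_s` seconds no matter how full it is.
    """
    if num_requests <= 0:
        return 0.0
    waves = math.ceil(num_requests / capacity)
    return waves * latency_s

def throughput(num_requests, latency_s=20.0, capacity=6):
    """Requests completed per second in the toy model."""
    total = batch_time(num_requests, latency_s, capacity)
    return num_requests / total if total else 0.0

for n in (1, 6, 12):
    print(f"{n} requests: {batch_time(n)} s total, "
          f"{throughput(n):.2f} requests/s")
```

In this model 1 request and 6 requests both finish in 20 seconds, so throughput goes up sixfold while the latency of each request does not change. Real GPUs will not be this clean, but the shape is the same.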