[–] keepthepace@alien.top 1 points 11 months ago (5 children)

> Multiple passes at lower learning rates isn't supposed to produce different results.

Oh yes it is. The whole point of gradient descent is to explore the loss landscape step by step. With smaller steps you follow a totally different trajectory than with bigger ones, and every pass moves the weights further.

If you choose a learning rate that is too small, you will often just move more slowly along the same path, but one that is too big makes you skip over entire regions of the landscape, overshooting minima you would otherwise descend into.

OP seems to have been in that situation with their first attempt.
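
To make that concrete, here is a toy 1D sketch (a made-up loss of my own, nothing from OP's actual fine-tuning run): plain gradient descent on a double-well function, where a small learning rate settles into the nearest minimum but a bigger one overshoots into a different basin and never settles.

```python
# Toy illustration (made-up 1D loss, not OP's setup):
# f(x) = x**4 - 2*x**2 has two minima, at x = -1 and x = +1.
# Plain gradient descent from the same starting point ends up in
# completely different places depending only on the learning rate.

def grad(x):
    # f'(x) for f(x) = x**4 - 2*x**2
    return 4 * x**3 - 4 * x

def descend(x0, lr, steps=500):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

x0 = 1.4
print(descend(x0, lr=0.01))  # ~ +1.0: small steps settle into the nearby minimum
print(descend(x0, lr=0.3))   # near -1: the first big step overshoots past x = 0
                             # into the other basin, then bounces around -1 forever
```

Same objective, same starting point, same algorithm; only the step size differs, and after the first step the two runs never visit the same region of the landscape again.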

[–] keepthepace@alien.top 1 points 11 months ago (2 children)

That part made me smile. It is pretty good news that MS is not in control of OpenAI.

And if it turns out that this drama really happened out of safety concerns rather than personal profit or ego, I would like people to take a step back and realize what great news that is about where we are as a society.