zalperst

joined 11 months ago
[–] zalperst@alien.top 1 points 11 months ago

The field is new, work from a year ago is already outdated, and consequently so are most textbooks. It's hard to build a fundamentals course while the fundamentals themselves keep changing. The best thing to do is read the crap out of the literature and learn PyTorch or JAX.
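
To make the "learn PyTorch or JAX" advice concrete, here is a minimal sketch of the kind of training loop most of that literature assumes; the toy model, data, and hyperparameters are my own placeholders, not anything from the comment.

```python
# Minimal PyTorch training loop, illustrative only: model, data, and
# hyperparameters are toy placeholders.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

x = torch.randn(128, 32)        # fake inputs
y = torch.randn(128, 1)         # fake targets

for step in range(100):
    optimizer.zero_grad()        # clear gradients from the previous step
    loss = loss_fn(model(x), y)  # forward pass
    loss.backward()              # backpropagate
    optimizer.step()             # one gradient-descent update
```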

[–] zalperst@alien.top 1 points 11 months ago

I appreciate your position, but I don't think your intuition holds here; for instance, biological neural nets very likely use a qualitatively different learning algorithm than backpropagation.
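
A rough sketch of that contrast, assuming a Hebbian-style update as one stand-in for a "qualitatively different" learning rule (my choice of example, not the comment's): a backprop update needs a global error signal propagated back through the network, while a Hebbian update uses only local pre- and post-synaptic activity.

```python
# Toy single-layer comparison: backprop-style vs. Hebbian-style update.
import torch

torch.manual_seed(0)
x = torch.randn(10)                         # presynaptic activity
W = torch.randn(5, 10, requires_grad=True)  # weights
target = torch.randn(5)

# Backprop-style: gradient of an error signal with respect to the weights.
y = W @ x
loss = ((y - target) ** 2).mean()
loss.backward()
W_backprop = W.detach() - 0.1 * W.grad      # one gradient step

# Hebbian-style: outer product of post- and presynaptic activity, no error signal.
with torch.no_grad():
    y = W @ x
    W_hebb = W + 0.1 * torch.outer(y, x)
```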

[–] zalperst@alien.top 1 points 11 months ago

I appreciate that it's possible to find a not-illogical explanation (a properly logical one would entail an actual proof), but it remains surprising to me.

[–] zalperst@alien.top 1 points 11 months ago

Trillions of tokens, billions of parameters

[–] zalperst@alien.top 1 points 11 months ago (4 children)

The sample efficiency you mention is an empirical observation; that doesn't make it not surprising. Why should a single small, noisy step of gradient descent allow you to immediately memorize the data? I think that's fundamentally surprising.
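
A toy probe of that claim, purely as a sketch: take one example, do a single small SGD step on it, and measure how much the loss on that same example drops. The model, data, and learning rate below are invented for illustration.

```python
# One small, noisy gradient step on a single example, before/after loss.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 256), nn.ReLU(), nn.Linear(256, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

x = torch.randn(1, 16)   # a single training example
y = torch.randn(1, 1)

loss_before = loss_fn(model(x), y).item()

optimizer.zero_grad()
loss_fn(model(x), y).backward()   # gradient from this one example only
optimizer.step()                  # one small step

loss_after = loss_fn(model(x), y).item()
print(f"loss before: {loss_before:.4f}, after one step: {loss_after:.4f}")
```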

[–] zalperst@alien.top 1 points 11 months ago (9 children)

It's extremely surprising given that many data instances are seen only once, or very few times, by the model during training.