I appreciate your position, but I don't think your intuition holds here; for instance, biological neural nets very likely use a qualitatively different learning algorithm than backpropagation.
I appreciate that it's possible to find a not-illogical explanation (a logical one would entail a real proof), but it remains surprising to me.
Trillions of tokens, billions of parameters
The sample efficiency you mention is an empirical observation; that doesn't make it unsurprising. Why should a single small, noisy step of gradient descent allow you to immediately memorize the data? I think that's fundamentally surprising.
It's extremely surprising given that many instances of the data are seen only once, or very few times, by the model during training.
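To make that concrete, here's a minimal PyTorch sketch (a hypothetical toy setup, not anything from an actual large-scale training run): it takes a single SGD step on one example the model sees exactly once and checks how much the loss on that same example drops. The tiny model, learning rate, and token pair are all made up for illustration.

```python
# Toy illustration: what one small, noisy SGD step on a single example
# does to the loss on that same example.
import torch
import torch.nn as nn

torch.manual_seed(0)

vocab_size, dim = 100, 32
# Hypothetical tiny "language model": embedding followed by a linear readout.
model = nn.Sequential(nn.Embedding(vocab_size, dim), nn.Linear(dim, vocab_size))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# One "fact" the model sees exactly once: token 3 should predict token 7.
x = torch.tensor([3])
y = torch.tensor([7])

def loss_on_example():
    with torch.no_grad():
        return loss_fn(model(x), y).item()

before = loss_on_example()

# A single gradient step on this one example.
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()

after = loss_on_example()
print(f"loss before: {before:.3f}, after one step: {after:.3f}")
```

One step does lower the loss on that example, but nothing here explains why, at trillion-token scale, that single noisy step should be enough for the model to retain the fact, which is the part that stays surprising.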
The field is new, work from a year ago is already outdated, and consequently so are most textbooks. It's hard to put together a fundamentals course when the fundamentals themselves keep changing. The best thing to do is read the crap out of the literature and learn PyTorch or JAX.