this post was submitted on 19 Nov 2023
Machine Learning
It would more than likely lose the ability to "make funny." The problem is called "catastrophic forgetting," and it comes up when pretrained models are fine-tuned on downstream tasks. There's some literature showing that the original pretraining leaves some residual bias (e.g., English BERT models fine-tuned on multilingual sets retain English syntax patterns), but more often than not the model loses the ability to perform its original task.
There are some good recent papers on how to tackle this. My favourite paper on the topic is probably this one: "Robust fine-tuning of zero-shot models" (arXiv) https://arxiv.org/pdf/2109.01903
But tl;dr: taking a weighted average of the fine-tuned weights and the original weights, with a manually chosen mixing coefficient, tends to greatly mitigate this problem. I was surprised the paper didn't get more attention when it came out, but oh well.
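
For anyone who wants to see the averaging idea concretely, here is a rough sketch in plain Python. It is not the paper's code: the function name `interpolate_weights`, the parameter name `alpha`, and the toy "checkpoints" (dicts of parameter name to a flat list of floats) are all made up for illustration. With a real framework you would apply the same per-parameter formula to the tensors in the two checkpoints.

```python
def interpolate_weights(original, fine_tuned, alpha):
    """Per-parameter linear interpolation between two checkpoints.

    Returns (1 - alpha) * original + alpha * fine_tuned.
    alpha = 0.0 gives back the original weights, alpha = 1.0 the fine-tuned ones.
    `alpha` is the manually chosen mixing coefficient (hypothetical name).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if original.keys() != fine_tuned.keys():
        raise ValueError("both checkpoints must have the same parameter names")

    merged = {}
    for name, orig_values in original.items():
        tuned_values = fine_tuned[name]
        if len(orig_values) != len(tuned_values):
            raise ValueError(f"size mismatch for parameter {name!r}")
        merged[name] = [
            (1.0 - alpha) * o + alpha * t
            for o, t in zip(orig_values, tuned_values)
        ]
    return merged


# Toy checkpoints with made-up values, standing in for real model weights.
original = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
fine_tuned = {"layer.weight": [3.0, 6.0], "layer.bias": [1.0]}

merged = interpolate_weights(original, fine_tuned, alpha=0.5)
print(merged)  # {'layer.weight': [2.0, 4.0], 'layer.bias': [0.5]}
```

In practice you would probably try a handful of `alpha` values and pick the one that gives an acceptable trade-off between the new task and the original one on held-out data.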