If I were to start again, I would focus more on NLP; many of the things I studied are now obsolete. My only suggestion: choose the field you like most and go in-depth, especially the application field. For example, I work on AI in biological applications, and beyond knowledge of the algorithms, domain expertise is key.
That's very small for a transformer; as a rule of thumb, that works out to roughly 25M parameters. I'm not sure there are similar ones.
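For the back-of-the-envelope arithmetic behind that rule of thumb: most of the parameters sit in the embedding matrix plus the per-layer attention and feed-forward weights. Here is a minimal sketch; the config values are hypothetical, just to show how the estimate is done:

```python
# Rough parameter-count estimate for a small transformer.
# All config values below are assumptions for illustration only.

def transformer_params(vocab=32000, d_model=512, n_layers=4, d_ff=2048):
    """Approximate parameter count, ignoring biases and layer norms."""
    embed = vocab * d_model       # token embedding matrix
    attn = 4 * d_model * d_model  # Q, K, V, and output projections
    ffn = 2 * d_model * d_ff      # two feed-forward weight matrices
    return embed + n_layers * (attn + ffn)

print(f"{transformer_params() / 1e6:.1f}M parameters")  # -> 29.0M
```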
You can try this:
I think you can try a similar approach for another task; in my experience, the method generalizes across different tasks.
Have you seen this article by Google?
https://arxiv.org/abs/2305.02301
https://blog.research.google/2023/09/distilling-step-by-step-outperforming.html
They claim they were able to distill PaLM into T5 for reasoning tasks (a 2,000× difference in size), and the distilled T5 outperformed PaLM.
Code is here:
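To give a rough picture of the paper's multi-task objective, here is a minimal sketch: a T5 student trained on two targets per input, the label and a teacher-generated rationale, distinguished by task prefixes and mixed with a weight `alpha`. This is my paraphrase of the idea, not the repo's actual code; the prefixes and `alpha` value are assumptions:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("t5-small")
student = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

def step_by_step_loss(question, label, rationale, alpha=0.5):
    # Task 1: predict the label directly.
    x = tok("[label] " + question, return_tensors="pt")
    y = tok(label, return_tensors="pt").input_ids
    label_loss = student(**x, labels=y).loss
    # Task 2: generate the teacher's rationale for the same input.
    x = tok("[rationale] " + question, return_tensors="pt")
    y = tok(rationale, return_tensors="pt").input_ids
    rationale_loss = student(**x, labels=y).loss
    # Mix the two objectives; `alpha` is an assumed mixing weight.
    return label_loss + alpha * rationale_loss
```

The point of the second task is that the rationale acts as extra supervision during training only; at inference the student answers with the `[label]` prefix alone, so there is no extra cost.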
There are far fewer alternatives; you could use Keras in R, but it is actually a hassle. There are a few alternatives for topic modelling in R: