I am currently in my last year of undergrad and about to begin a direct PhD in multi-modal AI next year. I have been in the deep learning & NLP community for about 2 years, and I have witnessed the development of Transformers, from the comparatively simple GPT & BERT to today's billion-parameter monsters wearing the 'gold crown' at the top of the deep learning world.
I have spent a lot of time with the T5 model and its paper (Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, a paper I love very much!), trying to find efficient ways to use LLMs. I have hand-written adapter layers and LoRA to fine-tune T5 on GLUE & SuperGLUE (a rough sketch of what I mean is below). I have also tried out several instruction-tuned LLMs like LLaMA and Qwen.
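Roughly, the hand-rolled LoRA part is just a frozen nn.Linear wrapped with a trainable low-rank update, something like the simplified sketch below (the rank/scaling values are only illustrative, and the commented patching example assumes the Hugging Face T5 module layout):

```python
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False           # only the LoRA factors receive gradients
        self.lora_a = nn.Linear(base.in_features, r, bias=False)
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)    # start as a no-op, so training begins from the pretrained W
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.lora_b(self.lora_a(x)) * self.scaling

# e.g. patch the query projection of one T5 self-attention block (transformers layout):
# attn = model.encoder.block[0].layer[0].SelfAttention
# attn.q = LoRALinear(attn.q)
```

Only the lora_a/lora_b parameters then go into the optimizer, which is what keeps fine-tuning on GLUE & SuperGLUE cheap compared to updating the full model.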
Earlier this year I discovered the wonders of multi-modality and quickly fell in love with it; it has now become my PhD focus. I have followed the multi-modal developments of recent years, especially CLIP and its follow-up works, and LLMs play quite an important role in today's vision-language models, e.g. BLIP-2 and LLaVA.
Given the computational gap between universities and huge companies, I believe the focus of my PhD career should be on efficient learning. I am also trying to enhance VLMs through retrieval augmentation. The target dataset may be Encyclopedic VQA, since even large VLMs fail to perform well on it, which could potentially be addressed by a retrieval-augmented VLM.
I would love to hear any suggestions from you, on work-life balance, the direction of my academic focus, and so on; I would treasure them very much as I explore this new stage of life. (I am currently building a RAG question-answering chatbot for a company as my internship, and I would welcome suggestions on that too!)
Also, are there other subreddits like this one? (I am also a member of r/LocalLLaMA; both subreddits have benefited me a lot!)
Enjoy the ride. Take care of your body and your mind.
A PhD is much like sports: you're an athlete, but of the mind. Every day, think deeply and rest well. Those are some of the very few variables you have full control over.
The actual research is, as you know, unpredictable. The field is still moving fast, and an entire community (NLP) can change dramatically overnight (ChatGPT). So working on fundamentals that are somewhat invariant to these changes is a good derisking strategy.
Does it really change overnight? I mean, there are "prompt engineers", then you have the bunch writing PyTorch models, and then you have... whatever goes on at the next level, unless it's pure math research.
You'd think these things would be well known in the theoretical sphere before they are applied in practice.