Posted to Machine Learning on 28 Oct 2023

I am currently in my last year of undergrad and about to begin a direct PhD in multi-modal AI next year. I have been in the deep learning & NLP community for about two years, and in that time I have watched Transformers develop from the early GPT and BERT into today's billion-parameter giants wearing the 'gold crown' of the deep learning world.

I have spent a lot of time with the T5 model and its paper (Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, which I love very much!), trying to find efficient ways to use LLMs. I have hand-written Adapter layers and LoRA to fine-tune T5 on GLUE & SuperGLUE, and I have also tried instruction fine-tuning several popular LLMs such as LLaMA and Qwen.
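
(For reference, the LoRA layer I wrote amounts to something like the minimal PyTorch sketch below; the rank `r` and scaling `alpha` are illustrative defaults, not the exact values I used.)

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear with a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weight
        self.scale = alpha / r
        # A projects down to rank r, B projects back up; B starts at zero,
        # so the wrapped layer initially behaves exactly like the original.
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)
```

(One would typically wrap the attention query/value projections in T5 with this, so only the low-rank matrices receive gradients during fine-tuning.)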

Earlier this year, I noticed the wonders of multi-modality and quickly fell in love with it; it has now become my PhD focus. I have followed recent years' multi-modal developments, especially CLIP and its follow-up works, and LLMs play quite an important role in today's vision-language models, e.g., BLIP-2 and LLaVA.
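
(What hooked me is how simple CLIP's core idea is: scoring images against free-form text in a shared embedding space. A tiny sketch with Hugging Face transformers, where the checkpoint and image path are just examples:)

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Zero-shot image-text matching with CLIP (any CLIP checkpoint would do).
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("cat.jpg")  # placeholder path to a local image
texts = ["a photo of a cat", "a photo of a dog"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.logits_per_image.softmax(dim=-1))  # probabilities over the prompts
```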

Given the computational gap between universities and huge companies, I believe the focus of my PhD should be on efficient learning. I am also trying to enhance VLMs through retrieval augmentation. The target dataset may be Encyclopedic VQA, since even large VLMs fail to perform well on it, and a retrieval-augmented VLM could potentially close that gap.
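
(Concretely, the retrieval side of what I have in mind looks roughly like the sketch below; `embed_image` and `vlm_answer` are hypothetical placeholders, and FAISS is just one possible choice of index.)

```python
import numpy as np
import faiss  # dense nearest-neighbour index over the knowledge base

def retrieve_context(query_emb: np.ndarray, index: faiss.Index,
                     passages: list[str], k: int = 3) -> list[str]:
    """Return the k knowledge passages closest to the query embedding."""
    _, ids = index.search(query_emb.reshape(1, -1).astype("float32"), k)
    return [passages[i] for i in ids[0]]

# Illustrative flow (embed_image / vlm_answer are stand-ins, not real APIs):
# 1. emb = embed_image(image)                      # e.g., a CLIP image embedding
# 2. ctx = retrieve_context(emb, index, passages)  # fetch encyclopedia snippets
# 3. answer = vlm_answer(image, question, ctx)     # condition the VLM on them
```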

I would like to hear any suggestions from you, on work-life balance, the direction of my academic focus, and so on; I would treasure them very much at this new stage of life. (I am currently building a RAG question-answering chatbot for a company as my internship, and I would welcome suggestions on that too!)

And are there other subreddits like this one? (I am also a member of LocalLLaMA; both subreddits benefit me a lot!)

damhack@alien.top · 1 point · 1 year ago

How about exploring a framework for synchronizing time-variant multi-modal inputs, so that multiple “senses” can be associated with an “event” that can then be treated as a single inference object? Extend this to simulating synesthesia, and you have a powerful approach for feeding cognition in AGI and robotics.
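
To make that concrete, such an “event” could start life as simple time-window binning across modalities (a toy sketch; the names and the fixed-window policy are purely illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class SensorReading:
    modality: str           # e.g., "vision", "audio", "proprioception"
    timestamp: float        # seconds since the stream started
    embedding: list[float]  # per-modality feature vector

@dataclass
class Event:
    """Time-aligned readings from several modalities, treated as one inference object."""
    start: float
    end: float
    readings: list[SensorReading] = field(default_factory=list)

def bin_into_events(readings: list[SensorReading], window: float = 0.5) -> list[Event]:
    """Associate readings whose timestamps fall into the same fixed time window."""
    events: dict[int, Event] = {}
    for r in sorted(readings, key=lambda r: r.timestamp):
        k = int(r.timestamp // window)
        ev = events.setdefault(k, Event(start=k * window, end=(k + 1) * window))
        ev.readings.append(r)
    return [events[k] for k in sorted(events)]
```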