this post was submitted on 28 Oct 2023

Machine Learning


I am currently in my last year of undergrad and about to begin a direct PhD in Multi-Modality AI next year. I have been part of the deep learning & NLP community for about two years, and I have watched Transformers grow from the early GPT and BERT models into today's billion-parameter giants that wear the crown of the deep learning world.

I have spent a lot of time with the T5 model and its paper (Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, which I love very much!), trying to find efficient ways to use LLMs. I have hand-written Adapter layers and LoRA to fine-tune T5 on GLUE & SuperGLUE, and I have also tried out several instruction-tuned LLMs such as LLaMA and Qwen.
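
For anyone curious what "hand-written LoRA" looks like, here is a minimal sketch of the idea in PyTorch: a frozen linear layer plus a trainable low-rank update. The rank, scaling, and the choice of patching the q/v projections are illustrative defaults, not a prescription for how I actually fine-tuned T5.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen nn.Linear plus a trainable low-rank update: W x + (alpha / r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # freeze the pretrained weights
        self.scaling = alpha / r
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start

    def forward(self, x):
        # frozen path + low-rank correction
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

# Hypothetical usage with a HuggingFace T5 model: wrap the attention q/v projections.
# for block in model.encoder.block:
#     attn = block.layer[0].SelfAttention
#     attn.q = LoRALinear(attn.q)
#     attn.v = LoRALinear(attn.v)
```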

Earlier this year I discovered the wonders of multi-modality and quickly fell in love with it; it has now become my PhD focus. I have been following recent multi-modality work, especially CLIP and its follow-ups, and LLMs now play a central role in today's vision-language models such as BLIP-2 and LLaVA.
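
Since CLIP keeps coming up as the backbone of these VLMs, a tiny zero-shot image-text similarity example may be useful context. This sketch uses the HuggingFace transformers CLIP classes; the image path and candidate captions are placeholders.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")          # placeholder image path
texts = ["a photo of a cat", "a photo of a dog", "a diagram of a transformer"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Image-text similarity scores, softmaxed into a distribution over the candidate captions.
probs = outputs.logits_per_image.softmax(dim=-1)
for text, p in zip(texts, probs[0].tolist()):
    print(f"{p:.3f}  {text}")
```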

Because of the computational gap between universities and large companies, I believe the focus of my PhD should be efficient learning. I am also trying to improve VLMs through retrieval augmentation. A likely target dataset is Encyclopedic VQA, since even large VLMs perform poorly on it, and a retrieval-augmented VLM could potentially close that gap.
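
To make the retrieval-augmentation idea concrete, here is a minimal sketch of the retrieval half: encode a question, pull the nearest passages from a knowledge base (a toy list here, Wikipedia-scale in practice), and prepend them to the prompt that would go to a VLM such as LLaVA or BLIP-2 together with the image. The model names, corpus, and prompt format are illustrative assumptions, not Encyclopedic-VQA specifics.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Hypothetical toy knowledge base; in practice this would be entity-linked
# Wikipedia passages like those used for Encyclopedic VQA.
passages = [
    "The Eiffel Tower was completed in 1889 for the World's Fair in Paris.",
    "Mount Fuji is the highest mountain in Japan at 3,776 metres.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")
passage_emb = encoder.encode(passages, normalize_embeddings=True)

def retrieve(question: str, k: int = 1) -> list[str]:
    """Return the k passages whose embeddings are closest to the question."""
    q_emb = encoder.encode([question], normalize_embeddings=True)
    scores = passage_emb @ q_emb.T               # cosine similarity (unit vectors)
    top = np.argsort(-scores[:, 0])[:k]
    return [passages[i] for i in top]

def build_vlm_prompt(question: str) -> str:
    # The retrieved text is prepended to the question before both are handed,
    # together with the image, to a VLM such as LLaVA or BLIP-2.
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

print(build_vlm_prompt("When was the Eiffel Tower built?"))
```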

I would love to hear any suggestions you have, about work-life balance, the direction of my academic focus, and so on; I would treasure them very much as I enter this new stage of life. (For my internship I am currently building a RAG question-answering chatbot for a company, so suggestions on that are welcome too!)

Also, are there other subreddits like this one? (I am also a member of LocalLLaMA; both communities benefit me a lot!)

[–] ChaosLamp_Genie@alien.top 1 points 10 months ago

Excited for you. My advice is to buddy up with other PhD students, as that is often the best part of the journey. Academics in CS can be shy, so put some effort into making the first move.

We really need more efficient networks! Here is an insight for you. Current large models apply the same operators to billions of neurons. But just as in classical circuits, where implementing a truth table with NAND gates alone can require exponentially more gates than using the best mix of logic gates, neural circuitry can suffer from the same inefficiency. For example, you can approximate a smooth curve with thousands of ReLUs, or you can use a single trigonometric function. Neural architecture search (NAS) looks at this, but it is definitely not there yet and is even more expensive than ordinary training!
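
To make the ReLU-versus-trigonometric point concrete, here is a small sketch (mine, not the commenter's) that fits the same smooth curve with a wide ReLU MLP and with a single parametric sine unit, then compares parameter counts and fitting error. The target curve and hyper-parameters are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.linspace(-torch.pi, torch.pi, 512).unsqueeze(1)
y = 1.5 * torch.sin(x + 0.5)              # target: a smooth periodic curve

# (a) piecewise-linear approximation: a ReLU MLP with many hidden units
relu_net = nn.Sequential(nn.Linear(1, 256), nn.ReLU(), nn.Linear(256, 1))

# (b) a single parametric trigonometric unit: y = a * sin(w * x + b)
class SineUnit(nn.Module):
    def __init__(self):
        super().__init__()
        self.a = nn.Parameter(torch.ones(1))
        self.w = nn.Parameter(torch.ones(1))
        self.b = nn.Parameter(torch.zeros(1))
    def forward(self, x):
        return self.a * torch.sin(self.w * x + self.b)

def fit(model, steps=2000, lr=1e-2):
    """Fit the model to (x, y) with Adam and return the final MSE."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

for name, model in [("ReLU MLP", relu_net), ("sine unit", SineUnit())]:
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params} params, final MSE {fit(model):.5f}")
```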