this post was submitted on 28 Oct 2023

Machine Learning

I am currently in my last year of undergrad, and about to begin a direct PhD in multi-modality AI next year. I have been in the deep learning & NLP community for about two years, and I have witnessed the development of Transformers, from the early GPT and BERT to today's billion-parameter monsters wearing the 'gold crown' of the deep learning world.

I have spent a lot of time with the T5 model and its paper (Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, a paper I love very much!), trying to find efficient ways to use LLMs. I have hand-written Adapter layers and LoRA to fine-tune T5 on GLUE & SuperGLUE, and I have also tried out several instruction-tuned LLMs such as LLaMA and Qwen.
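For anyone curious what the LoRA part amounts to: the pretrained weight stays frozen and only a low-rank update is learned. A minimal NumPy sketch of the idea (hypothetical layer sizes and rank, not T5's actual configuration):

```python
import numpy as np

# LoRA sketch: instead of updating the frozen weight W (d_out x d_in),
# learn a low-rank correction B @ A with rank r << min(d_out, d_in),
# scaled by alpha / r. Shapes below are illustrative only.
rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 512, 512, 8, 16

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-init

def lora_forward(x):
    # Original frozen path plus the low-rank correction.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B zero-initialized, the adapted layer starts exactly equal
# to the frozen layer, so fine-tuning begins from the pretrained model.
assert np.allclose(lora_forward(x), W @ x)
```

The efficiency win is that only A and B (2 × 512 × 8 values here) are trained instead of the full 512 × 512 weight.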

Earlier this year, I noticed the wonder of multi-modality and quickly fell in love with it; it has now become my PhD focus. I have followed recent multi-modality developments, especially CLIP and its follow-up works, and LLMs play quite an important role in today's vision-language models, such as BLIP-2 and LLaVA.

Given the computational gap between universities and large companies, I believe the focus of my PhD career should be efficient learning. I am also trying to enhance VLMs through retrieval augmentation. My target dataset may be Encyclopedic VQA, since even large VLMs fail to perform well on it, and it could potentially be tackled with a retrieval-augmented VLM.

I would like to hear any suggestions from you, on work-life balance, the direction of my academic focus, and so on; I would treasure them very much in this new stage of life. (For my internship I am currently building a RAG question-answering chatbot for a company, and I would welcome suggestions on that too!)

And are there other communities like this one? (I am also a member of LocalLLaMA; both communities have benefited me a lot!)

top 9 comments
[–] ChaosLamp_Genie@alien.top 1 points 10 months ago

Excited for you. My advice is to buddy up with other PhD students, as that is often the best part of the journey. Academics in CS can be shy, so put some effort into making the first move.

We really need more efficient networks! Here is an insight for you. Current large models use the same operator for billions of neurons. But just as in classical circuits, where implementing a truth table with NANDs alone can require exponentially more gates than using the best combination of logic gates, neural circuitry can suffer from the same inefficiency. For example, you can approximate a smooth curve with thousands of ReLUs, or you can use one trigonometric function. Neural architecture search (NAS) looks at this, but it is definitely not there yet and is even more expensive than ordinary training!
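The ReLU-vs-trigonometric point can be made concrete. The sketch below (my own illustration, not from the comment) builds a piecewise-linear fit of sin out of shifted ReLUs; the error shrinks only as more pieces are added, while a single `np.sin` "operator" is exact:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def relu_approx(x, n_pieces):
    # Piecewise-linear interpolation of sin built from shifted ReLUs:
    # f(x) = y0 + slope_0 * relu(x - knot_0)
    #           + sum_i (slope_i - slope_{i-1}) * relu(x - knot_i),
    # i.e. each ReLU contributes one change of slope at its knot.
    knots = np.linspace(0, 2 * np.pi, n_pieces + 1)
    y = np.sin(knots)
    slopes = np.diff(y) / np.diff(knots)
    out = np.full_like(x, y[0]) + slopes[0] * relu(x - knots[0])
    for i in range(1, n_pieces):
        out += (slopes[i] - slopes[i - 1]) * relu(x - knots[i])
    return out

x = np.linspace(0, 2 * np.pi, 1000)
err_8 = np.max(np.abs(relu_approx(x, 8) - np.sin(x)))
err_64 = np.max(np.abs(relu_approx(x, 64) - np.sin(x)))
# More ReLU pieces -> smaller error, but never zero;
# one trigonometric unit (np.sin itself) matches the curve exactly.
assert err_64 < err_8
```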

[–] evanthebouncy@alien.top 1 points 10 months ago

Enjoy the ride. Take care of your body and your mind.

A PhD is much like sports: you're an athlete, but of the mind. Every day, think deeply and rest well. Those are some of the very few variables you have full control over.

The actual research is, as you know, unpredictable. The field is still moving fast, and an entire community (NLP) can change dramatically overnight (ChatGPT). So trying to work on fundamentals that are somewhat invariant to these changes is a good de-risking strategy.

[–] thedabking123@alien.top 1 points 10 months ago

As a guy in his mid-30s taking master's-level courses at Stanford in preparation for a possible part-time master's, this advice to stick with the basics for risk mitigation is much needed.

[–] Go2Heart@alien.top 1 points 10 months ago

Yeah, I have noted your advice. And I WILL keep both my body and mind strong!

[–] Important-Product210@alien.top 1 points 10 months ago

Does it really change overnight? I mean, there are "prompt engineers", then you have the bunch writing PyTorch models, and then you have... whatever goes on at the next level, unless it's basic math research.

You'd think these things would be well known in the theoretical sphere before they are applied in practice.

[–] parabellum630@alien.top 1 points 10 months ago

I am doing a master's with a thesis on multi-modal AI. Every time I sit down to write a paper, someone else releases it first. Before, that was happening every 3 to 4 months; now it's every month. So it's a bit frustrating with so much competition out there.

[–] TheMan_TheMyth@alien.top 1 points 10 months ago

Don’t be afraid to let the topic of your PhD pivot to something tangential that you might be more likely to publish in. The name of the game in a PhD is to publish. You should publish as fast as possible. You can worry about being exactly on topic when you have the degree and space to do what you want.

[–] damhack@alien.top 1 points 10 months ago

How about exploring a framework for synchronizing time-variant multi-modal inputs, so that multiple "senses" can be associated with an "event" that can be treated as an inference object? Extend this to simulating synesthesia too, and you have a powerful approach for feeding cognition in AGI and robotics.

[–] cenji@alien.top 1 points 10 months ago

Learn about cognitive science and neuroscience. Computer science isn't a science these days, but more like a branch of engineering. While classic engineering disciplines have solid scientific underpinnings (e.g., EE has the physics of electromagnetism), there is no mature science of intelligence yet. However, a lot is known from biology about intelligence. Learn about that, so you have something to draw on rather than just ad-hoc ideas. (I have a PhD in CS robotics & AI, though somewhat dated.)