this post was submitted on 27 Nov 2023


[R]eading List for Andrej Karpathy’s “Busy person’s intro to Large Language Models” Video (alien.top)

submitted 2 years ago by FallMindless3563@alien.top to c/machinelearning@academy.garden


I loved Andrej’s talk in his “Busy person’s intro to Large Language Models” video, so I decided to create a reading list to dive deeper into a lot of the topics. I feel like he did a great job of describing the state of the art for anyone from an ML researcher to any engineer who is interested in learning more.

The full talk can be found here: https://youtu.be/zjkBMFhNj_g?si=fPvPyOVmV-FCTFEx

Here’s the reading list: https://blog.oxen.ai/reading-list-for-andrej-karpathys-intro-to-large-language-models-video/

Let me know if you have any other papers you would add!


[–] lakolda@alien.top 1 points 2 years ago (1 children)

Some of the content also seems to allude to what Q* might be…

[–] currentscurrents@alien.top 1 points 2 years ago (1 children)

It really doesn't, because no one has any clue what Q* is or if it's even real.

[–] lakolda@alien.top 1 points 2 years ago

Ever hear the term "might"?

[–] um-xpto@alien.top 1 points 2 years ago (2 children)

Nice! Thank you for your work.

Regarding the video.

Q1) At minute 14:14 (Finetuning into an Assistant): when you have multiple tasks/datasets with diverse outputs, how is training performed? Are all datasets combined in a single training run? Is finetuning done on top of a previous finetuning? Or is the question parsed and sent to a specific model?

Q2) At minute 27:43 (Tool Use: browser, calculator, etc.): does anyone have links to similar implementations for Llama, and how is it done or what kind of tech/frameworks are used?

[–] FallMindless3563@alien.top 1 points 2 years ago

You certainly can combine all the tasks and datasets into a single instruction fine-tuning dataset. You would then have a separate dataset for the reinforcement learning half, where the model learns human preferences.
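
To make that concrete, here is a rough sketch of the "one combined dataset" idea, not any lab's actual pipeline. The task names, the toy examples, the record schema, and the helper names (`to_instruction_records`, `build_sft_mixture`) are all hypothetical, made up for illustration.

```python
import random

def to_instruction_records(task_name, examples):
    """Normalize (prompt, response) pairs from one task into a shared schema."""
    return [
        {"task": task_name, "instruction": prompt, "response": response}
        for prompt, response in examples
    ]

def build_sft_mixture(task_datasets, seed=0):
    """Concatenate every task's records and shuffle them into one training set."""
    mixture = []
    for task_name, examples in task_datasets.items():
        mixture.extend(to_instruction_records(task_name, examples))
    # Shuffle so each batch mixes tasks instead of seeing them one after another.
    random.Random(seed).shuffle(mixture)
    return mixture

# Hypothetical toy data: two tasks with very different outputs.
task_datasets = {
    "summarize": [("Summarize: The cat sat on the mat.", "A cat sat on a mat.")],
    "translate": [
        ("Translate to French: hello", "bonjour"),
        ("Translate to French: cat", "chat"),
    ],
}
sft_data = build_sft_mixture(task_datasets)

# The preference data for the RL stage is kept separately and has a different
# shape: one prompt with a preferred and a dispreferred response.
preference_data = [
    {"prompt": "Translate to French: hello", "chosen": "bonjour", "rejected": "hola"},
]
```

The point is only that the supervised stage sees one mixed stream of instruction/response pairs, while the preference stage consumes comparisons.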

[–] Disastrous_Elk_6375@alien.top 1 points 2 years ago (1 children)

Q2) At minute 27:43 (Tool Use: browser, calculator, etc.): does anyone have links to similar implementations for Llama, and how is it done or what kind of tech/frameworks are used?

The naive way is to use LangChain, but that's hit and miss for several reasons, and whatever you build will be held together by duct tape and prayers. Alternative frameworks include Haystack and Griptape.

I've found that for local models the best tool usage you can get comes from using an advanced control library. This gives you a lot of flexibility in organising the prompts and goes a long way toward "helping" the local models. Guidance and LMQL are two such libraries.
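
For a feel of what the tool-use loop looks like underneath any of these frameworks, here is a library-free sketch. It is not Guidance or LMQL code: those libraries constrain the model while it decodes, whereas this sketch only checks the output against a pattern afterwards. The `CALC[...]` / `ANSWER[...]` syntax, `fake_model`, and `run_with_tools` are hypothetical names invented for the example; `fake_model` is a stand-in for a real local model.

```python
import ast
import operator
import re

# Arithmetic operators the calculator tool is allowed to apply.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def calculator(expression):
    """Evaluate basic arithmetic by walking the AST instead of calling eval()."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expression, mode="eval").body)

TOOLS = {"CALC": calculator}
# A tool call must be exactly NAME[argument]; anything else is a final answer.
TOOL_CALL = re.compile(r"^(CALC)\[(.+)\]$")

def fake_model(prompt):
    """Hypothetical stand-in for a local LLM: calls the tool once, then answers."""
    if "Observation:" in prompt:
        return "ANSWER[" + prompt.rsplit("Observation: ", 1)[1].strip() + "]"
    return "CALC[12 * (3 + 4)]"

def run_with_tools(question, model=fake_model, max_steps=3):
    """Alternate between model output and tool execution until a final answer."""
    prompt = f"Question: {question}\n"
    output = ""
    for _ in range(max_steps):
        output = model(prompt).strip()
        match = TOOL_CALL.match(output)
        if match is None:
            return output  # not a tool call, so treat it as the final answer
        name, argument = match.groups()
        result = TOOLS[name](argument)
        # Feed the tool result back so the next model call can use it.
        prompt += f"Action: {output}\nObservation: {result}\n"
    return output

print(run_with_tools("What is 12 * (3 + 4)?"))
```

Small local models often break the expected call format, which is where constraining the output during generation (rather than validating it afterwards, as here) helps most.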

[–] um-xpto@alien.top 1 points 2 years ago

Thanks. Guidance seems like a good fit; I'll start looking for more info.

[–] akardashian@alien.top 1 points 2 years ago

thanks for compiling!

[–] coumineol@alien.top 1 points 2 years ago (1 children)

Thanks but here's the problem with this list: most of the papers mentioned are on a very high technical level, and people who would be able to understand them are probably people who have already read them. Note that Andrej was careful to keep the material at a certain level because he addresses those who want to go one step further than talking to ChatGPT, without necessarily understanding all the underlying theory.

[–] teryret@alien.top 1 points 2 years ago (1 children)

Right, that's why OP prefaced with "to dive deeper into a lot of the topics". If folks aren't at a point where diving deeper makes sense, it's not a list for them. There are plenty of resources for any given level of understanding; obviously no list is going to be appropriate for every member of a diverse community.

[–] coumineol@alien.top 1 points 2 years ago (2 children)

Not to start an argument here, but I can't imagine anybody with any level of understanding who should start diving deeper by reading the "Attention is All You Need" paper. Yes, this is a diverse community, but when you try to address everybody's needs, you usually end up addressing nobody's needs.

[–] whymauri@alien.top 1 points 2 years ago (1 children)

Just me, but I think of busy coworkers with great background in math/stats and 'classic' ML who would ramp up quickly from a list like this. When I onboarded chemists (PhDs) to my ML team at a drug startup, I would send them a similarly dense reading list. With their strong background in physics, it would take them two weeks flat to understand the necessary theory and jargon to be productive (in our niche field).

[–] coumineol@alien.top 1 points 2 years ago

Didn't mean to say those papers are completely useless, but even for those with a strong Math/ML background I would advise starting with recent survey papers. Reading "Attention is All You Need" is kind of like reading the General Relativity papers of Einstein - cool as a historical curiosity, but not ideal for optimizing expertise acquisition.

[–] eek04@alien.top 1 points 2 years ago (1 children)

Since "Attention is All You Need" is fairly high on my reading list for understanding the details of transformer architecture, what do you recommend instead?

[–] coumineol@alien.top 1 points 2 years ago (1 children)

https://arxiv.org/abs/2106.04554

If you're trying to learn more about language models don't bother with anything written before 2020. That's basically the Stone Age.

[–] eek04@alien.top 1 points 2 years ago

Thank you!

[–] Maykey@alien.top 1 points 2 years ago

I haven't watched the talk, but I think the reading list should have some love for SSMs (state space models such as S4, S5, H3): on the one hand their variants are very prominent on Long Range Arena (LRA); on the other they are relatively "unknown".

They are not unknown to researchers, seeing how many variants there are, but there are hundreds more videos and blogs explaining transformers. If you find a course about LLMs, it will likely include Transformers but not SSMs, so I think their success on LRA and absence from learning materials qualifies them for a "dive in deeper" list.
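
For anyone who has not run into SSMs: the core of these layers is a linear recurrence over a hidden state, x_k = A x_{k-1} + B u_k with output y_k = C x_k. Below is a bare-bones sketch of just that recurrence, assuming an already-discretized system and hypothetical toy matrices. The actual S4/S5/H3 models add structured initializations, a learned discretization step, and convolutional or parallel-scan formulations that are not shown here.

```python
def ssm_scan(A, B, C, inputs):
    """Run x_k = A x_{k-1} + B u_k, y_k = C x_k over a scalar input sequence."""
    n = len(B)
    state = [0.0] * n
    outputs = []
    for u in inputs:
        # State update: matrix-vector product plus the input projected by B.
        state = [
            sum(A[i][j] * state[j] for j in range(n)) + B[i] * u
            for i in range(n)
        ]
        # Readout: project the hidden state to a scalar output with C.
        outputs.append(sum(C[i] * state[i] for i in range(n)))
    return outputs

# Hypothetical 2-dimensional state with two decay rates (0.5 and 0.9).
A = [[0.5, 0.0], [0.0, 0.9]]
B = [1.0, 1.0]
C = [1.0, 1.0]

# Feed a single impulse and watch the state decay over the following steps.
ys = ssm_scan(A, B, C, [1.0, 0.0, 0.0])
```

The slowly decaying component (0.9 here) is what lets the state carry information across many steps, which is the property long-range benchmarks like LRA stress.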

[–] derpgod123@alien.top 1 points 2 years ago (1 children)

Only papers to read, no books?

[–] FallMindless3563@alien.top 1 points 2 years ago

The only book he explicitly mentions is "Thinking, Fast and Slow" by Daniel Kahneman, but I think there are a ton of books that would be great resources alongside the papers. I just happened to pull a lot of the papers from the footnotes and concepts he mentioned.