LocalLLaMA

4 readers

4 users here now

Community to discuss about Llama, the family of large language models created by Meta AI.

founded 2 years ago

MODERATORS

communick@poweruser.forum

Intro to Large Language Models | Andrew Karpathy | Summary (alien.top)

submitted 2 years ago by phoneixAdi@alien.top to c/localllama@poweruser.forum

6 comments fedilink hide all child comments

Karpathy (one of my hero's along with Ilya) dropped a video after a long time. It's called : "Intro to Large Language Models". If you are in the field or generally curious, please go watch it! He is the best AI teacher that I know of. Simplifies concepts for simple folks like me.

If anyone wants summarized notes of that video its below here :

1. Large language models are powerful tools for problem solving, with potential for self-improvement.

Large language models (LLMs) are powerful tools that can generate text based on input, consisting of two files: parameters and run files. They are trained using a complex process, resulting in a 100x compression ratio. The neural network predicts the next word in a sequence by feeding in a sequence of words and using parameters dispersed throughout the network. The performance of LLMs in predicting the next word is influenced by two variables: the number of parameters in the network and the amount of text used for training. The trend of improving accuracy with bigger models and more training data suggests that algorithmic progress is not necessary, as we can achieve more powerful models by simply increasing the size of the model and training it for longer. LLMs are not just chatbots or word generators, but rather the kernel process of an emerging operating system, capable of coordinating resources for problem solving, reading and generating text, browsing the internet, generating images and videos, hearing and speaking, generating music, and thinking for a long time. They can also self-improve and be customized for specific tasks, similar to open-source operating systems.

2. Language models are trained in two stages: pre-training for knowledge and fine-tuning for alignment.

The process of training a language model involves two stages: pre-training and fine-tuning. Pre-training involves compressing text into a neural network using expensive computers, which is a computationally expensive process that only happens once or twice a year. This stage focuses on knowledge. In the fine-tuning stage, the model is trained on high-quality conversations, which allows it to change its formatting and become a helpful assistant. This stage is cheaper and can be repeated iteratively, often every week or day. Companies often iterate faster on the fine-tuning stage, releasing both base models and assistant models that can be fine-tuned for specific tasks.

3. Large language models aim to transition to system two thinking for accuracy.

The development of large language models, like GPT and Claude, is a rapidly evolving field, with advancements in language models and human-machine collaboration. These models are currently in the system one thinking phase, generating words based on neural networks. However, the goal is to transition to system two thinking, where they can take time to think through a problem and provide more accurate answers. This would involve creating a tree of thoughts and reflecting on a question before providing a response. The question now is how to achieve self-improvement in these models, which lack a clear reward function, making it challenging to evaluate their performance. However, in narrow domains, a reward function could be achievable, enabling self-improvement. Customization is another axis of improvement for language models.

4. Large language models can use tools, engage in speech-to-speech, and be customized for diverse tasks.

Large language models like ChatGPT are capable of using tools to perform tasks, such as searching for information and generating images. They can also engage in speech-to-speech communication, creating a conversational interface to AI. The economy has diverse tasks, and these models can be customized to become experts at specific tasks. This customization can be done through the GPT's app store, where specific instructions and files for reference can be uploaded. The goal is to have multiple language models for different tasks, rather than relying on a single model for everything.

5. Large language models' security challenges require ongoing defense strategies.

The new computing paradigm, driven by large language models, presents new security challenges. One such challenge is prompt injection attacks, where the models are given new instructions that can cause undesirable effects. Another is the potential for misuse of knowledge, such as creating napalm. These attacks are similar to traditional security threats, with a cat and mouse game of attack and defense. It's crucial to be aware of these threats and develop defenses against them, as the field of LM security is rapidly evolving.

top 6 comments

sorted by: hot top controversial new old

[–] CommanderOfReddit@alien.top 1 points 2 years ago

1. Large language models are for writing smut and roleplaying with your anime wife.

There, I fixed it for reddit.

[–] ciaguyforeal@alien.top 1 points 2 years ago

its funny that he thinks this is for a general audience, when in fact this is still a very highly technical perspective. More accessible for those already interested in the tech, but basically everything here is of interest to someone who wants to catch up on research, not use.

[–] meetrais@alien.top 1 points 2 years ago (1 children)

Best part for me was Security. Security issues Andrej shown are really eye opening. But may be that would create new opportunities in LLM security just like Cyber security.

[–] ReturningTarzan@alien.top 1 points 2 years ago

Most of those security issues are just silly. Like, oh no, what if the model answers a question with some "dangerous" knowledge that's already in the top three search results if you Google the exact same question? Whatever will we do?

The other ones arise from inserting an LLM across where there would be a security boundary, like by giving it access to personal documents and at the same time an accessible interface to people who shouldn't have that access. So a new, poorly understood technology provides novel ways for people to make bad assumptions in their rush to monetize it. News at 11.

Of course it's still a great segment and easily the most interesting part of the video.

[–] dhaitz@alien.top 1 points 2 years ago

Another is the potential for misuse of knowledge, such as creating napalm"

IMHO these examples of "I tricked ChatGPT into telling me how to build a bomb!!" are fun, but you can find this information online anyway. This is mainly a PR problem if screenshots of company XY's new chatbot spewing problematic content are circulating on social media.

The point is rather that any information the LLM has ever seen (during training or in its prompt) can be leaked to the user, no matter how thorough your finetuning or prompt engineering is.

[–] phoneixAdi@alien.top 1 points 2 years ago

Okay this was well received.

Did not expect that 😅.
FYI. I also posted the full summary with link to relevant sections of the transcripts here now : https://www.wisdominanutshell.academy/andrej-karpathy/1hr-talk-intro-to-large-language-models/

If you want to bookmarked it or something.