Machine Learning


Community Rules:

Be nice. No offensive behavior, insults or attacks: we encourage a diverse community in which members feel safe and have a voice.
Make your post clear and comprehensive: posts that lack insight or effort will be removed (e.g., questions that are easily googled).
Beginner or career-related questions go elsewhere. This community is focused on discussion of research and new projects that advance the state of the art.
Limit self-promotion. Comments and posts should be first and foremost about topics of interest to ML observers and practitioners. Limited self-promotion is tolerated, but the sub is not here as merely a source for free advertisement. Such posts will be removed at the discretion of the mods.

founded 2 years ago

MODERATORS

communick@academy.garden

[D] How large an LLM can I train from scratch on a single A100 GPU with 80 GB of memory? (alien.top)

submitted 2 years ago by eeeehhh@alien.top to c/machinelearning@academy.garden


I have access to a single 80 GB A100 GPU and would like to train an LLM with a GPT-like architecture from scratch. Does anyone know how to calculate the maximum model size?
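For the calculation asked about here, a rough back-of-the-envelope sketch follows. It assumes mixed-precision training with Adam, where each parameter costs about 16 bytes of persistent state (fp16 weights and gradients, plus fp32 master weights and two fp32 Adam moments). The function name, the `reserved_bytes` parameter, and the 20 GB reserve used in the example are hypothetical placeholders, not measured values; activation memory depends heavily on batch size, sequence length, and whether activation checkpointing is used, so treat the result as an upper bound rather than a guarantee.

```python
# Assumed per-parameter cost for mixed-precision training with Adam:
#   2 bytes  fp16 weights
#   2 bytes  fp16 gradients
#   4 bytes  fp32 master copy of the weights
#   4 bytes  fp32 Adam first moment
#   4 bytes  fp32 Adam second moment
BYTES_PER_PARAM_MIXED_ADAM = 2 + 2 + 4 + 4 + 4


def max_trainable_params(total_bytes, reserved_bytes=0,
                         bytes_per_param=BYTES_PER_PARAM_MIXED_ADAM):
    """Upper bound on parameter count that fits in GPU memory.

    total_bytes:     GPU memory in bytes.
    reserved_bytes:  memory set aside for activations, temporary buffers,
                     and framework overhead (an assumption you must supply).
    bytes_per_param: persistent training state per parameter.
    """
    if reserved_bytes > total_bytes:
        raise ValueError("reserved_bytes exceeds total_bytes")
    return (total_bytes - reserved_bytes) // bytes_per_param


if __name__ == "__main__":
    gpu_bytes = 80 * 10**9  # treating "80 GB" as 80e9 bytes (assumption)

    # Ceiling if activations cost nothing, which never holds in practice.
    print(max_trainable_params(gpu_bytes))

    # Same estimate with a hypothetical 20 GB held back for activations.
    print(max_trainable_params(gpu_bytes, reserved_bytes=20 * 10**9))
```

Under these assumptions the ceiling lands at a few billion parameters. Optimizers with smaller state, 8-bit optimizer states, or CPU offloading change `bytes_per_param` and therefore the estimate.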


[–] karlwikman@alien.top 1 points 2 years ago

This question might come off as stupid, but it's really something I'm curious about:

I 100% see why someone would want to take a current state-of-the-art open model and fine-tune it on their own data. I don't see why someone would want to train their own model from scratch. Can you explain it?