this post was submitted on 29 Oct 2023

Machine Learning

I come from computer vision tasks using convnets that are relatively small in size and parameter count, yet perform quite well (e.g. the ResNet family, YOLO, etc.).

Now I am getting into NLP, and transformer-based architectures tend to be huge, so I have trouble fitting them in memory.

What infrastructure do you use to train these models (GPT-2, BERT, or even bigger ones)? Cloud computing, HPC, etc.?
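Before picking infrastructure, it can help to estimate how much memory training will actually need. A common rule of thumb for fp32 training with Adam is roughly four copies of the parameters (weights, gradients, and Adam's two moment buffers), before counting activations. The sketch below is my own back-of-envelope estimate, not a precise profiler; the model sizes are the commonly cited parameter counts.

```python
def training_memory_gb(n_params, bytes_per_param=4, copies=4):
    """Rough fp32 Adam training footprint: weights + grads + two moments.

    Ignores activations, which often dominate at large batch/sequence sizes.
    """
    return n_params * bytes_per_param * copies / 1e9

# Commonly cited parameter counts (approximate)
for name, n in [("BERT-base", 110e6), ("GPT-2 small", 124e6), ("GPT-2 XL", 1.5e9)]:
    print(f"{name}: ~{training_memory_gb(n):.1f} GB before activations")
```

By this estimate, GPT-2 XL needs ~24 GB just for parameters and optimizer state, which already exceeds many consumer GPUs and is why people reach for TPUs, multi-GPU nodes, or tricks like mixed precision and gradient checkpointing.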

top 2 comments
[–] KingsmanVince@alien.top 1 points 1 year ago (1 children)

I have used Google TPU for BLOOM and GPT-2 models.

[–] arena_one@alien.top 1 points 1 year ago

At your current job? What kind of role/company are you at? Most of the places I've seen just want to use the OpenAI API, sadly...