eeeehhh

joined 10 months ago
 

I have access to a single 80Gb A100 GPU and would like to train an LLM with GPT-like architecture from scratch. Does anyone know how to calculate the maximum model size.