I think llama 1 had more interesting training data, but it can’t hold a plot too well
Nkingsy
Trained on a larger # of tokens. All the llama models appear to be undertrained, especially the 70B.
Or the more undertrained it is, the more fat can be trimmed