[–] jugalator@alien.top 1 points 10 months ago (4 children)

Research papers have also observed diminishing returns as models grow.

Hell, maybe even GPT-4 was hit by this, and that's why GPT-4 is reportedly not a single giant language model but a mixture-of-experts design: eight 220B models trained for subtasks.
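
For context, the mixture-of-experts idea is basically: a small router network scores each token against every expert, only the top few experts actually run on it, and their outputs get blended by the router's weights. Here's a toy PyTorch sketch of that routing (all the sizes and names here, `d_model`, `num_experts`, `top_k`, are made up for illustration and have nothing to do with OpenAI's actual implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Minimal top-k gated mixture-of-experts layer (illustrative only)."""

    def __init__(self, d_model=64, d_hidden=256, num_experts=8, top_k=2):
        super().__init__()
        # Each "expert" is just a small feed-forward network here.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.ReLU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(num_experts)
        )
        # The router scores each token against every expert.
        self.router = nn.Linear(d_model, num_experts)
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x)                           # (tokens, num_experts)
        weights, picks = scores.topk(self.top_k, dim=-1)  # keep the top_k experts per token
        weights = F.softmax(weights, dim=-1)              # normalize the kept scores
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                # Tokens whose slot-th choice is expert e get that
                # expert's output, scaled by the routing weight.
                mask = picks[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

layer = MoELayer()
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64])
```

The point is that the router only activates `top_k` of the experts per token, so total parameters can grow without per-token compute growing with them.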

But I think even this architecture will run into issues; it's more of a crutch. You'll eventually grow each of these subtask models too large and need to split them as well, but then each model covers too small or niche a field, and that sounds like the end of that road to me.