The state of the art in training and architecture is likely to improve over the next year alone, certainly over the next 2 or 3. It's also reasonable to expect cheaper hardware for running LLMs, since all the chip makers are working on it.
If you don't need a local LLM now but think it might only save money in the long run, it probably makes sense to wait and build one once we're better at it.
Collating training data in the meantime probably makes sense: recording as much as you can, encouraging employees to document more, etc. That data will be useful even in the absence of AI, and with improving AI technology it is likely to become more valuable every year. It also takes time to produce that data, and no one else can do it for you.
We don't currently know exactly why gradient descent works to find powerful, generalizing minima
But, like, it does
The minima we can reliably find in practice don't just interpolate the training data. I mean, they do that too, but they also find compressions which seem to actually represent knowledge, in the sense that they can identify true relationships between concepts which reliably hold outside the training distribution.
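To make "generalize outside the training distribution" concrete, here's a deliberately tiny sketch: gradient descent fits a linear model on inputs drawn only from [0, 1], and the recovered parameters then extrapolate correctly far outside that range. (The linear model and the ground-truth relationship y = 2x + 1 are stand-ins I've made up for illustration; the interesting claim above is that something analogous happens with deep nets, which is much harder to show in ten lines.)

```python
import random

random.seed(0)
train_x = [random.random() for _ in range(100)]   # inputs only in [0, 1]
train_y = [2.0 * x + 1.0 for x in train_x]        # assumed ground truth

# Plain gradient descent on mean squared error for y = w*x + b.
w, b, lr = 0.0, 0.0, 0.1
for _ in range(2000):
    gw = sum((w * x + b - y) * x for x, y in zip(train_x, train_y)) / len(train_x)
    gb = sum((w * x + b - y) for x, y in zip(train_x, train_y)) / len(train_x)
    w -= lr * gw
    b -= lr * gb

# The fitted parameters recover the true relationship, so the model
# extrapolates to x = 10, far outside the training interval.
print(round(w, 2), round(b, 2))   # close to 2.0 and 1.0
print(round(w * 10 + b, 2))       # close to 21.0
```

In the linear case this is unsurprising; the open question the paragraph above gestures at is why the same optimizer, run on wildly overparameterized networks, so often lands on minima that behave like this rather than ones that merely memorize.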
I want to stress, "predict the next token" is what the models are trained to do, it is not what they learn to do. They learn deep representations and learn to deploy those representations in arbitrary contexts. They learn to predict tokens the same way a high-school student learns to fill in scantrons: the scantron is designed so that filling it out requires other more useful skills.
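For concreteness, the "predict the next token" objective is just average cross-entropy against the actual next token at each position. A minimal sketch (the four-word vocabulary and the dummy uniform model below are invented for illustration; a real LLM computes the distribution from context):

```python
import math

vocab = ["the", "cat", "sat", "."]
tokens = [0, 1, 2, 3]  # "the cat sat ."

def model_probs(context):
    """Stand-in for an LLM: a distribution over the next token."""
    # Dummy uniform model; a real model conditions on the context.
    return [0.25, 0.25, 0.25, 0.25]

# Average cross-entropy: the model's "surprise" at each true next token.
loss = 0.0
for i in range(len(tokens) - 1):
    probs = model_probs(tokens[: i + 1])
    loss += -math.log(probs[tokens[i + 1]])
loss /= len(tokens) - 1

print(round(loss, 3))  # uniform over 4 tokens -> -ln(1/4) ≈ 1.386
```

Minimizing this number is the scantron. Nothing in the objective names the skills needed to minimize it; those are whatever representations happen to make the next token less surprising.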
It's unclear whether gradient descent will continue to work so unreasonably well as we push it farther and farther, but so long as the current paradigm holds, I don't see a huge difference between human inference ability and Transformer inference ability. Number of neurons* and amount of training data seem to be the things holding LLMs back. Humans beat LLMs on both counts, but in some ways LLMs seem to outperform biology in what they can learn from a given quantity of neurons/data. As for the "billions of years" issue: that's why we are using human-generated data, so they can catch up instead of starting from scratch.