If you’re talking about LLMs, then you’re judging the tool by the wrong metric. They’re not designed to solve problems or pass CAPTCHAs - they’re designed to generate coherent, natural-sounding text. That’s the task they’re trained for, and that’s where their narrow intelligence lies.
When people expect factual accuracy or problem-solving ability from them, that's a mismatch between expectations and design - not a failure of the system itself. You're blaming the hammer for not turning screws.
I mean any individual task, not every task.