Yeah, this discrepancy really irks me in programming, too. It's really good at known problems, like student homework or whatever task a middle manager will throw at it to see how well it works.
But because of the nature of software – if there is a solution, you can easily share it with everyone in the world – it's kind of our job to work on anything but known problems.
Yeah, there's gonna be some known parts, where it may be able to assist, similar to a library or StackOverflow. But if it can put together your whole solution without tons of human input, chances are that solution is already out there and you should be using it instead.
Personally, I find that (complex) software implemented in Python tends to be so unreliable that I typically don't want to use it after all, but I only find that out after wasting a bunch of time learning the software.
It's just frustrating, especially if I come back to the software every so often, naively thinking that it's been a few versions, so maybe they've fixed it. It's always just different bugs, and they still make the software too frustrating to use.
To give an example, I like to compose music using LilyPond, which is more or less a programming language for creating sheet music. And there is a program called Frescobaldi that's supposed to give you a well-integrated workflow for that (i.e. an IDE).
The first time I tried it, playback of the composed music wouldn't work.
The second time, I couldn't click on notes to jump to the respective code snippet.
And I tried it again a few weeks ago and it just crashed immediately with an obscure error message.
Instead, I've slapped together a script, which just opens the sheet music in my PDF viewer, the code in my normal editor and then uses CLI tools to generate and play back the sheet music. And while it's definitely not perfect, it has been working more reliably for me than Frescobaldi ever has.
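
For illustration, a script along those lines might look roughly like the sketch below. The tool choices are assumptions, not necessarily the ones used above: `xdg-open` for the PDF viewer, `$EDITOR` (falling back to `vi`) for the editor, and `timidity` for MIDI playback. It also assumes the score contains a `\midi` block and that LilyPond writes the MIDI file with a `.midi` extension, which can differ by platform.

```python
import os
import subprocess
import sys
from pathlib import Path


def build_commands(source: str, editor: str = "vi") -> dict[str, list[str]]:
    """Return the commands for one compile/view/edit/play cycle.

    Nothing is executed here; the tool names are assumptions for illustration.
    """
    src = Path(source)
    base = src.with_suffix("")  # "song.ly" -> "song"
    return {
        # lilypond appends the right suffix (.pdf, .midi) to the -o base name
        "compile": ["lilypond", "-o", str(base), str(src)],
        # assumed viewer launcher; swap in your own PDF viewer
        "view": ["xdg-open", str(base) + ".pdf"],
        "edit": [editor, str(src)],
        # assumed MIDI player; requires a \midi block in the score
        "play": ["timidity", str(base) + ".midi"],
    }


def main(source: str) -> None:
    cmds = build_commands(source, os.environ.get("EDITOR", "vi"))
    subprocess.run(cmds["compile"], check=True)  # stop if the score fails to compile
    subprocess.Popen(cmds["view"])               # viewer runs in the background
    subprocess.Popen(cmds["play"])               # playback runs in the background
    subprocess.run(cmds["edit"])                 # editor stays in the foreground


if __name__ == "__main__":
    main(sys.argv[1])
```

Called as `python compose.py song.ly` (the script name is hypothetical), this compiles once and then opens everything. Keeping the commands in one plain function makes it easy to swap a tool when one of them misbehaves.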