this post was submitted on 15 Jun 2026
Technology
Nowadays agents like Claude Code can run autonomously for hours given just a goal description. It doesn’t take much human effort to set up a bunch of sessions, and these companies don’t limit how many instances you run in parallel. Agents can also spawn sub-agents that run in parallel if a task calls for it. Whether all this produces good results is a different story, especially if you don’t put enough effort into the goal description. But burning tokens as such is not difficult.
Even workflows where you’re just chatting with an agent can burn a lot of tokens. When you’re chatting with an LLM, the entire history becomes part of the input each time you send something. This also applies to tool calls: if the agent decides to read 20 files before it can work on your request, that’s 20 file contents added to the history, and after each one the entire growing history is sent back as input to drive the agent’s next step.
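To put rough numbers on that, here's a minimal back-of-the-envelope sketch. The function name and the token counts (a 500-token prompt, 2,000 tokens per file) are made up for illustration, and it deliberately ignores the model's own output tokens and any provider-side caching or discounts, so treat it as an upper-bound intuition rather than a billing calculator.

```python
def total_input_tokens(prompt_tokens, tool_result_tokens):
    """Sum the input tokens sent over an agent loop that resends
    the full history on every step (hypothetical helper)."""
    history = prompt_tokens
    total = 0
    for result in tool_result_tokens:
        total += history   # the whole history so far goes in as input
        history += result  # the tool result is appended for the next step
    total += history       # final request that produces the actual answer
    return total


# Assumed numbers: 500-token prompt, 20 file reads of 2,000 tokens each.
reads = [2_000] * 20
resent = total_input_tokens(500, reads)
sent_once = 500 + sum(reads)

print(resent)     # 430500
print(sent_once)  # 40500
```

Under those assumptions the files only add up to about 40k tokens, but the input processed over the whole loop is over ten times that, because each step pays again for everything before it.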
Coding is more affected by this than many other applications. Even a new conversation tends to start with the agent gathering a bunch of source code files. The response to a task is then not a single block of text but a sequence of tool calls: make edits across files, build, run tests, react to test failures, and so on. All of that comes from one actual human prompt, but in reality it’s a back-and-forth between the LLM and the harness with a quickly growing history.
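The shape of that back-and-forth can be sketched as a toy harness loop. Everything here is hypothetical: the `Harness` class, the scripted list standing in for the model's decisions, and the fake tool results. No real agent API is being shown, only the structure of one prompt turning into many requests.

```python
from dataclasses import dataclass, field


@dataclass
class Harness:
    """Toy stand-in for an agent harness (not any real tool's API)."""
    history: list = field(default_factory=list)
    requests: int = 0

    def run(self, prompt, scripted_steps):
        self.history.append(("user", prompt))
        for action in scripted_steps:
            # Each step is one LLM request that carries the full history.
            self.requests += 1
            self.history.append(("assistant", action))
            if action == "done":
                break
            # The harness runs the tool and appends its result.
            self.history.append(("tool", f"result of {action}"))
        return self.requests


# A scripted sequence standing in for what the model might choose to do.
steps = ["read a.py", "read b.py", "edit a.py", "build",
         "run tests", "edit a.py", "run tests", "done"]

harness = Harness()
print(harness.run("fix the failing test", steps))  # 8
print(len(harness.history))                        # 16
```

One human message ends up as eight model requests and sixteen history entries in this made-up run, and every one of those requests would carry all the entries before it.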