
AI costs spike as subscriptions hit pricing wall — firms turn towards Chinese LLMs, open-source models to extend budget (www.tomshardware.com)

submitted 19 hours ago by sanitation@lemmy.today to c/technology@lemmy.world


setsubyou@lemmy.world 9 points 14 hours ago

Nowadays agents like Claude Code can run autonomously for hours given nothing but a goal description. It takes very little human effort to set up a bunch of sessions, and these companies don’t limit how many instances you run in parallel. Agents can also spawn sub-agents that run in parallel if a task calls for it. Whether all this produces good results is a different story, especially if you don’t put enough effort into the goal description. But burning tokens as such is not difficult.

Even workflows where you’re just chatting with an agent can burn a lot of tokens. When you chat with an LLM, the entire history becomes part of the input each time you send something. The same applies to tool calls: if the agent decides to read 20 files before it can work on your request, each of those 20 reads adds a file to the history, and each time the entire growing history is sent back as input to drive the agent’s next step.
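To put rough numbers on that, here is a minimal sketch of the arithmetic. Everything in it is an assumption for illustration: the function name, the 500-token prompt, and the 2000 tokens per file are made up, and it ignores any caching or discounts a provider might apply to repeated input.

```python
def cumulative_input_tokens(prompt_tokens, file_tokens):
    """Total input tokens across all steps if the full history is resent every step.

    prompt_tokens: size of the initial human prompt (hypothetical figure)
    file_tokens:   sizes of the files the agent reads, one per tool call
    """
    history = prompt_tokens
    total_input = 0
    for size in file_tokens:
        total_input += history  # the whole history so far is the input for this step
        history += size         # the file that was read joins the history
    total_input += history      # one last step where the model finally answers
    return total_input


files = [2000] * 20                          # 20 files, 2000 tokens each (assumed)
total = cumulative_input_tokens(500, files)
once = 500 + sum(files)                      # what you'd pay if everything were sent a single time
print(total, once)                           # 430500 40500
```

With these made-up sizes the agent consumes roughly ten times the input it would if the same content were sent once, and the gap widens with every extra step.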

Coding is more affected by this than many other applications. Even a new conversation tends to start with the agent gathering a bunch of source code files. And the response to a task is not a single block of text but a sequence of tool calls: make edits across files, build, run tests, react to test failures, and so on. All of that comes from one actual human prompt, but in reality it is a back-and-forth between the LLM and the harness with a quickly growing history.
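The shape of that back-and-forth can be sketched with a toy loop. This is not how any particular harness is implemented: `run_task`, the scripted action list, and the placeholder tool outputs are all hypothetical, and a fixed script stands in for the model's decisions.

```python
def run_task(human_prompt, scripted_actions):
    """Drive a scripted stand-in for an LLM through one task.

    Returns (number of model calls, number of messages in the history).
    """
    history = [human_prompt]
    model_calls = 0
    for action in scripted_actions:
        model_calls += 1         # the harness asks the model for its next step
        history.append(action)   # the model's chosen action joins the history
        if action == "done":     # final reply, no tool call to execute
            break
        history.append(f"<output of {action}>")  # placeholder tool result
    return model_calls, len(history)


actions = ["read a.py", "read b.py", "edit a.py", "build",
           "run tests", "edit b.py", "run tests", "done"]
calls, messages = run_task("Fix the failing login test", actions)
print(calls, messages)  # 8 16
```

One human prompt turns into 8 model calls and a 16-message history here, and a real session would typically involve far more steps and much larger tool outputs.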