
DeepSeek Permanently Reduces The Price Of Its Flagship V4 Model By 75 Percent (tech.yahoo.com)

submitted 1 month ago by jaykrown@lemmy.world to c/technology@lemmy.world

138 comments

[–] yesman@lemmy.world 11 points 1 month ago (3 children)

I'm unfamiliar with AI chatbots that you pay for. What is a token?

[–] mic_check_one_two@lemmy.dbzer0.com 22 points 1 month ago* (last edited 1 month ago)

A token is basically just a word. Know how your phone’s autosuggest tries to anticipate the words you want to use as you type? In that case, your phone is using a very small number of tokens (typically only the previous two or three words you have typed) to try to predict your next word, which would also be a token. Your phone only uses a few tokens at a time, because as token count rises, processing requirements quickly balloon.

And AI chat is basically the same concept, but with a massively inflated token limit. Instead of looking at your previous two or three words, it looks at entire conversations. And it also uses tokens to generate responses, the same way your phone is using one token at a time to predict your next word.

So when you pay for tokens, you’re essentially paying for a word count. As you continue a conversation, the token requirement for each subsequent request will increase, because it is attempting to look at the entire context of the conversation you have had.
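A rough sketch of why the per-request token count grows, assuming the whole conversation is resent with every request and nothing is cached or summarized. The function name and the turn sizes are made up for illustration:

```python
def billed_input_tokens(turn_sizes):
    """Return the input tokens sent on each request when the full history is resent.

    turn_sizes: list of (user_tokens, reply_tokens) pairs, one per turn.
    """
    history = 0
    billed = []
    for user_tokens, reply_tokens in turn_sizes:
        # Each request carries everything said so far plus the new prompt.
        billed.append(history + user_tokens)
        # Both the prompt and the reply become part of the history.
        history += user_tokens + reply_tokens
    return billed


# Three identical turns: a 50-token prompt and a 200-token reply each time.
per_request = billed_input_tokens([(50, 200), (50, 200), (50, 200)])
print(per_request)       # [50, 300, 550]
print(sum(per_request))  # 900
```

The prompts themselves only add up to 150 tokens, but the input across the three requests totals 900 because the earlier turns are counted again each time.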

Models have built-in token limits, to put a cap on how much memory is required to run the model. As conversations stretch on and you reach the model’s token limits, it will begin losing context for things that happened earlier. It will try to summarize earlier parts of the conversation to shorten them but keep relevant pieces in memory, or it will just outright drop old parts of the conversation and “forget” that context, the same way my phone has already forgotten the start of this sentence.
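The "outright drop old parts" behavior could look something like this. It is a minimal sketch, not how any particular chatbot does it: `fit_to_limit` and `rough_count` are hypothetical helpers, and counting whitespace-separated words is only a stand-in for a real tokenizer.

```python
def rough_count(text):
    # Stand-in for a real tokenizer: one "token" per whitespace-separated word.
    return len(text.split())


def fit_to_limit(messages, limit, count_tokens=rough_count):
    """Keep the most recent messages whose combined token count fits in the limit."""
    kept = []
    total = 0
    # Walk backwards from the newest message and stop once the limit would be exceeded.
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if total + cost > limit:
            break
        kept.append(msg)
        total += cost
    kept.reverse()  # restore chronological order
    return kept


history = ["my name is Sam", "I like boats", "what is my name"]
print(fit_to_limit(history, limit=7))
# ['I like boats', 'what is my name']
```

With a limit of 7, the oldest message no longer fits, so the question about the name arrives without the message that contained the answer.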

It’s a little more complicated than “each word is a token”, because the chatbot will combine your prompts with its own internal systems. Especially as conversations stretch on, and it begins to summarize old parts to keep them in memory. But that’s the most straightforward way to explain it.

[–] boatswain@infosec.pub 15 points 1 month ago (2 children)

My understanding is that tokens are basically words, and that when you ask a question it charges for all the tokens it consumes, produces, or processes. There's a lot of internal processing for each request, where the input text is summarized in different ways and combined with previous parts of the conversation, so it's not as straightforward as "word count of what you say plus what it says".

[–] iamthetot@piefed.ca 17 points 1 month ago* (last edited 1 month ago)

Worth noting that a token is not necessarily a word, though it can be. One word could also take multiple tokens. It can also vary from LLM to LLM, depending on their tokenization methods.
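A toy illustration of one word turning into several tokens. This is a greedy longest-match splitter over a tiny made-up vocabulary, not the tokenizer of any real LLM. Real tokenizers learn their vocabularies from data and differ from model to model.

```python
# Hypothetical vocabulary, chosen only to make the example work.
VOCAB = {"token", "ization", "un", "believ", "able", "cat"}


def toy_tokenize(word, vocab=VOCAB):
    """Split a word into the longest vocabulary pieces, left to right."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shorter ones.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # No vocabulary piece matches: fall back to a single character.
            pieces.append(word[i])
            i += 1
    return pieces


print(toy_tokenize("cat"))           # ['cat']
print(toy_tokenize("tokenization"))  # ['token', 'ization']
print(toy_tokenize("unbelievable"))  # ['un', 'believ', 'able']
```

So "cat" costs one token here, while "unbelievable" costs three, and a different vocabulary would split the same words differently.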

[–] teft@piefed.social -4 points 1 month ago (1 children)

> There’s a lot of internal processing for each request, where the input text is summarized in different ways and combined with previous parts of the conversation, so it’s not as straightforward as “word count of what you say plus what it says”.

In other words, obfuscation, so they can charge whatever they want using some obscure formula that only they know.

[–] eager_eagle@lemmy.world 8 points 1 month ago* (last edited 1 month ago)

Not really, there are ways to count tokens before running an inference. Some providers make their tokenizers public, so they even work offline. APIs also usually report usage or running costs with each response and have budget limits, though some could have more fine-grained limits.
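As a sketch of what checking before you send can look like on the client side: `TokenBudget` and `estimate_tokens` are hypothetical names, and the 4-characters-per-token estimate is only a crude rule of thumb for English text. With a provider's public tokenizer you would use its exact count instead.

```python
def estimate_tokens(text):
    # Crude placeholder: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)


class TokenBudget:
    """Track token usage against a fixed limit and refuse calls that would exceed it."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def allows(self, estimated_tokens):
        # Check before sending anything to the provider.
        return self.used + estimated_tokens <= self.limit

    def record(self, actual_tokens):
        # Call this with the usage the API reports after each response.
        self.used += actual_tokens


budget = TokenBudget(limit=100)
prompt = "Explain what a token is." * 10  # 240 characters, about 60 tokens

print(budget.allows(estimate_tokens(prompt)))  # True
budget.record(60)
print(budget.allows(estimate_tokens(prompt)))  # False, 60 + 60 exceeds 100
```

Tooling that runs subagents on its own would need a check like this around every call, which is where the overspending tends to come from.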

Users who are surprised by the bill are usually not paying attention to each call, or are using autonomous subagents, or have a setup where they have little or no control over what is sent to the provider.

So the problem isn't really the API provider, as much as it's the tooling around it, which makes it too easy to overspend.

[–] Peruvian_Skies@sh.itjust.works 4 points 1 month ago

In very simple terms, a token is more or less a word. You pay per input and output token (your prompts and the answers), since token counts correlate most closely with the energy the LLM expends to process your request.
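The arithmetic behind per-token billing, as a small sketch. The prices below are invented for the example and are not DeepSeek's or any other provider's actual rates. The only assumptions are that input and output tokens are priced separately and quoted per million tokens.

```python
def request_cost(input_tokens, output_tokens,
                 input_price_per_million, output_price_per_million):
    """Cost of one request when input and output tokens are priced separately."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


# Hypothetical prices: 1.00 per million input tokens, 4.00 per million output tokens.
before = request_cost(2000, 500, 1.00, 4.00)
# The same request after a 75 percent price cut on both rates.
after = request_cost(2000, 500, 0.25, 1.00)

print(f"{before:.4f}")  # 0.0040
print(f"{after:.4f}")   # 0.0010
```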