Technology

This is a most excellent place for technology news and articles.

Our Rules

Follow the lemmy.world rules.
Only tech related news or articles.
Be excellent to each other!
Mod-approved content bots can post up to 10 articles per day.
Threads asking for personal tech support may be deleted.
Politics threads may be removed.
No memes allowed as posts, OK to post as comments.
Only approved bots from the list below are allowed; this includes using AI responses and summaries. To ask if your bot can be added, please contact a mod.
Check for duplicates before posting; duplicates may be removed.
Accounts 7 days old or younger will have their posts automatically removed.

Approved Bots

founded 3 years ago

MODERATORS

L3s@lemmy.world

enu@lemmy.world

technopagan@lemmy.world

L4s@lemmy.world

L3s@hackingne.ws

DeepSeek Permanently Reduces The Price Of Its Flagship V4 Model By 75 Percent (tech.yahoo.com)

submitted 1 month ago by jaykrown@lemmy.world to c/technology@lemmy.world

138 comments

mic_check_one_two@lemmy.dbzer0.com 22 points 1 month ago (last edited 1 month ago)

A token is basically just a word. Know how your phone’s autosuggest tries to anticipate the next word as you type? Your phone is looking at a tiny amount of context (often just the previous two or three words you typed) to predict your next word, which is also a token. It keeps that context small because processing requirements balloon quickly as the token count rises.
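
To make the phone comparison concrete, here is a minimal sketch of the classic "look at the last two words" predictor, a trigram model that just counts which word followed each pair of words. Real keyboards may use different (often neural) models; the function names `build_trigram_model` and `suggest` and the tiny corpus are hypothetical, for illustration only.

```python
from collections import Counter, defaultdict


def build_trigram_model(text):
    """Count which word follows each pair of consecutive words in `text`."""
    words = text.lower().split()
    model = defaultdict(Counter)
    for a, b, c in zip(words, words[1:], words[2:]):
        model[(a, b)][c] += 1
    return model


def suggest(model, previous_words, k=3):
    """Return up to k most frequent next words, using only the last two words typed."""
    key = tuple(w.lower() for w in previous_words[-2:])
    return [word for word, _ in model.get(key, Counter()).most_common(k)]


corpus = "see you at the game . see you at the office . see you at the game tonight"
model = build_trigram_model(corpus)
print(suggest(model, ["see", "you", "at", "the"]))  # ['game', 'office']
```

Everything typed before the last two words is simply ignored, which is why this kind of predictor stays cheap.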

AI chat is basically the same concept with a massively larger token limit: instead of looking at your previous two or three words, it looks at the entire conversation. It also generates its response one token at a time, the same way your phone predicts your next word, feeding each new token back in before predicting the one after it.
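
The "one token at a time" loop can be sketched in a few lines. This is a simplified illustration, not how any particular chatbot is built: `next_token` stands in for the language model, and the lookup-table `toy_model` is a hypothetical placeholder that only looks at the last token.

```python
def generate(next_token, prompt, max_new_tokens=10, context_limit=8, stop="<end>"):
    """Autoregressive loop: predict one token, append it, repeat.

    `next_token` stands in for the model: it receives the most recent
    `context_limit` tokens and returns the single next token.
    """
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        window = tokens[-context_limit:]  # only what fits in the context window
        token = next_token(window)
        if token == stop:
            break
        tokens.append(token)  # the new token becomes part of the next input
    return tokens[len(prompt):]


# Hypothetical toy "model": a fixed table keyed on the last token only.
TABLE = {"how": "are", "are": "you", "you": "today", "today": "?", "?": "<end>"}


def toy_model(window):
    return TABLE.get(window[-1], "<end>")


print(generate(toy_model, ["hi", "how"]))  # ['are', 'you', 'today', '?']
```

A real model differs in what happens inside `next_token`, but the outer loop of predicting, appending, and predicting again has this shape.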

So when you pay for tokens, you’re essentially paying by word count, usually with separate prices for the tokens you send (input) and the tokens the model writes back (output). As a conversation continues, each new request uses more tokens than the last, because the whole conversation so far is sent back in as context every time.
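
Some quick arithmetic shows why long chats get expensive. This sketch assumes the full history is resent on every turn, and uses made-up sizes (50 tokens per message you type, 200 per reply) and made-up prices per million tokens; none of these are DeepSeek’s actual figures.

```python
def conversation_tokens(turns, user_tokens=50, reply_tokens=200):
    """Return (input_tokens, output_tokens) per request when the whole
    history is resent on every turn. Sizes are hypothetical defaults."""
    history = 0
    per_turn = []
    for _ in range(turns):
        input_tokens = history + user_tokens  # everything so far, plus the new message
        per_turn.append((input_tokens, reply_tokens))
        history = input_tokens + reply_tokens  # the reply joins the history
    return per_turn


def cost(per_turn, price_in_per_m, price_out_per_m):
    """Total cost given hypothetical prices per million input/output tokens."""
    total_in = sum(inp for inp, _ in per_turn)
    total_out = sum(out for _, out in per_turn)
    return (total_in * price_in_per_m + total_out * price_out_per_m) / 1_000_000


per_turn = conversation_tokens(turns=5)
for turn, (inp, out) in enumerate(per_turn, start=1):
    print(f"turn {turn}: {inp} input tokens, {out} output tokens")
print("total cost:", cost(per_turn, price_in_per_m=1.0, price_out_per_m=2.0))
```

The input per turn comes out as 50, 300, 550, 800, 1050: each reply stays the same size, but the context you pay to resend keeps growing. Cutting both prices by 75 percent multiplies the total by 0.25.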

Models have a built-in token limit (the context window), which caps how much memory a single request needs to run. Once a conversation reaches that limit, earlier context starts getting lost. Depending on the chatbot, it either summarizes older parts of the conversation to keep the relevant pieces in a shorter form, or simply drops the oldest messages and “forgets” them, the same way my phone has already forgotten the start of this sentence.
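
The simpler of those two strategies, dropping the oldest messages, might look roughly like this. It is a hypothetical sketch: `fit_to_context` and `count_tokens` are made-up names, and counting tokens by splitting on spaces is only a crude approximation of a real tokenizer. Summarizing is not shown.

```python
def count_tokens(text):
    # Crude stand-in: real tokenizers split text into sub-word pieces,
    # so actual token counts are often higher than the word count.
    return len(text.split())


def fit_to_context(messages, limit):
    """Drop the oldest non-system messages until the total fits in `limit` tokens.

    `messages` is a list of (role, text) pairs; the first entry is assumed
    to be the system prompt and is always kept.
    """
    system, history = messages[0], list(messages[1:])

    def total():
        return count_tokens(system[1]) + sum(count_tokens(t) for _, t in history)

    while history and total() > limit:
        history.pop(0)  # "forget" the oldest turn first
    return [system] + history


messages = [
    ("system", "You are a helpful assistant."),
    ("user", "My cat is called Miso."),
    ("assistant", "Nice name!"),
    ("user", "What should I cook tonight?"),
]
for role, text in fit_to_context(messages, limit=12):
    print(role, ":", text)
```

With a 12-token budget, the message about the cat is the first thing to go, so the chatbot no longer “knows” the cat’s name.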

It’s a little more complicated than “each word is a token”. Longer or uncommon words are often split into several tokens, and the chatbot combines your prompts with its own hidden instructions (a system prompt), plus any summaries of older parts of the conversation it is keeping in memory. But that’s the most straightforward way to explain it.
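
Here is a rough illustration of words being split into pieces: a greedy longest-match tokenizer over a tiny, made-up vocabulary. Real LLM tokenizers typically use learned vocabularies (for example, byte-pair encoding) with tens of thousands of entries, so their actual splits will differ. The `tokenize` function and `VOCAB` set are hypothetical.

```python
def tokenize(word, vocab):
    """Greedy longest-match split of `word` into pieces from `vocab`.

    Characters not covered by any vocabulary entry become single-character tokens.
    """
    pieces, start = [], 0
    while start < len(word):
        # Try the longest candidate first, shrinking until something matches.
        for end in range(len(word), start, -1):
            if word[start:end] in vocab or end == start + 1:
                pieces.append(word[start:end])
                start = end
                break
    return pieces


VOCAB = {"token", "ization", "un", "believ", "able", "the", "cat"}
for w in ["tokenization", "unbelievable", "cat"]:
    print(w, "->", tokenize(w, VOCAB))
```

Here "cat" is one token, while "unbelievable" costs three (`un`, `believ`, `able`). That is why token counts usually run somewhat higher than word counts.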