this post was submitted on 24 Nov 2023

LocalLLaMA


Community to discuss Llama, the family of large language models created by Meta AI.


Hey All,

I have a few doubts about how to calculate the tokens per second of an LLM.

  1. The way I calculate tokens per second for my fine-tuned models is to put a timer in my Python code around generation: if the output is 20 tokens long and generation took 5 seconds, that's 4 tokens per second (see the sketch after this list). Am I using the correct method, or is there a better way?

  2. If my model runs at 4 tokens per second on 8 GB of VRAM, will it run at 8 tokens per second on 16 GB of VRAM?
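For reference, here is a minimal sketch of that timer approach, assuming a Hugging Face transformers causal LM; the model name, prompt, and generation settings are placeholders rather than anything from the original post:

```python
import time

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder; substitute your fine-tuned model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

inputs = tokenizer("Hello, how are", return_tensors="pt").to(model.device)

start = time.perf_counter()
output_ids = model.generate(**inputs, max_new_tokens=20)
elapsed = time.perf_counter() - start

# Count only the newly generated tokens; generate() echoes the prompt
# tokens at the start of the output.
new_tokens = output_ids.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens} tokens in {elapsed:.2f}s -> {new_tokens / elapsed:.2f} tokens/s")
```

One detail worth noting: subtracting the prompt length matters, since timing the raw output length would mix prompt processing into the generation rate.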

phree_radical@alien.top · 10 months ago

I just wrap it in tqdm
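A minimal sketch of what that could look like, assuming token-by-token streaming via transformers' TextIteratorStreamer (the model name and prompt are placeholders, as in the sketch above); tqdm's live it/s readout then approximates the token rate:

```python
from threading import Thread

from tqdm import tqdm
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

inputs = tokenizer("Hello, how are", return_tensors="pt").to(model.device)
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)

# generate() blocks until done, so run it on a thread and consume
# the stream here; tqdm tracks the iteration rate as chunks arrive.
Thread(
    target=model.generate,
    kwargs=dict(**inputs, max_new_tokens=20, streamer=streamer),
).start()

for _ in tqdm(streamer):
    pass
```

Note that the streamer yields decoded text chunks that roughly correspond to one token each, so the it/s figure is an approximation of tokens per second rather than an exact count.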