this post was submitted on 24 Nov 2023

LocalLLaMA


Community to discuss Llama, the family of large language models created by Meta AI.


Hey All,

I have a few doubts about how to calculate the tokens per second of an LLM.

  1. The way I calculate tokens per second for my fine-tuned models is to put a timer in my Python code around generation: if the output is 20 tokens long and generation took 5 seconds, that's 4 tokens per second (see the sketch after this list). Am I using the correct method, or is there a better way?

  2. If my model runs at 4 tokens per second on 8 GB of VRAM, will it run at 8 tokens per second on 16 GB of VRAM?
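For reference, here is a minimal sketch of that timer approach, assuming a Hugging Face transformers causal LM; the model name, prompt, and generation settings are placeholders rather than anything from the original post:

```python
import time

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder; substitute your fine-tuned model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

inputs = tokenizer("Hello, how are", return_tensors="pt").to(model.device)

start = time.perf_counter()
output_ids = model.generate(**inputs, max_new_tokens=20)
elapsed = time.perf_counter() - start

# Count only the newly generated tokens; generate() echoes the prompt
# tokens at the start of the output.
new_tokens = output_ids.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens} tokens in {elapsed:.2f}s -> {new_tokens / elapsed:.2f} tokens/s")
```

One detail worth noting: subtracting the prompt length matters, since timing the raw output length would mix prompt processing into the generation rate.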

phree_radical@alien.top · 10 months ago

I just wrap it in tqdm
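A minimal sketch of what that could look like, assuming token-by-token streaming via transformers' TextIteratorStreamer (the model name and prompt are placeholders, as in the sketch above); tqdm's live it/s readout then approximates the token rate:

```python
from threading import Thread

from tqdm import tqdm
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

inputs = tokenizer("Hello, how are", return_tensors="pt").to(model.device)
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)

# generate() blocks until done, so run it on a thread and consume
# the stream here; tqdm tracks the iteration rate as chunks arrive.
Thread(
    target=model.generate,
    kwargs=dict(**inputs, max_new_tokens=20, streamer=streamer),
).start()

for _ in tqdm(streamer):
    pass
```

Note that the streamer yields decoded text chunks that roughly correspond to one token each, so the it/s figure is an approximation of tokens per second rather than an exact count.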