meetrais

joined 1 year ago
[–] meetrais@alien.top 1 points 11 months ago (3 children)

Same experience here. I got excellent results from quantized versions of Intel's Neural-Chat-7B and Mistral-7B, but poor results from a quantized Yi-34B.

 

Hey All,

I have a few questions about how to calculate the tokens per second of an LLM.

  1. The way I calculate tokens per second for my fine-tuned models is to put a timer in my Python code. So if the output is 20 tokens and the model took 5 seconds, that's 4 tokens per second. Am I using the correct method, or is there a better one? (A minimal sketch of this follows the list.)

  2. If my model does 4 tokens per second on 8 GB of VRAM, will it do 8 tokens per second on 16 GB of VRAM?
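
For reference, here is a minimal sketch of the timing approach from point 1, assuming a Hugging Face transformers causal LM (the model name, prompt, and generation settings are placeholders):

```python
import time

from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; substitute your own fine-tuned checkpoint.
model_name = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

prompt = "Explain quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

start = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=100)
elapsed = time.perf_counter() - start

# Count only the newly generated tokens, not the prompt tokens.
new_tokens = outputs.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens} tokens in {elapsed:.2f}s -> {new_tokens / elapsed:.2f} tokens/sec")
```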

[–] meetrais@alien.top 1 points 11 months ago (1 children)

The best part for me was security. The security issues Andrej showed are really eye-opening. But maybe that will create new opportunities in LLM security, just like cybersecurity.

[–] meetrais@alien.top 1 points 11 months ago

On Hugging Face you can find many fine-tuned/quantized models; look in particular for models from TheBloke.
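
For example, a minimal sketch of downloading one of TheBloke's quantized GGUF files with huggingface_hub (the repo and file names are illustrative; check the repo's file list for the exact quantization you want):

```python
from huggingface_hub import hf_hub_download

# Illustrative repo/filename following TheBloke's naming scheme.
path = hf_hub_download(
    repo_id="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
    filename="mistral-7b-instruct-v0.1.Q4_K_M.gguf",
)
print(path)  # local path to the downloaded model file
```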

[–] meetrais@alien.top 1 points 11 months ago

I am interested to see whether OpenAI's new leadership will be more LLM-dev friendly and stop conspiring with politicians to hinder open-source LLM community innovation.

[–] meetrais@alien.top 1 points 11 months ago (3 children)

I second this. Mistral-7B gave me good results, and after fine-tuning its results are even better.

 

Which operating system do you use for local LLM work?
