DataLearnerAI

[–] DataLearnerAI@alien.top 1 points 9 months ago

Alibaba open-sourced a 72B model called Qwen-72B: Qwen/Qwen-72B · Hugging Face

It supports Chinese and English. The performance on MMLU is remarkable.


I know that vLLM and TensorRT can be used to speed up LLM inference. I'm trying to find other tools that do similar things so I can compare them. Do you guys have any suggestions?

vLLM: speed up inference

TensorRT: speed up inference

DeepSpeed: speed up for the training phase
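To compare such tools fairly, the same prompts should be timed against each backend. Below is a minimal, backend-agnostic throughput harness sketch; `measure_throughput` and `dummy_generate` are hypothetical names, and the stub stands in for a real engine (vLLM, TensorRT-LLM, etc.), which would need a GPU and its own install:

```python
import time

def measure_throughput(generate_fn, prompts, runs=3):
    """Time a generation callable over several runs and return
    whitespace-token throughput (tokens/sec) for the best run.
    Illustrative harness, not tied to any specific backend."""
    best_elapsed = float("inf")
    total_tokens = 0
    for _ in range(runs):
        start = time.perf_counter()
        outputs = [generate_fn(p) for p in prompts]
        elapsed = time.perf_counter() - start
        best_elapsed = min(best_elapsed, elapsed)
        # crude token count: whitespace-split words of the outputs
        total_tokens = sum(len(o.split()) for o in outputs)
    return total_tokens / best_elapsed

# Stub generator standing in for a real inference backend.
def dummy_generate(prompt):
    return prompt + " some generated continuation"

print(measure_throughput(dummy_generate, ["What is MMLU?", "Explain KV cache."]))
```

Swapping `dummy_generate` for each engine's generate call would give roughly comparable tokens/sec numbers, though real benchmarks should also control batch size and output length.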

[–] DataLearnerAI@alien.top 1 points 10 months ago

In most scenarios, models with extended context windows are optimized for long sequences. If your sequences are not very long, it is often better to use the regular model instead.
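The advice above amounts to a simple routing rule: only fall back to the extended-context variant when the input actually exceeds the regular model's window. A minimal sketch, with made-up model names and an assumed 4096-token threshold:

```python
def pick_model(prompt_tokens, regular_limit=4096):
    """Choose between a regular model and its long-context variant
    based on prompt length. Names and the 4096 limit are illustrative."""
    if prompt_tokens > regular_limit:
        return "base-model-32k"  # extended-context variant for long inputs
    return "base-model"          # regular model, usually preferable for short inputs

print(pick_model(1000))   # base-model
print(pick_model(10000))  # base-model-32k
```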