DataLearnerAI

[–] DataLearnerAI@alien.top 1 points 9 months ago

Alibaba open-sourced a 72B model called Qwen-72B: Qwen/Qwen-72B · Hugging Face

It supports Chinese and English. The performance on MMLU is remarkable.


I know that vLLM and TensorRT can be used to speed up LLM inference. I'm trying to find other tools that do similar things so I can compare them. Do you guys have any suggestions?

vLLM: speed up inference

TensorRT: speed up inference

DeepSpeed: speed up for the training phase
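To compare such tools fairly, the same prompts should be timed against each backend. Below is a minimal, backend-agnostic throughput harness sketch; `measure_throughput` and `dummy_generate` are hypothetical names, and the stub stands in for a real engine (vLLM, TensorRT-LLM, etc.), which would need a GPU and its own install:

```python
import time

def measure_throughput(generate_fn, prompts, runs=3):
    """Time a generation callable over several runs and return
    whitespace-token throughput (tokens/sec) for the best run.
    Illustrative harness, not tied to any specific backend."""
    best_elapsed = float("inf")
    total_tokens = 0
    for _ in range(runs):
        start = time.perf_counter()
        outputs = [generate_fn(p) for p in prompts]
        elapsed = time.perf_counter() - start
        best_elapsed = min(best_elapsed, elapsed)
        # crude token count: whitespace-split words of the outputs
        total_tokens = sum(len(o.split()) for o in outputs)
    return total_tokens / best_elapsed

# Stub generator standing in for a real inference backend.
def dummy_generate(prompt):
    return prompt + " some generated continuation"

print(measure_throughput(dummy_generate, ["What is MMLU?", "Explain KV cache."]))
```

Swapping `dummy_generate` for each engine's generate call would give roughly comparable tokens/sec numbers, though real benchmarks should also control batch size and output length.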

[–] DataLearnerAI@alien.top 1 points 10 months ago

In most scenarios, models with extended context windows are optimized for long sequences. If your sequences are not very long, it is often better to use the regular model instead.
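The advice above amounts to a simple routing rule: only fall back to the extended-context variant when the input actually exceeds the regular model's window. A minimal sketch, with made-up model names and an assumed 4096-token threshold:

```python
def pick_model(prompt_tokens, regular_limit=4096):
    """Choose between a regular model and its long-context variant
    based on prompt length. Names and the 4096 limit are illustrative."""
    if prompt_tokens > regular_limit:
        return "base-model-32k"  # extended-context variant for long inputs
    return "base-model"          # regular model, usually preferable for short inputs

print(pick_model(1000))   # base-model
print(pick_model(10000))  # base-model-32k
```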