vasileer

joined 1 year ago
[–] vasileer@alien.top 1 points 11 months ago (1 children)

did you test the model before advertising it?

[–] vasileer@alien.top 1 points 11 months ago

"A 34B model beating all 70Bs and achieving the same perfect scores as GPT-4 and Goliath 120B in this series of tests!"

https://www.reddit.com/r/LocalLLaMA/comments/17vcr9d/llm_comparisontest_2x_34b_yi_dolphin_nous/

from a link another commenter posted

[–] vasileer@alien.top 1 points 11 months ago

works for me with the latest llama.cpp on Windows (CPU only, AVX)

command

`main -m ../models/deepseek-coder-6.7b-instruct.Q4_K_M.gguf -p "### Instruction:\nwrite Snake game in Python\n### Response:" -n 2048 -e`

result

https://preview.redd.it/k0poo4o1171c1.png?width=978&format=png&auto=webp&s=3bf1fc497ed66da28742af4d53972c5e15928390
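
if you'd rather call it from Python, here is a minimal sketch using the llama-cpp-python bindings (assuming `pip install llama-cpp-python`; same GGUF file and prompt as the command above):

```python
# minimal sketch: same model and prompt as the CLI command above,
# but through the llama-cpp-python bindings
from llama_cpp import Llama

llm = Llama(
    model_path="../models/deepseek-coder-6.7b-instruct.Q4_K_M.gguf",
    n_ctx=4096,  # room for the prompt plus up to 2048 generated tokens
)

prompt = "### Instruction:\nwrite Snake game in Python\n### Response:"
out = llm(prompt, max_tokens=2048)
print(out["choices"][0]["text"])
```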

[–] vasileer@alien.top 1 points 11 months ago (1 children)

3 ideas

  1. quantization

fastchat-t5 is a 3B model stored in bfloat16, which means it needs at least 3B x 16 bits ~ 6GB of RAM for the model weights alone, plus the 2K-token context limit (for both prompt and answer),

a quick way to speed it up is to use a quantized version:

an 8-bit quant, with almost no quality loss, like https://huggingface.co/limcheekin/fastchat-t5-3b-ct2,

will get you a 2x smaller file and 2x faster inference (see the memory sketch after this list),

but better read #2 :)

  2. a better model/finetune for better quality

a Mistral finetune like https://huggingface.co/TheBloke/neural-chat-7B-v3-1-GGUF, which at 7B quantized to 4 bits will have ~ the same size as the 8-bit fastchat-t5,

but superior performance, as Mistral was most probably trained on more tokens than Llama 2 (~2T tokens), while flan-t5 (the base model of fastchat-t5) saw only ~1T,

why a larger model quantized is better than a smaller one unquantized is explained here: https://github.com/ggerganov/llama.cpp/pull/1684

  3. use HuggingFace for hosting: it is ~$20/month for the same server you mentioned that costs $160, so it is 8x cheaper

https://preview.redd.it/54x2ff87gk0c1.png?width=839&format=png&auto=webp&s=dae1d27376c9c858935c285dd765246af79a86a4
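
the memory math from #1 and #2, as a quick sketch:

```python
# back-of-the-envelope RAM estimate for the weights alone
# (real usage adds the KV cache and runtime overhead on top)
def model_ram_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

print(model_ram_gb(3, 16))  # fastchat-t5 3B @ bfloat16 -> ~6.0 GB
print(model_ram_gb(3, 8))   # its 8-bit quant           -> ~3.0 GB
print(model_ram_gb(7, 4))   # 7B Mistral @ 4-bit        -> ~3.5 GB
```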

[–] vasileer@alien.top 0 points 11 months ago (4 children)

Is this a scam or what? None of the models above are from NurtureAI:

- zephyr-beta is trained by HuggingFace and already has a 32K context by default

- neural-chat is from Intel

- synthia is from migtissera

Original links:

https://huggingface.co/HuggingFaceH4/zephyr-7b-beta

https://huggingface.co/Intel/neural-chat-7b-v3-1

https://huggingface.co/migtissera/SynthIA-7B-v2.0

[–] vasileer@alien.top 1 points 1 year ago (1 children)

200K context!!

[–] vasileer@alien.top 1 points 1 year ago

on quality: if you go with a smaller model, or even another model, you will lose quality, as Mistral (and its finetunes) is the best among <70B models, and another rule of thumb is that a bigger model quantized (even to 2 bits) is better than a smaller unquantized one,

on speed: the fastest inference comes from Q4_K_S: https://github.com/ggerganov/llama.cpp/pull/1684
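
as a rough illustration of that rule of thumb (the bits-per-weight numbers here are approximate ballpark figures, not exact values from the quant tables):

```python
# ballpark weight sizes per quant type (approximate bits per weight)
QUANTS = {"Q2_K": 2.6, "Q4_K_S": 4.5, "Q8_0": 8.5, "F16": 16.0}

def weights_gb(params_billion: float, quant: str) -> float:
    return params_billion * 1e9 * QUANTS[quant] / 8 / 1e9

# within an ~8 GB budget, a 13B at Q4_K_S (~7.3 GB) fits just like
# a 7B at Q8_0 (~7.4 GB) -- and the bigger quantized model usually wins
print(weights_gb(13, "Q4_K_S"), weights_gb(7, "Q8_0"))
```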

[–] vasileer@alien.top 1 points 1 year ago

I tested the 3B model on Romanian, Russian, French, and German translations of "The sun rises in the East and sets in the West." and it works 100%: each translation gets 10/10 from ChatGPT
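
roughly what that test looks like (a hypothetical sketch: the model file name and grading prompt here are illustrative, not the exact ones I used):

```python
# hypothetical sketch: translate with a local GGUF model via
# llama-cpp-python, then ask ChatGPT (openai client) to grade each one
from llama_cpp import Llama
from openai import OpenAI

sentence = "The sun rises in the East and sets in the West."
llm = Llama(model_path="translator-3b.Q8_0.gguf")  # illustrative file name
client = OpenAI()  # needs OPENAI_API_KEY in the environment

for lang in ["Romanian", "Russian", "French", "German"]:
    out = llm(f"Translate into {lang}: {sentence}\nTranslation:", max_tokens=64)
    translation = out["choices"][0]["text"].strip()
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": f"Rate from 1 to 10 this {lang} translation of "
                              f"'{sentence}': {translation}"}],
    )
    print(lang, reply.choices[0].message.content)
```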

[–] vasileer@alien.top 1 points 1 year ago

I think it really depends on the finetune. For example, Mistral-Instruct is able to summarize or extract information from a 32K context; for writing, you will have to find a model finetuned for that task

[–] vasileer@alien.top 1 points 1 year ago

Mistral and Llama2 work with many languages even if they are marked as English-only.

Here is a quote from a benchmark on the German language; I think you will reach a similar conclusion if you do it for Portuguese.

"Kinda ironic that the English models worked better with the German data and exam than the ones finetuned in German. Looks like language doesn't matter as much as general intelligence and a more intelligent model can cope with different languages more easily. German-specific models need better tuning to compete in general and excel in German."

https://www.reddit.com/r/LocalLLaMA/comments/178nf6i/mistral_llm_comparisontest_instruct_openorca/
