LocalLLaMA


Two sets of base models from China (Yuan 2.0-2B, 51B, 102B and XVERSE-7B, 13B, 65B) (alien.top)

submitted 2 years ago by Illustrious_Sand6784@alien.top to c/localllama@poweruser.forum


Didn't see any posts about these models so I made one myself.

This first set of models was trained on 288B high-quality tokens. It will be interesting to see whether the 51B and 102B models hold up. Commercial use is allowed without requiring authorization.

https://github.com/IEIT-Yuan/Yuan-2.0/blob/main/README-EN.md

(Chinese) https://github.com/IEIT-Yuan/Yuan-2.0

Paper: https://arxiv.org/abs/2311.15786

Hugging Face download links:

https://huggingface.co/pandada8/Unofficial-Yuan-2.0-2B

https://huggingface.co/pandada8/Unofficial-Yuan-2.0-51B

https://huggingface.co/pandada8/Unofficial-Yuan-2.0-102B

Here's the second set of models I found. The 7B and 65B were trained on 2.6T tokens, and the 13B on 3.2T. The 65B model supports up to 16K context, while the two smaller ones support up to 8K.

https://huggingface.co/xverse/XVERSE-65B

https://huggingface.co/xverse/XVERSE-13B

https://huggingface.co/xverse/XVERSE-7B

These models know over 40 human languages plus several programming languages. Commercial use is allowed, but you have to submit an application form.
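For a rough sense of what hardware these sizes call for, here is a back-of-the-envelope sketch. It is only an estimate under stated assumptions: the parameter counts are the nominal figures from the model names (not exact counts), the 16/8/4 bits-per-weight levels are illustrative choices, and only the weights are counted (no KV cache, activations, or runtime overhead). The function name `weight_memory_gb` is made up for this example.

```python
# Nominal parameter counts in billions, read off the model names in the post.
# Assumption: real parameter counts differ slightly from these round numbers.
MODEL_SIZES_B = {
    "Yuan 2.0-2B": 2,
    "Yuan 2.0-51B": 51,
    "Yuan 2.0-102B": 102,
    "XVERSE-7B": 7,
    "XVERSE-13B": 13,
    "XVERSE-65B": 65,
}


def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes).

    params_billions * 1e9 weights * (bits / 8) bytes each, divided by 1e9,
    simplifies to params_billions * bits / 8.
    """
    return params_billions * bits_per_weight / 8


if __name__ == "__main__":
    # Illustrative precision levels: 16-bit (fp16/bf16), 8-bit, 4-bit.
    for name, size in MODEL_SIZES_B.items():
        row = ", ".join(
            f"{bits}-bit: {weight_memory_gb(size, bits):.1f} GB"
            for bits in (16, 8, 4)
        )
        print(f"{name}: {row}")
```

By this estimate, the 51B model needs about 102 GB for its weights at 16 bits and about 25.5 GB at 4 bits, before any context or runtime overhead is added on top.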


[–] fallingdowndizzyvr@alien.top 1 points 2 years ago

I'm really interested in having a 51B model. I would love something between 34B and 65/70B.

[–] mrjackspade@alien.top 1 points 2 years ago

So I don't know much about architecture, but I'm assuming that if we want to run something like this in Llama, we're going to need to submit a request? If it's built from the ground up, then pretty much everything is going to need to be implemented, right?

[–] Aaaaaaaaaeeeee@alien.top 1 points 2 years ago

DeepSeek 67B still beats XVERSE-65B in the benchmark scores.
The benchmarks indicate strong math and coding performance for these two model series.
Yuan has a unique optional attention mechanism that enhances output quality.