I'm really interested in having a 51B model. I would love something between 34B and 65/70B.
this post was submitted on 29 Nov 2023
LocalLLaMA
Community to discuss about Llama, the family of large language models created by Meta AI.
So I don't know much about model architectures, but I'm assuming that if we want to run something like this in Llama, we're going to need to submit a request? If it's built from the ground up, then pretty much everything is going to need to be implemented, right?
- Deepseek 67B still beats XVERSE-65B in the benchmarking scores.
- The benchmarks indicate strong math and coding performance for these two model series.
- Yuan has a unique optional attention mechanism that enhances output quality.
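For anyone curious what an "optional attention mechanism" toggle might look like in practice, here's a rough sketch (not from any of these models' actual code): a toy single-head self-attention where a depth-wise local filter over the sequence can optionally be applied before attention, loosely in the spirit of Yuan's localized filtering. The function names, kernel values, and shapes are all made up for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def local_filter(x, kernel):
    # Depth-wise 1D convolution over the sequence axis with causal
    # padding: each token is mixed with its immediate predecessors
    # before attention is computed. (Illustrative, not Yuan's code.)
    k = len(kernel)
    pad = np.zeros((k - 1, x.shape[1]))
    xp = np.vstack([pad, x])
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        window = xp[t:t + k]                      # (k, d)
        out[t] = (kernel[:, None] * window).sum(axis=0)
    return out

def attention(x, use_local_filter=True):
    # Toy causal single-head self-attention; the optional pre-filter
    # stands in for a "localized attention" variant (assumption).
    if use_local_filter:
        x = local_filter(x, kernel=np.array([0.25, 0.25, 0.5]))
    d = x.shape[1]
    scores = x @ x.T / np.sqrt(d)
    mask = np.triu(np.ones_like(scores), k=1).astype(bool)
    scores[mask] = -1e9                           # causal mask
    return softmax(scores) @ x

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))                   # 4 tokens, dim 8
out = attention(x)
print(out.shape)                                  # (4, 8)
```

The point of making it optional is that the filter only adds a cheap per-token mixing step, so the same weights can run with or without it for comparison.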