hackerllama

joined 1 year ago
[–] hackerllama@alien.top 1 points 11 months ago (1 children)

Hey there! I think this is doing offloading?

If so, it's not a new thing. Check out https://huggingface.co/docs/accelerate/usage_guides/big_modeling for a guide with code and videos about it

[–] hackerllama@alien.top 1 points 11 months ago (4 children)

Base models are not trained for conversations, so you cannot use them as chat models. It's like GPT-4 vs. ChatGPT: GPT-4 is the base model, which is then fine-tuned to be conversational, and that's what you see in ChatGPT. Same with Llama vs. Llama Chat.
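In practice the difference shows up in the prompt: a base model just continues raw text, while a chat fine-tune expects the prompt format it was trained on. A small sketch using a generic ChatML-style template (illustrative only; each chat model defines its own template, which `transformers` exposes via `tokenizer.apply_chat_template`):

```python
# Hypothetical helper showing why a chat model's prompt differs from a
# base model's. The ChatML-style tags below are an example format, not
# what every chat model uses.
def format_chat_prompt(messages):
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    # Leave an open assistant turn for the model to complete.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

# Base model: plain text continuation.
base_prompt = "The capital of France is"

# Chat model: structured turns in its training format.
chat_prompt = format_chat_prompt(
    [{"role": "user", "content": "What is the capital of France?"}]
)
print(chat_prompt)
```

Feed a chat-formatted prompt to a base model (or a bare string to a chat model) and you typically get rambling continuations instead of an answer.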

[–] hackerllama@alien.top 1 points 11 months ago

The chat model came out today

 

Yi is a series of LLMs trained from scratch at 01.AI. The models have the same architecture as Llama, making them compatible with the whole Llama-based ecosystem. In November alone, they released:

  • Base 6B and 34B models
  • Models with extended context of up to 200k tokens
  • Today, the Chat models

With this release, they are also publishing 4-bit AWQ-quantized and 8-bit GPTQ-quantized versions

Things to consider:

  • Llama-compatible format, so you can use it across a bunch of tools
  • The license unfortunately doesn't allow commercial use by default, but you can request commercial use and they are quite responsive
  • 34B is an amazing model size for consumer GPUs
  • Yi-34B is at the top of the Open LLM Leaderboard, making it a very strong base model for a chat one