LocalLLaMA

Community to discuss Llama, the family of large language models created by Meta AI.

Starling-RM-7B-alpha: New RLAIF Finetuned 7b Model beats Openchat 3.5 and comes close to GPT-4 (alien.top)

submitted 2 years ago by Legcor@alien.top to c/localllama@poweruser.forum

https://preview.redd.it/3krgd1sg2z2c1.png?width=800&format=png&auto=webp&s=b76c5fb9fa22938c74ec3095f63adaec8ff2219d

I came across this new finetuned model based on Openchat 3.5, which is apparently trained using Reinforcement Learning from AI Feedback (RLAIF).

https://huggingface.co/berkeley-nest/Starling-LM-7B-alpha

Check out this tweet: https://twitter.com/bindureddy/status/1729253715549602071

ex-arman68@alien.top 1 point 2 years ago

Here is some info I posted for the 11B version of this model, but it is probably useful for the original 7B version as well.

I think I found the key to avoiding the repetitions and long, rambling answers this model tends to produce. Hopefully a further fine-tune will reduce them. The key is to turn creativity all the way down and make the model deterministic. How do you do that, you may ask? Easy. It is controlled by the following 3 inference parameters: temp, top_p, and top_k.

With the following default settings I often get repetitions or additional rambling information:

    "top_k": 40,
    "top_p": 0.95,
    "temp": 0.8,

If I use the following values instead, to make the model deterministic, the problem seems to be gone:

    "top_k": 1,
    "top_p": 0.1,
    "temp": 0.1,
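
To see why the second set of values makes the output deterministic, here is a minimal Python sketch of how the three parameters narrow the pool of candidate tokens. The function name `filter_distribution`, the logits, and the order in which the filters are applied are all illustrative assumptions; real runtimes may apply the samplers in a different order.

```python
import math

def filter_distribution(logits, top_k, top_p, temp):
    """Return {token_id: probability} for the tokens that survive all three filters."""
    # Temperature: divide the logits by temp, then apply a numerically stable softmax.
    scaled = [x / temp for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # top_k: rank token ids from most to least likely and keep the first top_k.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]

    # top_p: keep the smallest leading group whose cumulative probability reaches top_p.
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Renormalise so the surviving probabilities sum to 1.
    norm = sum(probs[i] for i in kept)
    return {i: probs[i] / norm for i in kept}

# Made-up scores for a 4-token vocabulary (hypothetical values, not from the model).
logits = [2.0, 1.5, 0.3, -1.0]

default = filter_distribution(logits, top_k=40, top_p=0.95, temp=0.8)
strict = filter_distribution(logits, top_k=1, top_p=0.1, temp=0.1)

print("default candidates:", sorted(default))
print("strict candidates:", sorted(strict))
```

With the default values several tokens remain in play, so the sampler can wander. With top_k set to 1 only the single most likely token survives, whatever top_p and temp are, which is what makes generation deterministic.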

Please note that if you want to use the model for story writing, you may get better results by dialing the creativity back up.

Here is my complete config file for LM Studio:

{
  "name": "OpenChat",
  "inference_params": {
    "top_k": 1,
    "top_p": 0.1,
    "temp": 0.1,
    "input_prefix": "GPT4 Correct User: ",
    "input_suffix": "<|end_of_turn|>GPT4 Correct Assistant:",
    "antiprompt": [
      "GPT4",
      "<|end_of_turn|>",
      "[End of Turn]",
      "[]"
    ],
    "pre_prompt": "Below is an instruction that describes a task. Write a concise response that appropriately completes the request. Ensure all essential details are provided. Each of your statements must be unique.",
    "pre_prompt_suffix": "<|end_of_turn|>",
    "pre_prompt_prefix": "GPT4 System: "
  }
}

A few words about the above:

I only include the necessary options, to avoid overwriting user settings when loading the model or switching prompt format. If you export a config file, please make sure you then edit it manually to clean it up.
GPT4 Correct User/Assistant: the Correct keyword is important. It refers to the training data where the answers were verified as correct. If you do not use it (e.g. GPT4 User or Human User), it will still work, but it will give more weight to training data which was unverified.
GPT4 System or just System are the 2 officially recommended ways to prefix system messages. Either works.
In my system message (pre_prompt), I avoid any negatives (e.g. I do not instruct "Do not repeat yourself"). Remember this is just a language model: if it sees the word "repeat", it will tend to treat it as an instruction to create repetitions! Instead, I turned it around into a positive statement based on the word "unique".
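
To make the role of the prefix and suffix fields concrete, here is a rough Python sketch of how they could be stitched into the final prompt text. The `build_prompt` helper and the concatenation order are my assumptions about how the fields are combined, not something taken from LM Studio itself.

```python
# Field values copied from the config above; build_prompt is a hypothetical helper.
params = {
    "input_prefix": "GPT4 Correct User: ",
    "input_suffix": "<|end_of_turn|>GPT4 Correct Assistant:",
    "pre_prompt": (
        "Below is an instruction that describes a task. "
        "Write a concise response that appropriately completes the request. "
        "Ensure all essential details are provided. "
        "Each of your statements must be unique."
    ),
    "pre_prompt_suffix": "<|end_of_turn|>",
    "pre_prompt_prefix": "GPT4 System: ",
}

def build_prompt(params, user_message):
    """Assemble system message plus one user turn (assumed ordering)."""
    # System block: prefix + system message + suffix.
    system = params["pre_prompt_prefix"] + params["pre_prompt"] + params["pre_prompt_suffix"]
    # User turn: prefix + message + suffix, which leaves the assistant label open.
    turn = params["input_prefix"] + user_message + params["input_suffix"]
    return system + turn

prompt = build_prompt(params, "Name three prime numbers.")
print(prompt)
```

The printed text ends with "GPT4 Correct Assistant:", so the model continues from there with its answer.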

As a bonus, here is my config for generating code, which this model seems to be surprisingly good at, according to my limited testing:

{
  "name": "OpenChat Code",
  "inference_params": {
    "top_k": 1,
    "top_p": 0.1,
    "temp": 0.1,
    "input_prefix": "Code User: ",
    "input_suffix": "<|end_of_turn|>Code Assistant:",
    "antiprompt": [
      "GPT4",
      "<|end_of_turn|>",
      "[End of Turn]",
      "[]"
    ],
    "pre_prompt": "You are a helpful coding assistant. Respond concisely, but ensure all essential details are provided. Each of your statements must be unique.",
    "pre_prompt_suffix": "<|end_of_turn|>",
    "pre_prompt_prefix": "GPT4 System: "
  }
}
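
For anyone wondering what the antiprompt list in both configs is for: as far as I understand it, these act as stop strings, so generation is cut as soon as one of them appears. Here is a small Python sketch of that idea. The `truncate_at_stop` helper and the sample output are made up for illustration, and the actual runtime may handle this differently.

```python
def truncate_at_stop(text, stop_strings):
    """Cut text at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stop_strings:
        if not stop:
            continue  # an empty stop string would match at position 0
        pos = text.find(stop)
        if pos != -1:
            cut = min(cut, pos)
    return text[:cut]

# Stop strings copied from the configs above.
antiprompt = ["GPT4", "<|end_of_turn|>", "[End of Turn]", "[]"]

# Hypothetical raw output where the model starts inventing the next turn.
raw = "Paris is the capital of France.<|end_of_turn|>GPT4 Correct User: And Spain?"
clean = truncate_at_stop(raw, antiprompt)
print(clean)
```

One side effect to keep in mind: with "GPT4" in the list, an answer that legitimately mentions GPT4 would also be cut at that point.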