LocalLLaMA

11 readers

4 users here now

Community to discuss about Llama, the family of large language models created by Meta AI.

founded 2 years ago

MODERATORS

communick@poweruser.forum

Safety checks in Llama 2 (alien.top)

submitted 2 years ago by Little-Name9809@alien.top to c/localllama@poweruser.forum

4 comments fedilink hide all child comments

Recently came across this AI Safety test report from LinkedIn: https://airtable.com/app8zluNDCNogk4Ld/shrYRW3r0gL4DgMuW/tblpLubmd8cFsbmp5

From this report it seems Llama 2 (7B version?) lacks some safety checks compared to OpenAI models. Same with Mistral. Did anyone find the same result? Has it been a concern for you?

top 4 comments

sorted by: hot top controversial new old

[–] phree_radical@alien.top 1 points 2 years ago (1 children)

It's comparing base models (which are not trained to follow or refuse instructions) against instruction-tuned ones (OpenAI)

[–] CookieCat171@alien.top 1 points 2 years ago (1 children)

afety checks in Llama 2

it seems it's comparing chat models: https://airtable.com/app8zluNDCNogk4Ld/shrYRW3r0gL4DgMuW/tblpLubmd8cFsbmp5

[–] phree_radical@alien.top 1 points 2 years ago

Looks like you've now made some changes. Columns now read "Llama2-7b-chat" instead of "llama2." Also, chat responses below the completions, chastising the inappropriate messages. However, a completion was generated, first, and the item is still marked as "fail." Very poor show

[–] AutomataManifold@alien.top 1 points 2 years ago

It's not clear if this is testing the chat model or the base model. Assuming it is the base model, it isn't surprising: it's just a text completion model with no extra frills. The point of the safety alignment training is that it's part of the instruct dataset and training, not the base model.

This is what you want, even if you're concerned about safety. You don't want the safety to be baked in to the raw completion model: if some future better way comes along to do safety training, you want to be able to use it without retraining the entire model from scratch. (And given the speed at which this stuff moves, that might be just a week from now.)

Of course, if you're concerned about safety you shouldn't be deploying the raw text completion model to end users. (For a whole host of reasons, not just safety.)