LocalLLaMA

3 readers

1 users here now

Community to discuss about Llama, the family of large language models created by Meta AI.

founded 1 year ago

MODERATORS

communick@poweruser.forum

RAG - Vectara's Hallucination leaderboard (alien.top)

submitted 1 year ago by AdamDhahabi@alien.top to c/localllama@poweruser.forum

5 comments fedilink hide all child comments

Vectara's Hallucination Evaluation Model and leaderboard was launched last week. I notice Mistral having a hallucination rate of 9.4% compared to 5.6% for Llama2. Any thoughts?

https://preview.redd.it/sj0akn15tszb1.png?width=1118&format=png&auto=webp&s=ca9ec766f592a8748bf95a8ad2ef81483c2270bd

Source: https://github.com/vectara/hallucination-leaderboard

you are viewing a single comment's thread
view the rest of the comments

[–] FullOf_Bad_Ideas@alien.top 1 points 1 year ago (1 children)

I don't think they actually tested base models. Look at the description of their methods - they don't run the models themselves, they only use public apis They say they used mistral-instruct, not Mistral. Those are not the same models, you shouldn't put "Mistral" in the table if you ran tests on "Mistral-Instruct". There is no information what actual model was used for llama test, or the output of the test. I suspect that they used llama-2-chat models which were RHLFed. Mistral Instruct is not RHLFed. It's likely that RHLF can reduce hallucination rate and we are seeing it's effects.

[–] aaronr_90@alien.top 1 points 1 year ago

Noob question: What is the recommended method to interact with a non finetuned/chat model?