this post was submitted on 12 Nov 2023

LocalLLaMA


Community to discuss Llama, the family of large language models created by Meta AI.


Vectara's Hallucination Evaluation Model and leaderboard were launched last week. I notice Mistral has a hallucination rate of 9.4%, compared to 5.6% for Llama2. Any thoughts?

Leaderboard screenshot: https://preview.redd.it/sj0akn15tszb1.png?width=1118&format=png&auto=webp&s=ca9ec766f592a8748bf95a8ad2ef81483c2270bd

Source: https://github.com/vectara/hallucination-leaderboard
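For anyone curious how a single score is computed: the judge model itself is on Hugging Face, and (per its model card at launch) it loads as a cross-encoder that scores a (source, summary) pair between 0 (hallucinated) and 1 (consistent). A minimal sketch; the example texts are made up:

```python
# Minimal sketch: scoring one (source, summary) pair with Vectara's judge model.
# Assumes `pip install sentence-transformers` and that the model still loads as
# a cross-encoder, as its Hugging Face model card showed at launch.
from sentence_transformers import CrossEncoder

model = CrossEncoder("vectara/hallucination_evaluation_model")

source = "Mistral 7B was released by Mistral AI in September 2023."
summary = "Mistral 7B was released by Meta AI in 2022."  # deliberately unsupported

# predict() returns one factual-consistency score per pair:
# ~1.0 = supported by the source, ~0.0 = hallucinated.
score = model.predict([(source, summary)])[0]
print(f"consistency score: {score:.3f}")
```

As I read the repo, the leaderboard's hallucination rate is the percentage of generated summaries whose score falls below a threshold, so the calibration of this judge model directly shapes the rankings.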

top 5 comments
[–] Wonderful_Ad_5134@alien.top 1 points 10 months ago

"llama2 7b > llama2 13b"

lol

[–] LoSboccacc@alien.top 1 points 10 months ago

Oof 3% is a lot

[–] FullOf_Bad_Ideas@alien.top 1 points 10 months ago (1 children)

I don't think they actually tested base models. Look at the description of their methods: they don't run the models themselves, they only use public APIs. They say they used Mistral-Instruct, not Mistral. Those are not the same models; you shouldn't put "Mistral" in the table if you ran the tests on "Mistral-Instruct". There is no information on which actual model was used for the Llama test, or on the test outputs. I suspect they used the Llama-2-chat models, which were RLHFed; Mistral-Instruct is not RLHFed. It's likely that RLHF reduces the hallucination rate, and we're seeing its effect here.
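To make the distinction concrete, here's a rough sketch of the formatting difference (model names are the Hugging Face repo ids; the [INST] template is from the Mistral-Instruct model card):

```python
# Rough sketch of the prompt-format difference between the two Mistral releases.
# Assumes transformers >= 4.34 (for apply_chat_template).
from transformers import AutoTokenizer

# Instruct model: finetuned to answer prompts wrapped in [INST] ... [/INST] tags.
tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Summarize the passage in one sentence."}],
    tokenize=False,
)
print(prompt)  # -> <s>[INST] Summarize the passage in one sentence. [/INST]

# Base model ("mistralai/Mistral-7B-v0.1"): no template at all. It just
# continues whatever text you feed it, so a summarization benchmark prompt
# hits it very differently than it hits the tuned model.
```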

[–] aaronr_90@alien.top 1 points 10 months ago

Noob question: what is the recommended way to interact with a non-finetuned (base) model?
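One common answer: treat a base model as a raw text-completion engine and steer it with a prefix, often a few-shot pattern. A rough sketch using transformers (the model choice here is just an example):

```python
# Rough sketch: driving a base model as a plain completion engine with a
# few-shot prefix. Assumes transformers + torch and enough memory for the
# 7B weights; the model id is just an example.
from transformers import pipeline

generator = pipeline("text-generation", model="mistralai/Mistral-7B-v0.1")

# No chat template: the "prompt" is simply text for the model to continue,
# and the few-shot pattern tells it what continuation you want.
prompt = (
    "Translate English to French.\n"
    "English: cheese\nFrench: fromage\n"
    "English: bread\nFrench:"
)
out = generator(prompt, max_new_tokens=10, do_sample=False)
print(out[0]["generated_text"])
```

Instruction-style questions usually work poorly on base models unless you phrase them as a document for the model to complete.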

[–] Distinct-Target7503@alien.top 1 points 10 months ago

How is it possible that Llama2 13B and 7B have a lower hallucination rate than Claude?