AdamDhahabi

joined 10 months ago
[–] AdamDhahabi@alien.top 1 points 9 months ago

Unethical practices: one-man shops attempting to pump up the account's value artificially, aiming for a sale later on.

[–] AdamDhahabi@alien.top 1 points 10 months ago

This can be done with a self-hosted Mattermost server: https://github.com/mattermost/openops

[–] AdamDhahabi@alien.top 1 points 10 months ago (1 children)

I'd be interested to know how it scores on RAG use cases; there is a benchmark for that: https://github.com/vectara/hallucination-leaderboard

So far, Mistral underperforms Llama2.

[–] AdamDhahabi@alien.top 1 points 10 months ago

Llama.cpp has supported batched inference for the past four weeks: https://github.com/ggerganov/llama.cpp/issues/2813

```
-cb, --cont-batching    enable continuous batching (a.k.a. dynamic batching) (default: disabled)
```
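To get a feel for what the flag buys you, here is a minimal client-side sketch (my own illustration, not from the linked issue): it assumes the llama.cpp server example was started with `-cb`, is listening on its default port 8080, and exposes the `/completion` endpoint; the prompts and parameters are placeholders.

```python
# Sketch: fire several prompts at a llama.cpp server concurrently.
# Assumes: ./server -m model.gguf -cb   (continuous batching enabled)
# and that the server listens on http://localhost:8080/completion.
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen

SERVER = "http://localhost:8080/completion"

def complete(prompt: str) -> str:
    payload = json.dumps({"prompt": prompt, "n_predict": 64}).encode("utf-8")
    req = Request(SERVER, data=payload, headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.loads(resp.read())["content"]

prompts = [
    "Explain continuous batching in one sentence.",
    "What is RAG?",
    "Name three llama.cpp quantization formats.",
]

# With -cb the server can interleave these requests instead of queueing them.
with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
    for prompt, answer in zip(prompts, pool.map(complete, prompts)):
        print(f"Q: {prompt}\nA: {answer}\n")
```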


Vectara's Hallucination Evaluation Model and leaderboard were launched last week. I notice Mistral has a hallucination rate of 9.4%, compared to 5.6% for Llama2. Any thoughts?

https://preview.redd.it/sj0akn15tszb1.png?width=1118&format=png&auto=webp&s=ca9ec766f592a8748bf95a8ad2ef81483c2270bd

Source: https://github.com/vectara/hallucination-leaderboard
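For anyone wanting to reproduce a single score: the Hugging Face model card for vectara/hallucination_evaluation_model presents it as a cross-encoder, so scoring one source/summary pair could look roughly like this (a sketch based on my reading of that card; the pair below is made up):

```python
# Sketch: score a summary for factual consistency with its source,
# using Vectara's hallucination evaluation model as a cross-encoder.
# Usage follows the sentence-transformers pattern from the model card.
from sentence_transformers import CrossEncoder

model = CrossEncoder("vectara/hallucination_evaluation_model")

source = "A man walks into a bar and buys a drink."
summary = "A bloody man in a clown costume buys a drink."  # hallucinated detail

# predict() returns one score per (source, summary) pair:
# close to 1.0 = factually consistent, close to 0.0 = hallucinated.
score = model.predict([[source, summary]])[0]
print(f"factual consistency score: {score:.3f}")
```

As I understand it, the hallucination rates on the leaderboard are aggregated from such per-summary scores across many test documents.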

[–] AdamDhahabi@alien.top 1 points 10 months ago

Yesterday I tried GPT4All, and it references context by outputting 3 passages from my local documents. I could click on each of them and read the passage. But their current implementation relies on a simpler, non-embedding retrieval algorithm; embedding-based semantic search is still on their roadmap (a sketch of what that could look like follows below the screenshot).

https://preview.redd.it/dnoqmk4olazb1.png?width=1807&format=png&auto=webp&s=cdd1f17a2ea20100504c275094e52b61a6e054f7
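For contrast, the embedding-based semantic search they have on the roadmap could look roughly like this; a minimal sketch with sentence-transformers (the model and passages are my own placeholders, not GPT4All's actual setup):

```python
# Sketch: embedding-based semantic search over local passages,
# i.e. the kind of retrieval GPT4All still has on its roadmap.
# Model name and passages are placeholders.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

passages = [
    "Invoices are processed within 30 days of receipt.",
    "The VPN requires two-factor authentication.",
    "Quarterly reports are stored on the shared drive.",
]
passage_embeddings = model.encode(passages, convert_to_tensor=True)

query = "How long does invoice processing take?"
query_embedding = model.encode(query, convert_to_tensor=True)

# Rank passages by cosine similarity and show the top 3,
# analogous to the three passages GPT4All surfaces per answer.
hits = util.semantic_search(query_embedding, passage_embeddings, top_k=3)[0]
for hit in hits:
    print(f"{hit['score']:.3f}  {passages[hit['corpus_id']]}")
```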

[–] AdamDhahabi@alien.top 1 points 10 months ago (2 children)

I think batched inference is a must for companies that want to put an on-premise chatbot in front of their users, a use case many are busy with at the moment. I saw that llama.cpp gained support for batched inference only two weeks ago, so I don't have hands-on experience with it yet.