AdamDhahabi

joined 10 months ago
[–] AdamDhahabi@alien.top 1 points 9 months ago

Unethical practices: one-man shops attempting to pump up the account's value artificially, aiming for a sale later on.

[–] AdamDhahabi@alien.top 1 points 10 months ago

This can be done with a self-hosted Mattermost server: https://github.com/mattermost/openops

[–] AdamDhahabi@alien.top 1 points 10 months ago (1 children)

I'd be interested to know how it scores on RAG use cases; there is a benchmark for that: https://github.com/vectara/hallucination-leaderboard

So far, Mistral underperforms Llama2.

[–] AdamDhahabi@alien.top 1 points 10 months ago

Llama.cpp has supported batched inference for the past four weeks: https://github.com/ggerganov/llama.cpp/issues/2813

```
-cb, --cont-batching    enable continuous batching (a.k.a. dynamic batching) (default: disabled)
```
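To get a feel for what the flag buys you, here is a minimal client-side sketch (my own illustration, not from the linked issue): it assumes the llama.cpp server example was started with `-cb`, is listening on its default port 8080, and exposes the `/completion` endpoint; the prompts and parameters are placeholders.

```python
# Sketch: fire several prompts at a llama.cpp server concurrently.
# Assumes: ./server -m model.gguf -cb   (continuous batching enabled)
# and that the server listens on http://localhost:8080/completion.
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen

SERVER = "http://localhost:8080/completion"

def complete(prompt: str) -> str:
    payload = json.dumps({"prompt": prompt, "n_predict": 64}).encode("utf-8")
    req = Request(SERVER, data=payload, headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.loads(resp.read())["content"]

prompts = [
    "Explain continuous batching in one sentence.",
    "What is RAG?",
    "Name three llama.cpp quantization formats.",
]

# With -cb the server can interleave these requests instead of queueing them.
with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
    for prompt, answer in zip(prompts, pool.map(complete, prompts)):
        print(f"Q: {prompt}\nA: {answer}\n")
```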


Vectara's Hallucination Evaluation Model and leaderboard were launched last week. I notice Mistral has a hallucination rate of 9.4%, compared to 5.6% for Llama2. Any thoughts?

https://preview.redd.it/sj0akn15tszb1.png?width=1118&format=png&auto=webp&s=ca9ec766f592a8748bf95a8ad2ef81483c2270bd

Source: https://github.com/vectara/hallucination-leaderboard
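For anyone wanting to reproduce a single score: the Hugging Face model card for vectara/hallucination_evaluation_model presents it as a cross-encoder, so scoring one source/summary pair could look roughly like this (a sketch based on my reading of that card; the pair below is made up):

```python
# Sketch: score a summary for factual consistency with its source,
# using Vectara's hallucination evaluation model as a cross-encoder.
# Usage follows the sentence-transformers pattern from the model card.
from sentence_transformers import CrossEncoder

model = CrossEncoder("vectara/hallucination_evaluation_model")

source = "A man walks into a bar and buys a drink."
summary = "A bloody man in a clown costume buys a drink."  # hallucinated detail

# predict() returns one score per (source, summary) pair:
# close to 1.0 = factually consistent, close to 0.0 = hallucinated.
score = model.predict([[source, summary]])[0]
print(f"factual consistency score: {score:.3f}")
```

As I understand it, the hallucination rates on the leaderboard are aggregated from such per-summary scores across many test documents.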

[–] AdamDhahabi@alien.top 1 points 10 months ago

Yesterday I tried GPT4All, and it references context by outputting 3 passages from my local documents. I could click on each of them and read the passage. But their current implementation relies on a simpler, non-embedding retrieval algorithm; embedding-based semantic search is still on their roadmap (a sketch of what that could look like follows below the screenshot).

https://preview.redd.it/dnoqmk4olazb1.png?width=1807&format=png&auto=webp&s=cdd1f17a2ea20100504c275094e52b61a6e054f7
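For contrast, the embedding-based semantic search they have on the roadmap could look roughly like this; a minimal sketch with sentence-transformers (the model and passages are my own placeholders, not GPT4All's actual setup):

```python
# Sketch: embedding-based semantic search over local passages,
# i.e. the kind of retrieval GPT4All still has on its roadmap.
# Model name and passages are placeholders.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

passages = [
    "Invoices are processed within 30 days of receipt.",
    "The VPN requires two-factor authentication.",
    "Quarterly reports are stored on the shared drive.",
]
passage_embeddings = model.encode(passages, convert_to_tensor=True)

query = "How long does invoice processing take?"
query_embedding = model.encode(query, convert_to_tensor=True)

# Rank passages by cosine similarity and show the top 3,
# analogous to the three passages GPT4All surfaces per answer.
hits = util.semantic_search(query_embedding, passage_embeddings, top_k=3)[0]
for hit in hits:
    print(f"{hit['score']:.3f}  {passages[hit['corpus_id']]}")
```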

[–] AdamDhahabi@alien.top 1 points 10 months ago (2 children)

I think batched inference is a must for companies that want to put an on-premise chatbot in front of their users, a use case many are busy with at the moment. I saw that llama.cpp gained support for batched inference only two weeks ago, so I don't have hands-on experience with it yet.