Mammoth-Doughnut-160

joined 10 months ago
Mammoth-Doughnut-160@alien.top 1 point 10 months ago

Section references that have no text attached to them are, unfortunately, still a very hard problem for RAG, and there is no magic bullet for them yet. The closest option may be a knowledge graph, but that presupposes the referenced sections show up with some frequency (in a large corpus, a single link won't really be visible). I have been looking at a lot of legal contracts and have a similar issue.
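A minimal sketch of the knowledge-graph idea in plain Python, not taken from any particular RAG library. It assumes the contract is already split into sections keyed by number and that cross-references follow a "Section N.N" pattern. Both are assumptions; real contracts vary. It records which section cites which, then pulls the cited section's text in next to any retrieved hit, so a bare reference no longer arrives without its text. `build_reference_graph` and `expand_with_references` are hypothetical names.

```python
import re
from collections import defaultdict

# Assumed reference style: "Section 7", "Section 4.2", "Section 12.3.1".
SECTION_REF = re.compile(r"\bSection\s+(\d+(?:\.\d+)*)")


def build_reference_graph(sections: dict[str, str]) -> dict[str, set[str]]:
    """Map each section id to the ids of the sections its text cites."""
    graph: dict[str, set[str]] = defaultdict(set)
    for sec_id, text in sections.items():
        for ref in SECTION_REF.findall(text):
            # Ignore self-references and citations to sections we never parsed.
            if ref != sec_id and ref in sections:
                graph[sec_id].add(ref)
    return dict(graph)


def expand_with_references(hit_ids, sections, graph):
    """Return (id, text) for each retrieved section, followed by the sections it cites."""
    expanded, seen = [], set()
    for sec_id in hit_ids:
        for sid in [sec_id, *sorted(graph.get(sec_id, ()))]:
            if sid not in seen:
                seen.add(sid)
                expanded.append((sid, sections[sid]))
    return expanded


contract = {
    "1": "Definitions of capitalised terms used in this Agreement.",
    "4.2": "Termination for cause is subject to the notice period in Section 7.",
    "7": "Either party must give 30 days' written notice.",
}
graph = build_reference_graph(contract)
for sec_id, text in expand_with_references(["4.2"], contract, graph):
    print(sec_id, text)
```

Because this follows explicit links rather than learning from frequency, a reference that appears only once is still resolved. The catch is that it only works when the reference format is regular enough to parse.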

Even so, RAG is still by far the best solution. Check out this GitHub repo, which has one of the easiest-to-use integrated RAG setups, with strong hybrid search and fact checking, and is used a lot for legal documents: https://github.com/llmware-ai
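"Hybrid search" generally means combining a keyword ranking with a vector (embedding) ranking. Below is a minimal standard-library sketch of that idea. It is not LLMWare's implementation. It uses BM25 for the keyword side and character-trigram counts as a stand-in for a real embedding model, which is an assumption made to keep it self-contained. The two rankings are merged with reciprocal rank fusion (RRF), one common fusion method among several. The constant `rrf_k=60` is a conventional default, not something from this thread.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Keyword relevance of each doc to the query (Okapi BM25)."""
    tokenized = [tokenize(d) for d in docs]
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term in tf:
                idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
                norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
                score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def trigram_vector(text: str) -> Counter:
    """Stand-in for a dense embedding: character-trigram counts."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(count * v[gram] for gram, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def ranking(scores: list[float]) -> list[int]:
    """Doc indices ordered from best to worst score (ties keep original order)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def hybrid_search(query: str, docs: list[str], top_k: int = 3, rrf_k: int = 60) -> list[int]:
    """Fuse the keyword and vector rankings with reciprocal rank fusion."""
    q_vec = trigram_vector(query)
    dense = [cosine(q_vec, trigram_vector(d)) for d in docs]
    fused: Counter = Counter()
    for order in (ranking(bm25_scores(query, docs)), ranking(dense)):
        for position, doc_id in enumerate(order):
            fused[doc_id] += 1.0 / (rrf_k + position + 1)
    return [doc_id for doc_id, _ in fused.most_common(top_k)]


docs = [
    "The indemnification obligations survive termination of this Agreement.",
    "Payment is due within 30 days of the invoice date.",
    "Either party may terminate for material breach after written notice.",
]
print(hybrid_search("terminate for breach", docs))
```

RRF only looks at rank positions, so the two scorers never need to be on the same scale. That is the main reason it is a popular way to merge keyword and vector results.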

Mammoth-Doughnut-160@alien.top 1 point 10 months ago

I agree that you should look at RAG. LLMs are not search engines, so you need to connect your knowledge corpus to the LLM.
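"Connecting the corpus to the LLM" boils down to two steps: retrieve the relevant chunks, then put them into the prompt so the model answers from them rather than from memory. Here is a minimal sketch of those two steps. The keyword-overlap `retrieve` is a hypothetical placeholder for a real retriever, and the actual LLM call is left out because it depends on which model or provider you use.

```python
import re


def terms(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Placeholder retriever: rank chunks by how many question terms they share."""
    q = terms(question)
    scored = sorted(chunks, key=lambda c: len(q & terms(c)), reverse=True)
    return [c for c in scored[:top_k] if q & terms(c)]


def build_prompt(question: str, retrieved: list[str]) -> str:
    """Ground the model in the retrieved text instead of its own memory."""
    sources = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(retrieved, start=1))
    return (
        "Answer using only the numbered sources below and cite them by number. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}\nAnswer:"
    )


chunks = [
    "The notice period is 30 days.",
    "Governing law is the State of New York.",
    "Fees are payable quarterly.",
]
question = "What is the notice period?"
prompt = build_prompt(question, retrieve(question, chunks, top_k=1))
print(prompt)  # this string is what you would send to the LLM of your choice
```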

Try LLMWare's RAG implementation: it is easy to use and automates the MongoDB and Milvus setup, so it should be a good fit for what you are trying to achieve. LLMWare also has free models on Hugging Face that you can start experimenting with for your use case.

https://github.com/llmware-ai/llmware

https://huggingface.co/llmware

Mammoth-Doughnut-160@alien.top 1 point 10 months ago

Yes, semantic indexing and vector databases are now part of the AI infrastructure known as Retrieval Augmented Generation (RAG), which links knowledge sources to LLMs for information retrieval (LLMs on their own are not good at searching). To learn more about implementing RAG in a GenAI context, check out LLMWare, which provides an integrated RAG platform so you can quickly level up in AI infra: https://github.com/llmware-ai/llmware
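In a semantic index, each chunk is turned into an embedding vector, and a query is answered by returning the stored vectors closest to the query's vector. A vector database such as Milvus does this at scale with approximate nearest-neighbour indexes. The sketch below is a brute-force, in-memory version of the same idea. It is not any library's API. `toy_embed` is a hypothetical stand-in (a hashed bag of words) that only captures word overlap, not meaning, so you would swap in a real embedding model to get genuinely semantic matches.

```python
import hashlib
import math

DIM = 256  # arbitrary size for the toy embedding


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed bag of words, L2-normalised."""
    vec = [0.0] * DIM
    for raw in text.lower().split():
        token = raw.strip(".,;:!?\"'()")
        if token:
            # md5 gives a hash that is stable across runs, unlike Python's built-in hash().
            bucket = int.from_bytes(hashlib.md5(token.encode()).digest()[:4], "little") % DIM
            vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


class InMemoryVectorIndex:
    """Brute-force cosine search; a vector database does the same job at scale."""

    def __init__(self, embed=toy_embed):
        self.embed = embed
        self.ids: list[str] = []
        self.vectors: list[list[float]] = []

    def add(self, doc_id: str, text: str) -> None:
        self.ids.append(doc_id)
        self.vectors.append(self.embed(text))

    def search(self, query: str, top_k: int = 3) -> list[tuple[str, float]]:
        q = self.embed(query)
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = [(doc_id, sum(a * b for a, b in zip(q, v))) for doc_id, v in zip(self.ids, self.vectors)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


index = InMemoryVectorIndex()
index.add("milvus", "Milvus stores vectors for similarity search.")
index.add("mongo", "MongoDB stores documents and their metadata.")
index.add("weather", "The weather is sunny today.")
print(index.search("similarity search over vectors", top_k=2))
```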