Hi,
I'm having issues doing Q&A over my PDF:
- I don't get consistent answers
- Tried many models (embedding/LLM)
- Tried many methods
- I have 20% hallucination, especially with "president" and "mickael jackson"
ex1 (this one is correct):
> question: How much net income for Amazon in 2020, 2021 and 2022 ?
> answer: In 2020, Amazon made a net income of $21.331 billion, in 2021 $33.364 billion and in 2022 the company made a loss of $2.722 billion.
ex2 (this one is incorrect):
> question: How much operating expenses for AWS in 2020, 2021 and 2022 ?
> answer: The operating expenses of AWS in 2020, 2021, and 2022 were $444.943 billion, $501.735 billion, and $567.608 billion respectively.
It always calculates the operating expenses of the entire company instead of AWS alone. I tried GPT-4 and it is capable of answering this correctly.
- PDF: Amazon 2022 annual report 10K (88 pages)
- Embedding: all-MiniLM-L12-v2
- Text splitter: Chunk_size = 1000, overlap = 20
- VectorDB: Chroma
- LLM: SciPhi-Self-RAG-Mistral-7B-32k-8.0bpw-h6-exl2 via Oobabooga (OpenAI extension), temperature 0.2, Alpaca instruction template
- Langchain: RetrievalQA, chain_type = stuff, retriever = vectDB.as_retriever()
- RTX 3090
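To make the splitter setting concrete, here is a rough plain-Python sketch of what chunk_size = 1000 with overlap = 20 means. It is a simplified character-based splitter, not the actual LangChain splitter (which also tries to break on separators), and `split_text` is just a name I made up for illustration:

```python
def split_text(text: str, chunk_size: int = 1000, overlap: int = 20) -> list[str]:
    """Cut text into fixed-size character chunks that share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # each new chunk starts 980 characters later
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks
```

With only 20 shared characters between neighbours, a table row and its header can easily land in different chunks, which may be part of why segment-level figures get mixed up with company-wide ones.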
If anyone has resolved this issue, could you please help me? :)
Some updates:
I now get a solid 90% success rate with the PDF. I will share the code once it is cleaned up and optimized. Thank you all for the help.
Can you please elaborate?
You can find the method here:
https://medium.aiplanet.com/advanced-rag-providing-broader-context-to-llms-using-parentdocumentretriever-cc627762305a
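In short, the idea in that article is to match the question against small child chunks but pass the larger parent chunk they came from to the LLM. Below is a minimal plain-Python sketch of that idea only. It is not the LangChain ParentDocumentRetriever API and not my final code: the word-overlap score is a toy stand-in for real embeddings, and the function names and the two sample texts are made up for illustration.

```python
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    # Lowercased word counts; punctuation is dropped.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def score(query: Counter, chunk: Counter) -> int:
    # Toy similarity: number of shared word occurrences (stand-in for embeddings).
    return sum((query & chunk).values())


def build_index(parents: list[str], child_words: int = 8) -> list[tuple[int, Counter]]:
    # Split every parent into small child chunks and remember which parent each came from.
    index = []
    for parent_id, parent in enumerate(parents):
        words = parent.split()
        for start in range(0, len(words), child_words):
            child = " ".join(words[start:start + child_words])
            index.append((parent_id, tokenize(child)))
    return index


def retrieve_parents(question: str, parents: list[str],
                     index: list[tuple[int, Counter]], k: int = 1) -> list[str]:
    # Rank child chunks, then return the distinct parents of the best ones.
    q = tokenize(question)
    ranked = sorted(index, key=lambda item: score(q, item[1]), reverse=True)
    seen: list[int] = []
    for parent_id, _ in ranked:
        if parent_id not in seen:
            seen.append(parent_id)
        if len(seen) == k:
            break
    return [parents[i] for i in seen]


# Hypothetical placeholder texts, not taken from the actual report.
parents = [
    "Consolidated results. Total operating expenses of the whole company "
    "are listed in the consolidated statement of operations.",
    "Segment information. AWS segment: net sales, operating expenses and "
    "operating income are reported separately for AWS.",
]
index = build_index(parents)
context = retrieve_parents("operating expenses for AWS", parents, index, k=1)
```

Here the small chunk mentioning AWS wins the match, and the whole segment paragraph is what would be placed in the prompt, so the model sees the figures together with the context that says which segment they belong to.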