Cotega

Local Rag/embedding clarifications in c/localllama@poweruser.forum

Cotega@alien.top 1 point 9 months ago

It might help to think of RAG as multiple steps (Retrieve, Augment, Generate), all of which you can debug / look at, to see where it might be failing.

What I would do is look first at the retrieval stage. This is where you execute a vector (or hybrid, or whatever) search against your vector store and retrieve a set of documents that match your query. Keep in mind that in Retrieve you are usually not embedding the whole prompt, just the question the user is asking. Take a look at what is coming back and make sure the results seem relevant to the question. If they don't, there is probably something wrong at this stage to dig into. BTW, I personally prefer to start with chunks of around 500 tokens with around 50 tokens of overlap between them, but that can vary greatly depending on the model, the content, etc.
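A minimal sketch of checking the retrieval stage in isolation, under some loud assumptions: `chunk_tokens` and `retrieve` are hypothetical helper names, "tokens" here are just whitespace-split words rather than real model tokens, and the bag-of-words cosine score is only a stand-in for whatever embedding model and vector store you actually use. The point is only to show 500/50 chunking and how to eyeball what comes back for a question.

```python
import math
from collections import Counter


def chunk_tokens(tokens, size=500, overlap=50):
    """Split a token list into windows of `size` tokens sharing `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks


def cosine(a, b):
    """Cosine similarity between two word-count vectors (toy stand-in for embeddings)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=3):
    """Score every chunk against the user's question only, and return the top k."""
    q = Counter(question.lower().split())
    scored = [(cosine(q, Counter(c.lower().split())), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


if __name__ == "__main__":
    docs = ["the cat sat on the mat", "llamas eat grass", "vector stores index embeddings"]
    for score, chunk in retrieve("what do llamas eat", docs, k=2):
        print(f"{score:.3f}  {chunk}")  # read these and ask: do they answer the question?
```

If the top results look unrelated to the question, the problem is in chunking, embedding, or search, and there is no point debugging the later stages yet.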

If that works, I would then look at the "Augment" part, which is where you inject the results from the Retrieval stage into your prompt. Print the final prompt and read it: does it look correct? I doubt this is where the issue is, but it is worth a look.
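A rough sketch of the Augment step, assuming a simple template. The function name `build_prompt` and the prompt wording are made up for illustration; your framework's template will differ. What matters is that you can print the exact string that gets sent to the model.

```python
def build_prompt(question, retrieved_chunks):
    """Inject retrieved chunks into a prompt template (hypothetical wording)."""
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    prompt = build_prompt("What do llamas eat?", ["llamas eat grass", "llamas live in herds"])
    print(prompt)  # check for missing chunks, truncation, or broken formatting
```

Things to look for in the printed prompt: chunks that are cut off, chunks that never made it in, or so much context that the question gets buried.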

Finally, take a look at what comes out of the "Generate" stage when you pass in this augmented prompt. Does it look different from what you saw previously?
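One way to make that comparison concrete is to run the same model on the bare question and on the augmented prompt, side by side. This is only a sketch: `generate` is a hypothetical callable standing in for however you call your local model, and `fake_generate` below is a stub so the example runs without one.

```python
def compare_generations(generate, question, augmented_prompt):
    """Run the same model with and without retrieved context for a side-by-side look."""
    return {
        "without_context": generate(question),
        "with_context": generate(augmented_prompt),
    }


def fake_generate(prompt):
    """Stub model (assumption): reports whether it was given any context."""
    return "used context" if "Context:" in prompt else "no context given"


if __name__ == "__main__":
    question = "What do llamas eat?"
    augmented = f"Context:\nllamas eat grass\n\nQuestion: {question}\nAnswer:"
    for label, output in compare_generations(fake_generate, question, augmented).items():
        print(f"{label}: {output}")
```

If the two outputs are basically the same even though the retrieved chunks were good, the model is probably ignoring the context, and the prompt wording or the model choice is the next thing to look at.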