LocalLLaMA


Community for discussing Llama, the family of large language models created by Meta AI.

founded 2 years ago


Rag vs Vector db (alien.top)

submitted 2 years ago by troposfer@alien.top to c/localllama@poweruser.forum

9 comments

I am confused about these two. Sometimes people use the terms interchangeably. Is it because RAG is a method, and the vector DB is where you store the data for it? I remember that before LLMs there was word2vec. But isn't the hard part creating a meaningful word2vec in the first place? By the way, word2vec is now called “embeddings”, right?


[–] Life_Inspection4454@alien.top 1 points 2 years ago (1 children)

This is a very good answer, but I'll try to elaborate to make things clearer:

RAG is done by:

1. Take a long text and split it into chunks of a certain size/length.
2. Take each chunk of text and run it through a function which turns the text into a vector representation. This vector representation is called an embedding, and the function used is an embedding function/model, e.g. OpenAIEmbeddings(). You then generally store these vectors in a vector database (Qdrant, Weaviate and others).
3. When someone asks a question, create an embedding for the question.
4. Since your question is a vector (embedding), and your data is represented as vectors (embeddings) in your vector db (from 2), you can compare your question vector with your data vectors. Technically, you measure the distance between your question vector and the vectors in your vector db. Vectors closer to your question are likely to contain data relevant to your question.
5. Grab the text corresponding to the (e.g.) 3 closest vectors from your vector db. The text is often stored along with the vector for retrieval purposes. You send that text + question to your LLM (e.g. GPT-4), and implicitly say: "Answer this question based on only these 3 chunks of text." That way you sort of limit the language model's knowledge to what you explicitly give it.
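
To make the steps above concrete, here is a minimal sketch in plain Python. Everything in it is a stand-in I made up for illustration: `toy_embed` is a hashed bag-of-words counter, not a real embedding model like OpenAIEmbeddings(), `InMemoryVectorStore` is just a list pretending to be a vector db, and `build_prompt` only builds the string you would send to the LLM (no LLM is called). It uses cosine similarity as the closeness measure, which is one common choice, not the only one.

```python
import math
import re
import zlib


def split_into_chunks(text, chunk_size=12):
    """Step 1: split a long text into chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]


def toy_embed(text, dim=256):
    """Step 2/3 stand-in: hashed bag-of-words vector (NOT a real embedding model)."""
    vec = [0.0] * dim
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 is a stable hash, so the same word always lands in the same slot.
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    return vec


def cosine_similarity(a, b):
    """Step 4: higher value means the two vectors point in a more similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector db: keeps (vector, text) pairs in a list."""

    def __init__(self):
        self.items = []

    def add(self, vector, text):
        # The original text is stored next to its vector so it can be returned later.
        self.items.append((vector, text))

    def search(self, query_vector, k=3):
        """Return the texts of the k stored vectors most similar to query_vector."""
        ranked = sorted(
            self.items,
            key=lambda item: cosine_similarity(query_vector, item[0]),
            reverse=True,
        )
        return [text for _, text in ranked[:k]]


def build_prompt(question, chunks):
    """Step 5: combine the retrieved chunks and the question into one prompt string."""
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return (
        "Answer this question based on only these chunks of text.\n"
        f"Chunks:\n{context}\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    document = (
        "Qdrant is a vector database written in Rust. "
        "Llamas are domesticated animals from South America. "
        "Word2vec learns word vectors from large text corpora."
    )
    store = InMemoryVectorStore()
    for chunk in split_into_chunks(document, chunk_size=8):
        store.add(toy_embed(chunk), chunk)

    question = "Which vector database is written in Rust?"
    closest = store.search(toy_embed(question), k=2)
    print(build_prompt(question, closest))
```

Note that the store returns the saved text, not the numbers: nothing ever converts a vector back into words. Also, the question and the chunks go through the same `toy_embed` function, so their vectors live in the same space and can be compared.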

[–] troposfer@alien.top 1 points 2 years ago

Oh thanks, in 5 you also answered one question I had in mind: how to get back to words from floating-point numbers. So now I understand that embeddings are created by specific embedding models, and I guess each model's results differ from other models' results. So isn't it important which embedding model and query model you pick? Which one is the most successful right now? And if I create my embeddings with one model, I can't embed my query with a different model to search them, right?