this post was submitted on 14 Nov 2023

Machine Learning


A common situation in IRL problems with long time horizons is the need to perform multiple very different subtasks. For example, imagine a model trained to remember a poem and then spell it out in blocks in a game of Minecraft. The data for the poem itself and the relevant Minecraft functions probably have very different embeddings, but in practice it would be useful to ensure that the memories for how to use the Minecraft functions are retrieved whenever that poem is queried.

It seems like just querying a RAG DB for the vectors with the highest cosine similarity won't be super useful for this task. A query for poems will just find poem-like data. But we don't just want to find things with similar embeddings to poems; we want to find data that is useful for completing the task. Has there been any research into this time-series / associative type of RAG?
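
To make the limitation concrete, here is a minimal sketch of that vanilla retrieval step, assuming a plain numpy matrix of stored embeddings (`top_k_cosine`, `query_vec`, and `doc_vecs` are just illustrative names, not any particular vector DB's API). A poem-flavoured query embedding will score highest against other poem-like chunks, which is exactly why pure similarity search never surfaces the Minecraft-function memories.

```python
import numpy as np

def top_k_cosine(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k stored embeddings most cosine-similar to the query."""
    # Normalise so a dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q
    # Highest-similarity documents first.
    return np.argsort(-sims)[:k]
```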


top 1 comments
saintshing@alien.top 1 points 10 months ago

I am not sure if I understand your question.

What exactly is your query? Is it "spell the poem in blocks"? Or do you really want a "poem" query to also return the part about spelling in Minecraft blocks, even though you haven't mentioned anything about Minecraft blocks?

These two things are not associated with each other in human language. I mean, you can create training data to force them to be embedded together if you want. You can also add a layer on top of the vector DB, so some metadata is stored together with the embedding, which can help you retrieve related documents.
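
To sketch that metadata idea: here is a minimal toy example (the in-memory dict, the `linked_ids` field, and the helper names are my own illustrative assumptions, not any specific vector DB's API). Each record stores explicit links to related records alongside its embedding; retrieval does the usual similarity search first, then follows the links, so a hit on the poem also drags in the Minecraft-function documents even though their embeddings are far apart.

```python
import numpy as np

# Toy in-memory "vector DB": each record holds an embedding plus metadata,
# including explicit links to related records (the associative part).
records = {
    "poem_1":         {"embedding": np.random.rand(384), "text": "...the poem...",
                       "linked_ids": ["mc_place_block", "mc_move"]},
    "mc_place_block": {"embedding": np.random.rand(384), "text": "how to place a block",
                       "linked_ids": []},
    "mc_move":        {"embedding": np.random.rand(384), "text": "how to move the agent",
                       "linked_ids": []},
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_emb: np.ndarray, k: int = 2) -> list[str]:
    # 1) Standard cosine-similarity search over all stored embeddings.
    ranked = sorted(records, key=lambda rid: cosine(query_emb, records[rid]["embedding"]),
                    reverse=True)
    hits = ranked[:k]
    # 2) Associative expansion: follow metadata links from each hit, so a
    #    poem hit also pulls in the Minecraft-function records it points to.
    expanded = list(hits)
    for rid in hits:
        for linked in records[rid]["linked_ids"]:
            if linked not in expanded:
                expanded.append(linked)
    return [records[rid]["text"] for rid in expanded]
```

How the links get populated is the real design question: they could be written by hand, derived from co-occurrence in the same episode or time window, or learned, which is where the fine-tuning-with-custom-training-data option comes back in.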