this post was submitted on 15 Nov 2023

Machine Learning


I have been working on a presentation that summarizes the main reasons why vector databases are often an unnecessary optimization, one that is too often promoted by vector database vendors. The slides are available here: https://vec3.ai/

Do you think vector databases are overrated?
In what instances have vector databases proved most useful in your projects, and were they commercial implementations?

waffleseggs@alien.top 1 points 11 months ago (1 children)

No, everyone is crazy and not thinking at all right now. Vector databases are a great example of cargo culting, as are many other approaches in AI and ML.

I increasingly work with embedding vectors, but I keep them in memory or in a regular database column. Keeping them in a regular database lets you tag ordinary records with their locations in embedding spaces, and you gain all kinds of helpful clustering and joining capabilities through embeddings tuned to specific tasks. To search, you just loop over the hydrated records and compare vectors. You get all the same benefits and more.
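To make the "regular database column" idea concrete, here is a minimal sketch using only the Python standard library. Everything in it is an assumption for illustration: the `docs` table, its columns, the three-dimensional toy embeddings, and the helper names `build_demo_db` and `search` are hypothetical and not taken from the comment above. It stores each embedding as JSON text next to ordinary columns, filters with a plain SQL WHERE clause, and then loops over the returned rows to rank them by cosine similarity.

```python
import json
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_demo_db():
    """Create an in-memory table with a normal 'category' column and a JSON embedding column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE docs (id INTEGER PRIMARY KEY, category TEXT, body TEXT, embedding TEXT)"
    )
    rows = [
        (1, "pets", "cats and kittens", [0.9, 0.1, 0.0]),
        (2, "pets", "dogs and puppies", [0.8, 0.3, 0.1]),
        (3, "finance", "stock market news", [0.0, 0.2, 0.9]),
    ]
    conn.executemany(
        "INSERT INTO docs VALUES (?, ?, ?, ?)",
        [(i, c, b, json.dumps(e)) for i, c, b, e in rows],
    )
    return conn


def search(conn, query_vec, category=None, k=2):
    """Filter with ordinary SQL, then brute-force rank the hydrated rows by similarity."""
    sql = "SELECT id, body, embedding FROM docs"
    params = ()
    if category is not None:
        sql += " WHERE category = ?"
        params = (category,)
    scored = [
        (cosine(query_vec, json.loads(emb)), doc_id, body)
        for doc_id, body, emb in conn.execute(sql, params)
    ]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:k]


if __name__ == "__main__":
    conn = build_demo_db()
    for score, doc_id, body in search(conn, [1.0, 0.0, 0.0]):
        print(doc_id, body, round(score, 3))
```

The brute-force loop is linear in the number of rows that survive the WHERE clause, so how far this scales depends on table size and latency requirements; the sketch makes no claim about where that limit sits.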

Far_Ambassador_6495@alien.top 1 points 11 months ago

I agree on the overhype. I think you can get most of the features you are talking about through metadata tagging in vector DBs. So at that point it becomes a question of which is cheaper or quicker, and I guess we don't definitively know.

But also to your point, some vector DBs cap the number of top-k similar results, so a DB holding more matching records than the cap wouldn't return all of them the way a SQL WHERE query would.
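A small sketch of the difference described above, in plain Python with no vector database involved. The records, the cap of 2, and the 0.9 similarity threshold are made-up values for illustration, and `top_k` and `above_threshold` are hypothetical helper names. It contrasts a capped top-k lookup with a WHERE-style filter that returns every record above a similarity threshold.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(records, query, k):
    """Capped lookup: at most k records come back, however many actually match."""
    ranked = sorted(records, key=lambda r: cosine(query, r["embedding"]), reverse=True)
    return ranked[:k]


def above_threshold(records, query, min_score):
    """WHERE-style filter: every record that meets the condition comes back."""
    return [r for r in records if cosine(query, r["embedding"]) >= min_score]


# Toy data: four records close to the query direction, one far from it.
RECORDS = [
    {"id": 1, "embedding": [1.0, 0.0]},
    {"id": 2, "embedding": [0.9, 0.1]},
    {"id": 3, "embedding": [0.95, 0.05]},
    {"id": 4, "embedding": [0.9, 0.2]},
    {"id": 5, "embedding": [0.0, 1.0]},
]

if __name__ == "__main__":
    query = [1.0, 0.0]
    print(len(top_k(RECORDS, query, k=2)))            # bounded by the cap
    print(len(above_threshold(RECORDS, query, 0.9)))  # bounded only by the data
```

With a cap in place, the caller cannot tell from the result alone whether more matches existed, which is the gap relative to a SQL WHERE query.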

In terms of semantic search, you are pretty much running the same process unless you implement some custom distance metric, which is doable in most vector DBs.

So you are totally correct on the cargo culting thing, but there could be a benefit if a vector DB is faster/cheaper, or a tremendous downside if it is slower/more expensive. I guess we will never know.

But the functionality is the same whether you choose the right vector DB or a relational DB.

** Edit **
If I am wrong, call me an idiot and let me know where I am wrong.