this post was submitted on 15 Nov 2023
Machine Learning
No, everyone is crazy and not thinking at all right now. Vector databases are a great example of cargo culting, as are many other approaches in AI and ML.
I increasingly work with embedding vectors, but I keep them in memory or in a regular database column. Keeping them in a regular database lets you tag ordinary records with locations within embedding spaces, and you gain all kinds of helpful clustering and joining capabilities through embeddings tuned to specific tasks. To search, you just loop over the hydrated records. You get all the same benefits and more.
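To make that concrete, here is a rough sketch of the "regular database column" approach, not a benchmark or a recommendation. Everything in it is an assumption for illustration: the `docs` table, its `topic` and `embedding` columns, the toy 3-dimensional vectors, and the `search` helper. It stores each embedding as JSON text in SQLite, filters with a plain WHERE clause, and scores the surviving rows with cosine similarity in a loop.

```python
import json
import math
import sqlite3

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical schema: an ordinary table with one extra column for the vector.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE docs (id INTEGER PRIMARY KEY, topic TEXT, body TEXT, embedding TEXT)"
)
rows = [
    ("db", "postgres index tuning", [0.9, 0.1, 0.0]),
    ("db", "sqlite in production", [0.8, 0.2, 0.1]),
    ("ml", "fine-tuning transformers", [0.1, 0.9, 0.2]),
]
conn.executemany(
    "INSERT INTO docs (topic, body, embedding) VALUES (?, ?, ?)",
    [(topic, body, json.dumps(vec)) for topic, body, vec in rows],
)

def search(conn, query_vec, topic=None):
    """Filter with ordinary SQL, then rank the hydrated rows by similarity."""
    sql = "SELECT id, body, embedding FROM docs"
    params = ()
    if topic is not None:
        sql += " WHERE topic = ?"
        params = (topic,)
    scored = []
    for doc_id, body, emb in conn.execute(sql, params):
        scored.append((cosine(query_vec, json.loads(emb)), doc_id, body))
    return sorted(scored, reverse=True)  # most similar first

results = search(conn, [1.0, 0.0, 0.0], topic="db")
for score, doc_id, body in results:
    print(f"{score:.3f}  {doc_id}  {body}")
```

The loop is a full scan, so cost grows linearly with the number of rows that survive the WHERE clause. Whether that is acceptable depends on table size and latency needs.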
I agree on the overhype. I think you can get most of the features you are talking about through metadata tagging in vector DBs. At that point it becomes a question of which is more affordable or quicker, and I guess we don't definitively know.
But also to your point, some vector DBs cap the top-k of a similarity query, so a DB holding more matching records than the cap wouldn't return all of them the way a SQL WHERE query would.
In terms of semantic search, you are pretty much running the same process unless you are implementing some custom distance metric, which is doable in most vector DBs.
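A small sketch of the two points above, under stated assumptions: the `records` dict, the `within` and `top_k` helpers, and the `weighted_manhattan` metric are all made up for illustration and do not correspond to any particular vector DB's API. It contrasts a capped top-k query with a threshold query that returns every match (closer to a SQL WHERE), and shows that swapping in a custom distance function is a one-argument change when you do the scan yourself.

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def weighted_manhattan(weights):
    """Build a custom metric: L1 distance with a per-dimension weight."""
    def dist(a, b):
        return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))
    return dist

def within(records, query, max_dist, metric=euclidean):
    """Return every record within max_dist, nearest first (no result cap)."""
    hits = [(metric(query, vec), key) for key, vec in records.items()]
    return sorted((d, k) for d, k in hits if d <= max_dist)

def top_k(records, query, k, metric=euclidean):
    """Return only the k nearest records, however many others also match."""
    return sorted((metric(query, vec), key) for key, vec in records.items())[:k]

# Toy in-memory "table" of key -> vector.
records = {"a": [0.0, 0.0], "b": [0.1, 0.0], "c": [0.0, 0.2], "d": [5.0, 5.0]}
query = [0.0, 0.0]

all_close = within(records, query, max_dist=0.5)  # a, b and c all qualify
capped = top_k(records, query, k=2)               # the cap silently drops c
custom = within(records, query, max_dist=0.5,
                metric=weighted_manhattan([1.0, 10.0]))  # second dim weighted 10x

print([k for _, k in all_close])
print([k for _, k in capped])
print([k for _, k in custom])
```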
So you are totally correct on the cargo culting thing, but there could be a benefit if a vector DB is faster/cheaper, or a tremendous downside if it is slower/more expensive. I guess we will never know.
But the functionality is the same whether you choose the right vector DB or a relational DB.
** Edit **
If I am wrong, call me an idiot and let me know where I am wrong.