this post was submitted on 25 Nov 2023

LocalLLaMA

Hello,

I'm looking for an alternative to Google Vision AI (LABEL_DETECTION, OBJECT_LOCALIZATION) and Amazon Rekognition (DetectLabels).
Any ideas?

Thanks!

[–] Specialist_Ice_5715@alien.top 1 points 10 months ago (2 children)

You'll have to go multi-modal. The best option right now is Fuyu, but it's not commercially usable.

[–] takezo07@alien.top 1 points 10 months ago (1 children)

I found Blip: https://replicate.com/salesforce/blip?input=form&output=preview
But that's not exactly what I'm looking for. It does image captioning very well,
like in their example: "a woman sitting on the beach with a dog".
But I need a list of objects and "things", like: dog, woman, beach, wave, shirt, etc.
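
To make the difference concrete: a label list like the one above is closer to what an object detector returns than to a caption. Here is a rough sketch of the post-processing step, assuming a detector that returns one dict per detected object with `label` and `score` keys (the shape Hugging Face's object-detection pipeline uses). The function name, the threshold, and the sample data are made up for illustration and are not real model output.

```python
from typing import Dict, List


def labels_from_detections(detections: List[Dict], min_score: float = 0.5) -> List[str]:
    """Collapse per-object detections into a unique label list, best score first."""
    best: Dict[str, float] = {}
    for det in detections:
        label, score = det["label"], det["score"]
        # Keep only confident detections, and remember the highest score per label.
        if score >= min_score and score > best.get(label, -1.0):
            best[label] = score
    # Sort by descending confidence, then alphabetically for a stable order.
    return sorted(best, key=lambda name: (-best[name], name))


# Hypothetical detections, hand-written for this example.
sample = [
    {"label": "dog", "score": 0.98},
    {"label": "person", "score": 0.95},
    {"label": "dog", "score": 0.60},
    {"label": "surfboard", "score": 0.30},
]

print(labels_from_detections(sample))
```

Lowering `min_score` trades precision for recall, which is roughly what the confidence thresholds in the hosted APIs do as well.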

[–] Specialist_Ice_5715@alien.top 1 points 10 months ago

Interesting. Is BLIP commercially usable? I read that it is, but does that hold for the weights in their entirety?