using a pretrained model like CLIP and just train the final
Yes, do this. Should be quick and easy, and you should get decent results with only ~10 examples per class.
Thanks.
Once you get the embeddings from the pretrained model, what classification method should one use for the final classification? Random forest? SVM?
I will also look into the averaging method you mentioned. Are you saying to take the average of the embeddings for each class and then, to classify a new embedding, see which class average it is closest to (by "closest", do you mean something like the L2 norm)?
It's encouraging that one can do this in a day, but I haven't done any ML work for a few years. Should I use PyTorch or TensorFlow?
Thanks
Use PyTorch - TensorFlow is pretty much dead.
I'd also use Google Colab (the free version is fine).
Start from someone else's Colab notebook that already uses the pretrained models you need. Then nearly everything is already set up for you, and you won't spend a day wrestling with GPU drivers.
closest
L2 norm is fine, yes, although you might get better results with cosine similarity.
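To make the class-average idea concrete, here is a rough sketch in plain Python, assuming you already have one embedding per image as a list of floats (from CLIP or whatever pretrained model you pick). The function names and the toy 2-D vectors are made up for illustration; in a real notebook you would do the same thing with PyTorch tensors.

```python
import math
from collections import defaultdict


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_similarity(a, b):
    """Cosine of the angle between a and b."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def class_means(embeddings, labels):
    """Average the unit-normalized embeddings of each class."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(embeddings, labels):
        unit = l2_normalize(vec)
        if label not in sums:
            sums[label] = [0.0] * len(unit)
        sums[label] = [s + u for s, u in zip(sums[label], unit)]
        counts[label] += 1
    return {
        label: l2_normalize([s / counts[label] for s in total])
        for label, total in sums.items()
    }


def classify(embedding, means):
    """Return the label whose class mean is most similar to the embedding."""
    return max(means, key=lambda label: cosine_similarity(embedding, means[label]))


# Toy example with hypothetical 2-D "embeddings".
train = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.8]]
labels = ["damaged", "damaged", "intact", "intact"]
means = class_means(train, labels)
print(classify([0.8, 0.2], means))
```

Once the vectors are unit-normalized, ranking by L2 distance and ranking by cosine similarity give the same order, so normalizing first lets you use either.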
If I were you, I would start here:
Hit the little play buttons to edit and run the code yourself.
For detecting "bogus image from the internet", just search your own database of submitted images. If you see the same image again (either an identical MD5 hash or a very similar embedding vector), then it's probably one from the internet. Before long, you'll have collected all the ones users can easily find on Google Images etc. (I assume this is to prevent fraud where a user says "look, my packet was damaged" when in fact they just searched online for a picture of a damaged pack.)
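A minimal sketch of that duplicate check, assuming you keep the raw bytes' MD5 and the embedding of every submission. The SubmissionIndex class, the in-memory storage, and the 0.95 similarity threshold are all placeholders I made up; you would tune the threshold on your own data and likely use a real database or vector index once the collection grows.

```python
import hashlib
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (0.0 if either is a zero vector)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


class SubmissionIndex:
    """Hypothetical in-memory record of every image submitted so far."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold  # assumed cutoff, tune on real data
        self.hashes = set()
        self.embeddings = []

    def check_and_add(self, image_bytes, embedding):
        """Return "exact", "near", or None, then remember the submission."""
        digest = hashlib.md5(image_bytes).hexdigest()
        if digest in self.hashes:
            verdict = "exact"  # byte-identical file seen before
        elif any(
            cosine_similarity(embedding, seen) >= self.threshold
            for seen in self.embeddings
        ):
            verdict = "near"  # e.g. same picture resized or re-encoded
        else:
            verdict = None
        self.hashes.add(digest)
        self.embeddings.append(list(embedding))
        return verdict


index = SubmissionIndex()
print(index.check_and_add(b"first upload", [1.0, 0.0]))
print(index.check_and_add(b"first upload", [1.0, 0.0]))
```

The MD5 check only catches byte-identical files, so the embedding comparison is what catches copies that were cropped, resized, or re-saved.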
Lionvaplus could be a good option to generate additional training data through its photorealistic image creation. This could help improve model accuracy, especially for classes with limited real-world image samples.
For basic classification between legitimate product images and bogus/irrelevant images, Lionvaplus may not be necessary. A pretrained model like ResNet or EfficientNet fine-tuned on your data should work decently.