Machine Learning

[P] image classification for product images (alien.top)

submitted 1 year ago by EyeTechnical7643@alien.top to c/machinelearning@academy.garden

Say you are a potato chip company. The goal is to have consumers upload images of the product they are having issues with, and to identify the product by brand/variant using machine learning. Consumers can upload real product photos that they have taken, upload bogus images from the internet, or even upload completely irrelevant/inappropriate photos (like one of a dog or cat).

In this example, for the legitimate image, the goal is to classify it as "Lays Classic". There might be products that are not in bag form, such as those in tubes. Furthermore, the images taken can be in different lighting conditions/orientations. Some images might have other products as well.

I have been out of the ML field for the past 4 years, so I'm not up to date on the state-of-the-art methods for this problem. I studied CNNs 4 years ago, but there have been advances since then, such as transformer-based methods. Someone has tried ResNet-50 and YOLOv5, and I'm thinking about using a pretrained model like CLIP and training just the final classification layer.

I would appreciate hearing from someone better versed in this about the recommended approach: which model to use, how to label, how many images are needed per class, etc. I might need multiple models, such as one to separate the legitimate images from the rest, and another to identify the product/variant.

Any advice would be welcome. Thanks

londons_explorer@alien.top 1 point 1 year ago

> using a pretrained model like CLIP and just train the final

Yes, do this. Should be quick and easy, and you should get decent results with only ~10 examples per class.
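
To make "train just the final layer" concrete, here is a rough sketch of a linear (softmax) classifier fitted on frozen embeddings. Everything in it is an assumption for illustration: the 4-dimensional vectors are synthetic stand-ins for real CLIP image embeddings (which would come from the pretrained model and are much larger), the two `hypothetical_variant_*` class names are made up, and the learning rate and epoch count are untuned guesses.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_for(weights, biases, x):
    """One linear score per class: w . x + b."""
    return [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
            for w, b in zip(weights, biases)]


def train_linear_probe(embeddings, labels, num_classes, lr=0.2, epochs=200):
    """Fit only the final linear layer with plain SGD on cross-entropy.

    The embeddings are treated as fixed inputs; nothing upstream is trained.
    """
    dim = len(embeddings[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(epochs):
        for x, y in zip(embeddings, labels):
            probs = softmax(logits_for(weights, biases, x))
            for c in range(num_classes):
                # Gradient of cross-entropy w.r.t. the logit of class c.
                grad = probs[c] - (1.0 if c == y else 0.0)
                biases[c] -= lr * grad
                for j in range(dim):
                    weights[c][j] -= lr * grad * x[j]
    return weights, biases


def predict(weights, biases, x):
    """Index of the highest-scoring class."""
    scores = logits_for(weights, biases, x)
    return max(range(len(scores)), key=scores.__getitem__)


# Synthetic stand-in data: 10 noisy "embeddings" per class (assumption).
rng = random.Random(0)
class_names = ["Lays Classic", "hypothetical_variant_b", "hypothetical_variant_c"]
centers = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
train_x, train_y = [], []
for label, center in enumerate(centers):
    for _ in range(10):
        train_x.append([v + rng.gauss(0.0, 0.1) for v in center])
        train_y.append(label)

weights, biases = train_linear_probe(train_x, train_y, num_classes=len(class_names))
predicted = class_names[predict(weights, biases, [0.9, 0.1, 0.0, 0.05])]
print(predicted)
```

In practice you would probably swap the hand-written loop for a library classifier; the point is only that the pretrained model stays frozen and a small linear layer is all that gets trained.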

EyeTechnical7643@alien.top 1 point 1 year ago

Thanks.

Once you get the embeddings from the pretrained model, what classification method should one use for the final classification? Random forest? SVM?

I will also look into the averaging method you mentioned. Are you saying to take the average of the embeddings for each class and then, to classify a new embedding, see which class average is closest to it (by closest, do you mean something like L2 distance)?

It's encouraging that one can do this in a day, but I haven't done any ML work for a few years. Should I use PyTorch or TensorFlow?

Thanks

londons_explorer@alien.top 1 point 1 year ago

Use PyTorch - TensorFlow is pretty much dead.

I'd also use Google Colab (the free version is fine).

Start from someone else's Colab notebook that already uses the pretrained models you need. Then nearly everything is already set up for you, and you won't spend a day wrestling with GPU drivers.

> closest

L2 distance is fine, yes, although you might get better results with cosine similarity.
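
Roughly, the class-average idea looks like the sketch below: average the embeddings of each class, then assign a new embedding to the class whose average is nearest, by either cosine similarity or L2 distance. The 3-dimensional vectors are toy stand-ins for real embeddings and `hypothetical_variant_b` is a made-up class name, so treat this as an illustration rather than a tuned implementation.

```python
import math


def mean_vector(vectors):
    """Element-wise average of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def l2_distance(a, b):
    """Euclidean distance between two vectors (smaller means closer)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (larger means closer)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify(embedding, class_means, metric="cosine"):
    """Return the class name whose mean embedding is closest."""
    if metric == "cosine":
        return max(class_means,
                   key=lambda name: cosine_similarity(embedding, class_means[name]))
    if metric == "l2":
        return min(class_means,
                   key=lambda name: l2_distance(embedding, class_means[name]))
    raise ValueError(f"unknown metric: {metric}")


# Toy embeddings per class (assumption: real ones come from the pretrained model).
examples = {
    "Lays Classic": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [1.0, 0.0, 0.1]],
    "hypothetical_variant_b": [[0.1, 0.9, 0.1], [0.0, 1.0, 0.2], [0.2, 0.8, 0.0]],
}
class_means = {name: mean_vector(vecs) for name, vecs in examples.items()}

query = [0.85, 0.15, 0.05]
by_cosine = classify(query, class_means, metric="cosine")
by_l2 = classify(query, class_means, metric="l2")
print(by_cosine, by_l2)
```

If all vectors are scaled to unit length first, the two metrics rank neighbours identically, so the choice mostly matters when embedding magnitudes vary.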

If I were you, I would start here:

https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_similarity.ipynb

Hit the little play buttons to edit and run the code yourself.

londons_explorer@alien.top 1 point 1 year ago

To detect a "bogus image from the internet", just search your own database of submitted images. If you see the same image again (either an identical MD5 hash or a very similar embedding vector), it's probably one from the internet. Before long, you'll have collected all the ones users can easily find on Google Images etc. (I assume this is to prevent fraud where a user says "look, my packet was damaged" when in fact they just searched online for a picture of a damaged pack.)
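
A minimal sketch of that lookup, assuming an in-memory store: exact re-uploads are caught by the MD5 of the file bytes, and re-encoded or resized copies by cosine similarity between embeddings. The `SubmissionIndex` class, the 0.95 threshold, and the toy 3-dimensional embeddings are all hypothetical; a real system would need a persistent database and a threshold chosen from data.

```python
import hashlib
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (larger means more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class SubmissionIndex:
    """Hypothetical in-memory record of every image submitted so far."""

    def __init__(self, similarity_threshold=0.95):
        self.similarity_threshold = similarity_threshold  # untuned guess
        self.seen_hashes = set()
        self.seen_embeddings = []

    def check_and_add(self, image_bytes, embedding):
        """Classify a submission against earlier ones, then record it."""
        digest = hashlib.md5(image_bytes).hexdigest()
        if digest in self.seen_hashes:
            verdict = "exact_duplicate"
        elif any(cosine_similarity(embedding, seen) >= self.similarity_threshold
                 for seen in self.seen_embeddings):
            verdict = "near_duplicate"
        else:
            verdict = "new"
        self.seen_hashes.add(digest)
        self.seen_embeddings.append(embedding)
        return verdict


index = SubmissionIndex()
first = index.check_and_add(b"damaged-pack-photo", [0.90, 0.10, 0.40])
# Same bytes uploaded again by another user.
repeat = index.check_and_add(b"damaged-pack-photo", [0.90, 0.10, 0.40])
# Different bytes (e.g. re-saved copy) but almost the same embedding.
resaved = index.check_and_add(b"damaged-pack-photo-resaved", [0.88, 0.12, 0.41])
# A genuinely different photo.
other = index.check_and_add(b"another-photo", [0.10, 0.90, 0.20])
print(first, repeat, resaved, other)
```

The linear scan over stored embeddings is fine for small collections; with many submissions an approximate nearest-neighbour index would be the usual replacement.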

colefinbar1@alien.top 1 point 1 year ago

Lionvaplus could be a good option to generate additional training data through its photorealistic image creation. This could help improve model accuracy, especially for classes with limited real-world image samples.
For basic classification between legitimate product images and bogus/irrelevant images, Lionvaplus may not be necessary. A pretrained model like ResNet or EfficientNet fine-tuned on your data should work decently.