additional information about the products that run at a specific time (like product family, color, brand, etc.).
Seems easier to just search for images based on the additional information and then train on those images. Otherwise you would have to first predict the brand (which again requires you to extract visual features from product images of that brand), maybe run OCR (slow), and then use the results to filter the region proposals.
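To make the filtering idea concrete, here is a rough sketch of what "use that to filter region proposals" could look like. Everything in it is an assumption for illustration: the `Proposal` class, the `brand_scores` dictionary (standing in for a brand classifier's output), the `ocr_text` field (standing in for an OCR result), the brand names, and the 0.5 threshold are all hypothetical and not tied to any particular detector or OCR library.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    box: tuple            # (x1, y1, x2, y2) in pixels
    brand_scores: dict    # hypothetical classifier output: brand -> probability
    ocr_text: str = ""    # hypothetical OCR result for the crop, may be empty


def filter_proposals(proposals, expected_brand, min_score=0.5):
    """Keep proposals whose predicted brand or OCR text matches the expected brand."""
    kept = []
    for p in proposals:
        # Visual match: the classifier is confident enough about the expected brand.
        brand_ok = p.brand_scores.get(expected_brand, 0.0) >= min_score
        # Text match: the brand name appears in the OCR output (case-insensitive).
        ocr_ok = expected_brand.lower() in p.ocr_text.lower()
        if brand_ok or ocr_ok:
            kept.append(p)
    return kept


# Made-up example: the additional information says brand "acme" is running.
proposals = [
    Proposal((10, 10, 50, 80), {"acme": 0.9, "globex": 0.1}),
    Proposal((60, 10, 100, 80), {"acme": 0.2, "globex": 0.8}),
    Proposal((110, 10, 150, 80), {"acme": 0.3, "globex": 0.3}, ocr_text="ACME cola"),
]
kept = filter_proposals(proposals, "acme")
```

Note that both signals still depend on a brand classifier or OCR pass that has to be built and run first, which is the overhead the simpler "search for images and train on them" route avoids.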