Recognising specific products in images

Yesterday Andrew Stromberg pointed me to the excellent iPhone app by image-matching outfit Snaptell.

Snaptell’s application takes an input image (of an album, DVD, or book) supplied by the user and identifies that product, linking to third-party services. This is similar to the impressive TinEye Music but with a broader scope. As Andrew points out, the app performs very well at recognising these products.

Algorithmically, the main problems faced by someone designing a system to do this are occlusions (e.g. someone covering a DVD cover with their thumb as they hold it) and transformations (e.g. a skewed camera angle, or a product that’s rotated in the frame).

There are a number of techniques for solving these problems (e.g. the SIFT and SURF algorithms), most of which use repeatable methods to find key points or patterns within images, and then encode those features in a way that is invariant to rotation (i.e. will still match when upside-down) and tolerant of an acceptable level of distortion. At query time the search algorithm can then find the images with the most relevant clusters of matching keypoints.
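
To make the query-time step a little more concrete, here is a toy sketch in Python. It assumes the keypoint descriptors have already been extracted (the 4-dimensional vectors and the image names below are made up for illustration; real SIFT descriptors have 128 dimensions), and it uses a simplified nearest-neighbour ratio test in the style of the one Lowe proposed for SIFT. It ignores whether the matches are geometrically consistent, which a real system would also check, and it is not a description of how Snaptell actually does it.

```python
import math
from collections import Counter


def distance(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_votes(query_descriptors, database, ratio=0.75):
    """Count, per image, the query descriptors that pass the ratio test.

    `database` maps an image name to its list of descriptors. A query
    descriptor votes for the image owning its nearest neighbour only if
    that neighbour is clearly closer than the second nearest one.
    """
    # Flatten the database into (image_name, descriptor) pairs.
    entries = [(name, d) for name, descs in database.items() for d in descs]
    votes = Counter()
    for q in query_descriptors:
        ranked = sorted(entries, key=lambda e: distance(q, e[1]))
        (best_name, best), (_, second) = ranked[0], ranked[1]
        if distance(q, best) < ratio * distance(q, second):
            votes[best_name] += 1
    return votes


# Hypothetical descriptors for two catalogue images.
database = {
    "lucene_in_action": [(0.9, 0.1, 0.0, 0.2), (0.1, 0.8, 0.3, 0.0), (0.0, 0.2, 0.9, 0.1)],
    "some_dvd": [(0.5, 0.5, 0.5, 0.5), (0.0, 0.0, 0.1, 0.9), (0.7, 0.0, 0.7, 0.0)],
}

# Two of the book's keypoints survive with a little noise; the third is
# hidden behind a hand, so the query simply does not contain it.
query = [(0.88, 0.12, 0.02, 0.18), (0.12, 0.79, 0.28, 0.03)]

votes = match_votes(query, database)
best_image = votes.most_common(1)[0][0]
print(best_image, dict(votes))
```

The point of the sketch is that occlusion only removes votes rather than breaking the match: the keypoints that remain visible still point at the right image.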

It seems like Snaptell have mastered a version of these techniques. When I tested the app’s behaviour (using my copy of Lucene in Action) I chose an awkward camera angle and obscured around a third of the cover with my hand, and it still worked perfectly. Well done Snaptell.

3 Responses to Recognising specific products in images

  1. Kris says:

    Snaptell is interesting, it looks like they’ve scaled up to about 5 million product images (5,426,775 to be exact, here’s the last image in their DB: http://apps.snaptell.com:8080/catalog/item/show_image/5426775 ), so they may still face some scalability issues that TinEye has already overcome.

    From a computer vision point of view, they’ve dealt with occlusion/rotation/etc well, but my hunch is that they’re using a fingerprint technique to do the lookup (as does TinEye). Fingerprint methods leave a lot to be desired: if users want to “take a picture of a shoe,” fingerprint methods aren’t great at this. Fingerprints can find a logo reliably, but can they recognize the Eiffel Tower late at night on a December evening? Not really.

  2. Andrew Stromberg says:

    More stuff to learn! Now I get to read up about SIFT and SURF, try to understand why they work and then make wild speculation as to how one could index the images for faster search 😉

    Just out of curiosity, do you have any particular books or reading that you would recommend to a person starting off in image search? I’ve been learning haphazardly through online sources and walking through code of particular algorithms, but I feel like I am missing out on a lot of theoretical background that would probably help me understand why some of these algorithms work.

  3. Richard Marr says:

    @Kris, thanks for the insights. I think from a computer vision perspective there’s always temptation to keep hunting for algorithms that solve all problems, but from a business perspective what they’ve got is probably enough for them. From here I think their main problems are ones to do with application development and making the product slick and scalable. Maybe they’ve not advanced the state of the art, but they’ve got a shot at a good business.

    @Andrew, I’m not aware of any books that cover all the issues involved. Most either cover generic image processing or generic indexing. From our previous chat I’d guess you already have a fair grasp of how to extract data from images, so the indexing is probably where to focus. There are two main directions you can take with indexing. You can either take the discrete road and index discrete properties of an image, or you can go the other way and build a multi-dimensional vector space based on vectors extracted from images. I reckon you should get yourself a copy of Lucene in Action and work through some examples. Then see if you can figure out how to make LIRE sing, because as you pointed out before, LIRE is strong on processing and weak on indexing. Lucene can’t solve all visual indexing problems, but it’s a damned good piece of software and is somewhere to start.
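
As a rough illustration of the "discrete road" described in the reply above, the Python sketch below quantises each descriptor into a discrete token and stores the tokens in an inverted index, which is the same kind of structure a text search engine builds for words. Everything in it is an assumption made for illustration: the 2-dimensional descriptors, the image names, and the grid quantiser (a stand-in for a learned codebook). It uses a plain dictionary, not Lucene or LIRE.

```python
from collections import Counter, defaultdict


def visual_word(descriptor, cell=0.5):
    """Quantise a descriptor into a discrete token by snapping each
    component to a coarse grid (a stand-in for a learned codebook)."""
    return "w" + "_".join(str(int(x // cell)) for x in descriptor)


def build_index(images):
    """Build an inverted index: visual word -> set of image names."""
    index = defaultdict(set)
    for name, descriptors in images.items():
        for d in descriptors:
            index[visual_word(d)].add(name)
    return index


def search(index, query_descriptors):
    """Rank images by how many distinct query words they contain."""
    scores = Counter()
    for word in {visual_word(d) for d in query_descriptors}:
        for name in index.get(word, ()):
            scores[name] += 1
    return scores.most_common()


# Hypothetical 2-dimensional descriptors for two catalogue images.
images = {
    "book": [(0.9, 0.1), (0.2, 0.8)],
    "dvd": [(0.9, 0.9), (0.2, 0.8)],
}
index = build_index(images)

# A slightly noisy photo of the book.
results = search(index, [(0.8, 0.2), (0.3, 0.7)])
print(results)
```

The vector-space road would skip the quantisation step and search for nearest neighbours among the raw descriptors instead, trading the simplicity of an inverted index for finer-grained matching.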
