As I’m sure a lot of people suspected, the guys over at Microsoft’s multimedia search team have been working on image comparison, i.e. using an image to find other images. I can’t seem to get it to appear though. Has anybody seen it in action?


  1. Andrew Stromberg says:

    Shows up for me, just scroll over the image and it has a link that says “show similar images”.

    I dunno how they define “similar” though. It doesn’t look like shape or color similarity, probably a tagging similarity.

    • Richard Marr says:

      As for their similarity measure, I’d be surprised if they haven’t tried out various methods of finding similarity. Most measures of similarity have situations where they perform really badly, so creating a general-purpose similarity measure is hard. The bigger players have more to lose, so they’re proportionally more averse to the risk of releasing something with flaws.

  2. Richard Marr says:

    It must be a partial release of some kind then. When I roll over the image the only link I get is “feedback on this image”. Thanks Andrew.

  3. Andrew Stromberg says:

    Yeah, it makes sense for them not to try something too crazy off the bat. Large sites tend to be very conservative in rolling out changes, and they make even those very incremental so they can measure their effect. I would imagine that there are other issues you run into when you are trying to search millions of images for shape or color. I picked up on your blog after working with LIRE for a bit, and it gets slow (partially due to my machine) when you search tens or hundreds of thousands of images.

  4. Richard Marr says:

    Well I hope it’s not too much of a disappointment. I always feel like I have to hold back from talking about anything interesting that we’re working on for fear of giving secrets away.

    LIRE is a really cool tool, but it suffers from the same scalability problems that most implementations do. There are techniques that can help but I can’t really go into that in public yet.

    Maybe if I get the OK from those on high we can contribute some of those techniques to LIRE or Lucene at some point. Knowledge spreads though, so I’m sure it’ll become public eventually.

    What kind of things are you using LIRE for?

  5. Andrew Stromberg says:

    Yeah, I figure with Pixsta you probably can’t say too much =P I wanted to learn more about image recognition and processing, so I started making a clothing recommendation site. I didn’t get much further than an experimental mockup (nothing live), but I still work on the stuff a bit in my spare time.

    Yeah, when I first started with LIRE I was hoping that it was actually using Lucene to do some sort of indexing on the descriptors, but when I saw it was just using it for storage I was a bit disappointed.

    For color searching, I was able to reduce the search space by storing information about the primary colors of each image, essentially making an index saying “only calculate distances for images with primary colors A, B, C”. For shape searches I don’t understand enough of the theory behind it to form any sort of meaningful index to keep the number of comparisons down.

    The rough color index thing I made reduced the number of comparisons to about 1/5 of the original (depending on the density of the images in the color space), but I should check to see how it works on larger sets (the one I did was 10k images).

    Huh, that ended up being a lot longer than I thought =P I am guessing there is some special sauce that you can add to Lucene to make it index image descriptors though.

  6. Richard Marr says:

    No, I don’t fully understand the decision to use Lucene just for storage either. The strongest features of Lucene are discarded if you’re using it to store vector strings rather than using the indexing.
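
For readers wondering what the primary-color prefilter Andrew describes in comment 5 might look like, here is a minimal sketch in Python. It is not his implementation and it does not use LIRE or Lucene. The class `ColorIndex`, the coarse 4-levels-per-channel quantization, the choice of the top 3 buckets as "primary colors", and the L1 histogram distance are all assumptions made for illustration. The idea is only that full distances are computed for images sharing at least one primary color with the query, which trades some recall for fewer comparisons.

```python
from collections import defaultdict


def quantize(rgb, levels=4):
    """Map an (r, g, b) pixel (0-255 per channel) to a coarse color bucket."""
    step = 256 // levels
    r, g, b = rgb
    return (r // step, g // step, b // step)


def histogram(pixels, levels=4):
    """Return a normalized coarse color histogram as {bucket: fraction}."""
    counts = defaultdict(int)
    for pixel in pixels:
        counts[quantize(pixel, levels)] += 1
    total = len(pixels)
    return {bucket: n / total for bucket, n in counts.items()}


def primary_colors(hist, top_n=3):
    """Pick the top_n most frequent buckets (ties broken by bucket value)."""
    ranked = sorted(hist.items(), key=lambda kv: (-kv[1], kv[0]))
    return frozenset(bucket for bucket, _ in ranked[:top_n])


def distance(h1, h2):
    """L1 distance between two sparse histograms (an assumed metric)."""
    keys = set(h1) | set(h2)
    return sum(abs(h1.get(k, 0.0) - h2.get(k, 0.0)) for k in keys)


class ColorIndex:
    """Hypothetical inverted index from primary color bucket to image ids."""

    def __init__(self, levels=4, top_n=3):
        self.levels = levels
        self.top_n = top_n
        self.histograms = {}
        self.by_color = defaultdict(set)
        self.comparisons = 0  # distance computations in the last search

    def add(self, image_id, pixels):
        hist = histogram(pixels, self.levels)
        self.histograms[image_id] = hist
        for bucket in primary_colors(hist, self.top_n):
            self.by_color[bucket].add(image_id)

    def search(self, pixels, k=5):
        hist = histogram(pixels, self.levels)
        # Only images sharing a primary color with the query are candidates.
        candidates = set()
        for bucket in primary_colors(hist, self.top_n):
            candidates |= self.by_color.get(bucket, set())
        self.comparisons = len(candidates)
        scored = sorted((distance(hist, self.histograms[i]), i) for i in candidates)
        return [image_id for _, image_id in scored[:k]]


# Toy data: each "image" is just a list of RGB pixels.
index = ColorIndex()
index.add("red_shirt", [(250, 10, 10)] * 90 + [(255, 255, 255)] * 10)
index.add("blue_jeans", [(10, 10, 250)] * 100)
index.add("pink_dress", [(250, 10, 10)] * 40 + [(250, 200, 200)] * 60)

results = index.search([(240, 20, 20)] * 100, k=2)
print(results, "comparisons:", index.comparisons)
```

In this toy run the blue image is never compared against the red query, because the two share no primary color bucket. The same property is the weakness of the approach: an image whose dominant colors fall just across a bucket boundary from the query's is skipped even if it is visually close, so the bucket size and the number of primary colors kept would need tuning on real data.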

