Thursday 5th November, 2009
It’s been a little while since the last Open Source Search Social, so we’re getting really imaginative and holding another one, this time on Wednesday the 18th of November. As usual the event is in the Pelican pub just off London’s face-bleedingly trendy Portobello Road.
The format is staying roughly the same. No agenda, no attitude, just some geeks talking about search and related topics in the presence of intoxicating substances.
Please come along if you can; just get in touch or sign up on the Upcoming page.
search | Tagged: beer, CBIR, events, image search, Lucene, OSS, search, social
Posted by Richard Marr
Thursday 30th July, 2009
Today I’m introducing my first ever guest post, written by Pixsta’s own Rohit Patange about some great work he’s been doing with the guidance of Tuncer Aysal. You’ll be able to see the results of their work shortly on our consumer-facing site Empora. – RM
–
We at Pixsta are interested in understanding what is in an image (recognising and extracting it), and in doing so in an automated way that involves a minimum amount of human input.
Our raw data (images and associated textual information) come from a variety of retailers with considerable variation in terms of data formats and quality. Some retailer images are squeaky clean with white backgrounds and a clear product depiction while others have multiple views of the product, very noisy backgrounds, models, mannequins and other such distracting objects. Since we only care about the product, an essential processing step involves identification of all image parts and the isolation of individual products, if several are present in the retailer image.
The n-shoe case:
Let’s take the case of retailer images with multiple product views. This is most commonly encountered in shoe images. Let us call each of the product views a ‘sub-image’.
When we talk about similar shoes we mean one shoe being similar to another (note the singular). We have to disregard how the shoe is presented in the image, the position of the sub-images, the orientation and other noise. If we do not do so, image matching technology tends to pick out images with similar presentation rather than similar shoes. Typically a retailer image (a shoe they are trying to sell) will have a pair of sub-images of shoes at different viewing angles. Pictorially, with standard image matching we get the following results for a query image on the left:

Even though the image database contains images like:

These are not in the result set despite them being much closer matches, because of the presentation and varying number of sub-images. To overcome this drawback, we have to extract the sub-image which best represents the product for each of the images and then compare these sub-images. For the sub-image to be extracted, the image will need to go through the following processing steps:
- Determine which of the sub-images best represents the shoe.
- Extract that sub-image.
- Determine the shoe orientation in that sub-image.
- Standardise the image by rotation, flipping and scaling.
All the product images (shoes in this case) go through this process of standardisation, resulting in a uniform set of images. Pictorially the input and the output image of the standardisation process are:
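As a rough illustration of the standardisation step only, here is a minimal Python sketch. It is not the actual Pixsta implementation. It assumes the chosen sub-image is already available as a non-empty binary mask (lists of 0s and 1s). The function names and the "heavier half goes on the right" orientation rule are hypothetical, and rotation is left out entirely.

```python
def bounding_box(mask):
    """Return (top, bottom, left, right) of the foreground; bottom/right are exclusive."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1


def standardise(mask, size=4):
    """Crop to the foreground, flip to a common direction, scale to size x size."""
    top, bottom, left, right = bounding_box(mask)
    crop = [row[left:right] for row in mask[top:bottom]]

    # Hypothetical orientation rule: put the half with more foreground
    # pixels on the right, so every shoe ends up "pointing" the same way.
    width = len(crop[0])
    half = width // 2
    left_mass = sum(sum(row[:half]) for row in crop)
    right_mass = sum(sum(row[width - half:]) for row in crop)
    if left_mass > right_mass:
        crop = [row[::-1] for row in crop]

    # Nearest-neighbour scaling to a fixed square (ignores aspect ratio).
    height = len(crop)
    return [
        [crop[y * height // size][x * width // size] for x in range(size)]
        for y in range(size)
    ]


shoe = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
mirrored = [row[::-1] for row in shoe]
print(standardise(shoe) == standardise(mirrored))
```

The point of the example is that a shape and its mirror image come out identical after standardisation, so a later comparison sees the product rather than its presentation.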

Let’s look at the procedure in more detail assuming that the image has been segmented into background and foreground.
- The first step is to identify all the sub-images in the foreground. The foreground pixels of the image are labelled in such a way that different sub-images have different labels to mark them as distinct.
- After the first iteration of labelling there is a high possibility that a sub-image is marked with 2 or more labels. Therefore all connected labels have to be merged.

- The third step is to determine which of the sub-images is of interest; that is picking the right label.

- Once the right sub-image is extracted, its orientation is corrected to match a predefined standard. This removes differences in the size of the product image, its orientation (the direction the shoe is pointing) and the position of the shoe (sub-image) within the image.
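The labelling and merging steps described above resemble classic two-pass connected-component labelling, so here is a minimal Python sketch of that technique. It is an illustration under assumptions, not the method actually used: it takes a binary foreground mask as lists of 0s and 1s, uses 4-connectivity, and picks the largest component as the "right" label, which is a hypothetical selection rule.

```python
def label_components(mask):
    """Two-pass connected-component labelling (4-connectivity).

    Returns a grid of the same shape: 0 for background, otherwise a label.
    """
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]
    parent = [0]  # union-find table; index 0 is reserved for background

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    # First pass: hand out provisional labels and record which ones touch.
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue
            up = labels[y - 1][x] if y > 0 else 0
            left = labels[y][x - 1] if x > 0 else 0
            if up and left:
                a, b = find(up), find(left)
                parent[max(a, b)] = min(a, b)  # merge connected labels
                labels[y][x] = min(a, b)
            elif up or left:
                labels[y][x] = up or left
            else:
                parent.append(len(parent))  # brand new label
                labels[y][x] = len(parent) - 1

    # Second pass: replace each provisional label with its merged root.
    for y in range(height):
        for x in range(width):
            if labels[y][x]:
                labels[y][x] = find(labels[y][x])
    return labels


def largest_component(labels):
    """Hypothetical selection rule: the label covering the most pixels."""
    counts = {}
    for row in labels:
        for value in row:
            if value:
                counts[value] = counts.get(value, 0) + 1
    return max(counts, key=counts.get)


mask = [
    [1, 0, 1, 0, 0, 1],
    [1, 1, 1, 0, 0, 1],
]
labels = label_components(mask)
print(labels)
print(largest_component(labels))
```

In the example mask, the U-shaped blob on the left first receives two provisional labels (one per arm) that are merged once the bottom row joins them, while the column on the right stays a separate sub-image.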

All product images (shoes in this case) go through this process before the representative information from the image is extracted for comparison. Now the results for the query image will look like:

Generally there are two shoes in an image. But the method can be extended to ‘n’ shoes.
image search, technology | Tagged: CBIR, Empora, image search, MAST, Pixsta, similarity search
Posted by Richard Marr
Tuesday 21st April, 2009
Right in line with my too-obvious-to-be-worth-anything prediction, Google have just released a Labs image similarity feature for Google Images. Others have commented on this already, but obviously this is hugely interesting for me because of my current work on Empora’s exploratory visual search, so I’m going to throw my tuppence into the ring as well.
Below are my first impressions.
Product impact
Google Similar Images (GSI) offers just one piece of functionality: the ability to find images that are similar to your selected image. You may only select images from their chosen set; there’s no dynamic image search capacity yet. Similar images are displayed either as a conventional result set when you click on “similar images”, or as a list of thumbnails in the header when you click through to see the original source.
The aims of this work will be (broadly):
- Keeping up with the Joneses. The other major search engines are working on similar functionality and Google can’t be seen to fall behind.
- User engagement. The more time you spend exploring on Google, the more their brand is burned into your subconscious.
- Later expansion of search monetisation. Adsense and Adwords get a better CTR than untargeted advertising because they adapt to the context of your search. If context can also be established visually, there seems to be strong potential for revenue.
Getting results
The quality of results for a project like this is always going to be variable, as the compromises between precision, recall, performance, and cost are going to continue to be sketched out in crayon until more mature vocabularies and toolsets are available. That said, Google need to keep users impressed, and they’ve done pretty well.
A few good examples:
A few bad examples:
Under the hood
Once the “qtype=similar” parameter is set in the URL, the only parameter that affects the set of similar images is the “tbnid” which identifies the query image. The text query parameter does not seem to change the result set, only changing the accompanying UI. While this doesn’t allow us to draw any dramatic conclusions it would allow them to pre-compute the results for each image.
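To make the pre-computation point concrete, here is a minimal Python sketch. It assumes nothing about Google's real implementation: the example URLs, the `tbnid` values and the `similar_cache_key` function are all hypothetical. It only shows that if the result set depends on `tbnid` alone, two requests with different text queries map to the same lookup key.

```python
from urllib.parse import urlparse, parse_qs


def similar_cache_key(url):
    """Return the key a pre-computed similar-images lookup could use, or None."""
    params = parse_qs(urlparse(url).query)
    if params.get("qtype") != ["similar"]:
        return None  # not a similar-images request
    tbnid = params.get("tbnid")
    return tbnid[0] if tbnid else None


# Hypothetical URLs: same query image, different text query.
first = "http://example.com/images?q=eiffel+tower&qtype=similar&tbnid=abc123"
second = "http://example.com/images?q=paris&qtype=similar&tbnid=abc123"

print(similar_cache_key(first) == similar_cache_key(second))
```

Because the text query never enters the key, the similar set for each image could be computed once offline and simply looked up at request time.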
The first clear conclusion is metadata. Google have obviously been leveraging their formidable text index, and why not. The image similarity behaviour indicates that the textual metadata associated with images is being used to affect the results. One of the clearest indicators is that they’re capable of recognising the same individual’s face as long as that person’s name is mentioned. Unnamed models don’t benefit from the same functionality.
My second insight is that they’re almost certainly using a structural technique such as Wavelet Decomposition to detect shapes within images. The dead give-away here is that search results are strongly biased towards photographs taken from the same angle.
I suspect that they’re not yet using a visual fingerprinting technique (such as FAST) to recognise photographs of the same object. If they were doing this already I suspect that they’d have used this method to remove duplicate images. This may well come later.
Finally
All in all my impression is that they’ve implemented this stuff well, but that there’s a lot more yet to come. Namely:
- Handling of duplicates, i.e. separation between searching for similar images and searching for instances of the same image
- A revenue stream
image search, search, technology | Tagged: CBIR, Empora, Google, image search, search
Posted by Richard Marr
Monday 12th January, 2009
I must have missed the launch of this feature, but Incogna’s most recent blog post talks about how they’ve implemented visual advertising. The results vary, but overall they’ve implemented it well.
I’ve written about Incogna’s image search before, but there’s more to add; when using this tool, as a user you have no visibility into the depth or type of data available to you. Nor does the app currently give control over movement, other than using text search and query images.
Establishing context (or, lost in the supermarket)
Any fans of Steve Krug’s usability classic will recognise the metaphor here. If you’re in an aisle in a supermarket you can see both the length of the aisle and the content of the shelves (at least the ones near you). You also know your rough position in the store, and can see signs and the contents of shelves.
Using that input data you can navigate (with a few hiccups) anywhere in the store.
Incogna’s app currently allows you to compare visually, and to search using text, but the depth and type of results remain hidden. As such there’s no real way to navigate effectively within the data set.
I should be clear at this point that this isn’t a criticism of Incogna’s app. This is not a problem with an easy or obvious solution. What I’m suggesting is that there’s still scope for some killer navigation features in this area.
Making money
The monetisation feature on Incogna appears only when their system thinks it can produce a good match between your search and the sponsored products. This is a wise move, since irrelevant ads would ruin the user experience.
It seems like the results use mainly visual comparison data, possibly with some categorisation thrown in. It worked brilliantly with pictures of trucks, but curiously while I was browsing Canon cameras it presented sponsored ads for televisions (both are rectangular I suppose).
Having fun
The main issue standing in the way of Incogna’s revenue stream is that their app is not yet fun to use. As mentioned above there’s no sense of position or direction. You can’t learn anything about the images you find without clicking through to the source site, and you can’t properly refine your search… you have to start again, which means that there’s no big advantage over Google, or any other text-based image search.
More another time.
image search, search, technology | Tagged: CBIR, image search, Incogna, Pixsta, search, semantic search
Posted by Richard Marr
Sunday 11th January, 2009
Back in May, Alex Iskold over on ReadWriteWeb kicked off a discussion of how “semantic search” technologies are doing, and where they’re headed. I came across the article again recently and it prompted me to write this.
Semantic search has often been named as the successor to Google. This is a prediction which I think misses two key points.
You don’t have to be a semantic search company to do it
Extracting and presenting structured data from unstructured or partially structured sources is part of the top-down approach to the Semantic Web (aka. Web 3.0, apparently). The basic idea is that using language analysis, machine learning and databases of entities you can understand content, rather than just processing it statistically like 20th century search engines. This gives you the possibility of a richer and tighter search experience, e.g. an initial search for “bush” could then be easily narrowed to only include articles about the Australian bush rather than George W.
While semantically-driven faceted search is still the domain of Grapeshot, Clusty, etc. the underlying technologies are already in use by mainstream search engines. Even image search engines such as Pixsta use semantic technology to extract structured data from unstructured documents (in our case, the documents happen to be images).
Google will not be killed with minor features
When was the last time you had to click through to the second page of search results? In fact, when was the last time you had to scroll past the fold to the 10th result? Maybe some of you have recently, but I’d bet it doesn’t happen often.
What this says is that for the main search engine use case, text-driven statistical search is good enough. Without a killer feature for mainstream users semantic search engines will not be able to tempt them away from the very simple tool they’ve already learned how to use. I agree with Iskold’s point that these companies need to create a very good user interface… although I disagree that this will be enough to win search market share.
It’s not all doom and gloom though. Semantic technology is impressive. If you get a chance to try out a tool like Silobreaker you’ll find some very interesting user interface work and some impressive data analysis happening behind the scenes. In my opinion it’s niches like these (Silobreaker is a semantic tool for news search and political research) where users have enough motivation and specialisation to move away from the top 5 search results on Google/Yahoo/Live.
image search, opinion, search, technology | Tagged: Google, image search, Pixsta, search, semantic search, Grapeshot, Clusty, Silobreaker
Posted by Richard Marr
Thursday 27th November, 2008
We’re nearing the completion of the first Alpha of our new content-based image retrieval (CBIR) engine. It seems wrong to let it be born without a name, so I’ve settled provisionally on Dose which stands for Distributed Object Similarity Engine.
It’ll be a little while before we’ve got any user-facing products to show for all our hard work, but we’ve learnt a lot and should have something good at the end of it.
image search, search, technology | Tagged: CBIR, DOSE, image search, Pixsta, research, search
Posted by Richard Marr
Thursday 27th November, 2008
This morning I had another play with the Multicolr Search Lab from visual search outfit Idée Inc and decided to make some notes, which I’ve posted below.
Idée in context
Idée are one of the bigger players in the visual search space, although they currently occupy quite a different market to my new team Pixsta.
Idée’s biggest product so far is TinEye, which is used to find uses of a single specific source image across the Internet. The main commercial use appears to be detection of copyright infringement, a service they provide to photographers and copyright owners for a fee. To my (admittedly limited) knowledge they’re the only company offering this specific service.
MultiColr
The Multicolr Search Lab (MSL) is a proof-of-concept that demonstrates Idée’s ability to index images by colour. As an image-based ‘labs’ project, the UI naturally reflects a mixture of Google Labs functionality and Flickr’s Web 2.0 styling. It’s clean and simple. I like it a lot.
Naturally as a ‘labs’ project it has no direct revenue stream, but it’s a nice demo and there may well be use for this type of technology in some areas: Interior design for example; add a simple hook-up to a printing service like Photobox and you’ve got a revenue stream for photographers and your own service.
As Multicolr is currently running at an adequate speed over a stated 10 million images, I’d guess that the underlying technology is ready for at least enterprise-scale applications, if not internet scale.
MultiColr engine
From my previous play with Multicolr I had certain expectations as to how it worked internally.
In any search application you need precision, but you also need to be able to bring back close matches if no direct matches exist. Since MultiColr is using RGB hex values as query terms I’d suspected that they’d rounded those hex values to match their chosen quantisation, and were matching on those approximate colours.
This isn’t what they’re doing, as I discovered when I changed the colour specified in their nicely RESTful URL to a subtly different colour (one that should fall within the same quantisation bin). If they were using naive quantisation then the results would have remained the same. In fact they changed.
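For anyone wondering what I mean by naive quantisation, here is a minimal Python sketch. The bin count, the `quantise` function and the example colours are all my own assumptions for illustration; I have no idea what bin sizes Idée actually use.

```python
def quantise(hex_colour, levels=4):
    """Map an RGB hex string to a coarse bin: one index per channel."""
    value = hex_colour.lstrip("#")
    step = 256 // levels  # width of each bin on a 0-255 channel
    channels = [int(value[i:i + 2], 16) for i in (0, 2, 4)]
    return tuple(channel // step for channel in channels)


# Two subtly different oranges, and one clearly different colour.
print(quantise("ff8800"))
print(quantise("fe8702"))
print(quantise("ff4000"))
```

If the index were keyed only on these bins, the first two queries would be indistinguishable to the engine and would have to return identical results.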
So although they may still be using some form of quantisation to avoid gaps in queries, it seems like they’re also allowing the raw colour value (either in RGB or a different colour space) into the index, and scoring images appropriately based on each query, i.e. the quantisations seen in the UI may well be totally arbitrary.
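Here is a minimal Python sketch of what scoring against the raw colour value could look like. It is a guess at the general idea, not Idée's engine: the tiny image index, the function names and the choice of squared Euclidean distance in RGB are all assumptions on my part.

```python
def parse_hex(hex_colour):
    """Convert an RGB hex string such as 'ff8800' to an (r, g, b) tuple."""
    value = hex_colour.lstrip("#")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def colour_distance(a, b):
    """Squared Euclidean distance in RGB (an assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rank_images(query_hex, image_colours):
    """Order image names by how close their stored colour is to the query."""
    query = parse_hex(query_hex)
    return sorted(
        image_colours,
        key=lambda name: colour_distance(query, parse_hex(image_colours[name])),
    )


# Hypothetical index: one dominant colour per image.
index = {"sunset": "ff8a00", "orange": "fd8604"}

print(rank_images("ff8800", index))
print(rank_images("fd8603", index))
```

The two queries are close enough to share a coarse quantisation bin, yet they produce different orderings, which is the behaviour I saw when nudging the colour in the URL.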
All in all, a good piece of work. Well done Idée.
image search, search, technology | Tagged: CBIR, Idée, image search, search
Posted by Richard Marr