Shopachu – Incogna’s new visual product browser

Tuesday 5th January, 2010

In the back half of last year visual search outfit Incogna released their visual shopping browser Shopachu. I’ve followed some of Incogna’s previous releases so I thought I’d share some thoughts on this one too.

What does it do?

This site has a very similar model to our own consumer-facing MAST app, Empora. It makes money by sending consumers to retailer sites, who for obvious reasons are willing to pay for suitable traffic. The main forces that influence the design of a site like this are retention, and the clickthrough and conversion rates of your traffic:

Retention – you need to impress people, then ideally remind them to come back to you

Clickthrough – you need to send a good proportion of visitors to retailers in order to make money

Conversion – if the visitors you send aren’t interested in buying the clicked product then the retailers won’t want to pay for that traffic on a per-click basis (although they might be interested in the CPA model, which doesn’t pay until someone buys)

First Impressions

People’s first impressions are usually determined by a combination of design and how well a site conforms to their expectations. I’ve probably got distorted expectations considering my experience working with this type of application, but in that respect I was pleasantly surprised; Shopachu has some good features and makes them known. In terms of design I was less impressed: the icons and gel effects don’t seem to fit, and I think there are whitespace and emphasis issues (sorry guys, trying to be constructive).

Finding stuff

It’s fairly easy to find things on Shopachu. The filters are easy to use (although I couldn’t get the brand filter to work, which could be a glitch). Navigation is straightforward, although it doesn’t currently provide second-generation retail search features like facet counts (i.e. showing the number of products in a category before you click on it).
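
To illustrate what I mean by facet counts, here’s a toy sketch with hypothetical product data; a real site would get these numbers from its search index rather than an in-memory list:

    from collections import Counter

    # Hypothetical aggregated product feed, purely for illustration.
    products = [
        {"name": "Ghibli tote", "category": "Bags", "colour": "orange"},
        {"name": "Canvas belt", "category": "Accessories", "colour": "blue"},
        {"name": "Court shoe", "category": "Shoes", "colour": "pink"},
        {"name": "Leather satchel", "category": "Bags", "colour": "brown"},
    ]

    # Facet counts: how many products sit behind each filter, shown to the
    # user *before* they click on it.
    category_counts = Counter(p["category"] for p in products)
    print(category_counts)  # Counter({'Bags': 2, 'Accessories': 1, 'Shoes': 1})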

The most interesting technological problem I’ve noticed with their navigation is the colour definitions. There’s a big difference between a colour being present in an image and the eye interpreting that colour as being present in the image. I think there are some improvements to be made in the way colours are attributed to images (e.g. here I’ve applied a pink filter but am seeing products with no pink returned). Similarly there’ll be another marked improvement with better background removal (e.g. here I’ve applied a light blue filter and am seeing products with blue backgrounds).
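
To make the problem concrete, here’s a minimal sketch of the kind of naive dominant-colour tagging that produces these artefacts, plus a crude white-backdrop mask. The thresholds are illustrative guesses, not anything Incogna (or we) actually use:

    from collections import Counter
    from PIL import Image

    def dominant_colours(path, n=3, mask_background=True):
        """Very rough dominant-colour estimate for a product image.

        Quantises pixels to a coarse palette and counts them. Without any
        background masking, a studio backdrop tends to dominate the counts,
        so the product gets tagged with a colour the eye never attributes to it.
        """
        img = Image.open(path).convert("RGB").resize((100, 100))
        counts = Counter()
        for r, g, b in img.getdata():
            if mask_background and min(r, g, b) > 240:
                continue  # crude near-white mask; proper background removal is much harder
            counts[(r // 32, g // 32, b // 32)] += 1  # 8 buckets per channel
        return counts.most_common(n)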

Similarity search

Shopachu’s similarity search is quite different to Empora’s. They’ve opted for maximum simplicity in the interface rather than user control, resulting in a single set of similarity search results. In contrast, Empora allows users to determine whether they’re interested in colour similarity, shape similarity, or both. Simplicity often wins over functionality (iPod example #yawn) so it’ll be interesting to see how they do.

Another issue is the quality of the input data. This challenge is the same for Empora, or anyone else aggregating data from third parties, in that category information is inconsistent. One effect of this is that when looking at the similarity results for an often poorly-classified item like a belt you may also see jewellery or other items that have been classified as “accessories” or “miscellaneous” in the retailer’s data; another effect is that you often see duplicate items.
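
On the duplicates point, one generic fix is a perceptual hash, so that the same product shot supplied by two retailers (re-encoded or resized along the way) collapses to a single key. Here’s a minimal average-hash sketch; it’s a generic technique, not a description of how Shopachu or Empora actually dedupe:

    from PIL import Image

    def average_hash(path, size=8):
        """64-bit perceptual hash: shrink, greyscale, threshold on the mean.

        Copies of the same retailer image that differ only by resizing or
        re-encoding produce identical or near-identical hashes, so they can
        be grouped before display.
        """
        img = Image.open(path).convert("L").resize((size, size))
        pixels = list(img.getdata())
        mean = sum(pixels) / len(pixels)
        bits = 0
        for p in pixels:
            bits = (bits << 1) | (1 if p > mean else 0)
        return bits

    def hamming(a, b):
        return bin(a ^ b).count("1")

    # Two feed images are probably the same shot if hamming() between their
    # hashes is small, say under 5.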

Keeping the traffic quality high

An interesting design decision for me is that the default image action on Shopachu is a similarity search, i.e. when you click on the image it takes you to an internal page featuring more information and similar products. This is in contrast to the default action on Empora or Like.com, which is to send the visitor to the retailer’s product page.

The design trade-off here is between clickthrough and conversion rates. If you make it easy to get to the retailer your clickthrough rate goes up, but you run the risk of a smaller proportion converting from a visit to a purchase. Here Shopachu are reducing that risk (and also the potential up-side) by keeping visitors on their site until they explicitly signal the intent to buy (the user has to click “buy” before they’re allowed through to the retailer).
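
As a back-of-the-envelope illustration of that trade-off, here’s the shape of the calculation; every number below is made up purely for illustration:

    # Hypothetical rates, chosen only to show the trade-off.
    visitors = 10000
    cpc = 0.10   # assumed payment per click sent to the retailer
    cpa = 5.00   # assumed payment per completed sale

    scenarios = {
        "easy out (default click goes to retailer)": {"ctr": 0.40, "cvr": 0.02},
        "gated (user must click 'buy' first)": {"ctr": 0.15, "cvr": 0.06},
    }

    for name, r in scenarios.items():
        clicks = visitors * r["ctr"]
        sales = clicks * r["cvr"]
        # More clicks means more CPC revenue today, but retailers judge the
        # traffic on how well those clicks convert.
        print(name, "clicks:", clicks, "CPC £:", clicks * cpc, "CPA £:", sales * cpa)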

Getting people hooked

There are a few features on Shopachu aimed at retention, namely Price Alerts and the ability to save outfits (Polyvore style). These features seem pretty usable, although I think they’re still lacking that level of polish that inspires passionate users. I’d be interested to know what the uptake statistics look like.

In summary

I think this implementation shows that Incogna have thought about all the right problems, and they clearly have the capability to solve the technological issues. On the down-side, cleaning up retailers’ data is a tough, time-consuming business, and I think they need to find a little inspiration on the visual design side.


Guest post – Similarity search: The Two Shoe Problem

Thursday 30th July, 2009

Today I’m introducing my first ever guest post, written by Pixsta’s own Rohit Patange about some great work he’s been doing with the guidance of Tuncer Aysal. You’ll be able to see the results of their work shortly on our consumer-facing site Empora. – RM

We at Pixsta are interested in understanding what is in an image (recognising and extracting it), and in doing so in an automated way that involves a minimal amount of human input.

Our raw data (images and associated textual information) come from a variety of retailers with considerable variation in terms of data formats and quality. Some retailer images are squeaky clean with white backgrounds and a clear product depiction while others have multiple views of the product, very noisy backgrounds, models, mannequins and other such distracting objects. Since we only care about the product, an essential processing step involves identification of all image parts and the isolation of individual products, if several are present in the retailer image.

The n-shoe case:

Let’s take the case of retailer images with multiple product views. This is most commonly encountered in shoe images.  Let us call each of the product views a ‘sub-image’.

When we talk about similar shoes we mean one shoe being similar to another (note the singular). We have to disregard how the shoe is presented in the image, the position of the sub-images, the orientation and other noise. If we do not, image matching technology tends to pick out images with similar presentation rather than similar shoes. Typically a retailer image (a shoe they are trying to sell) will have a pair of sub-images showing the shoe from different viewing angles. Pictorially, with standard image matching we get the following results for the query image on the left:

Visual similarity query showing product presentation affecting results

Even though the image database contains images like:

Two shoes pointing to the right

These are not in the result set despite being much closer matches, because of the presentation and the varying number of sub-images. To overcome this drawback, we have to extract the sub-image which best represents the product in each image and then compare these sub-images. For the sub-image to be extracted, the image needs to go through the following processing steps:

  • Determine which of the sub-images best represents the shoe.
  • Extract that sub-image.
  • Determine the shoe orientation in that sub-image.
  • Standardise the image by rotation, flipping and scaling.

All the product images (shoes in this case) go through this process of standardisation, resulting in a uniform set of images. Pictorially the input and the output image of the standardisation process are:

Shoes segmented and standardised to point right

Let’s look at the procedure in more detail, assuming that the image has already been segmented into background and foreground (a rough code sketch of these steps follows the list).

  • The first step is to identify all the sub-images in the foreground. The foreground pixels are labelled in such a way that different sub-images receive different labels, marking them as distinct.
  • After the first iteration of labelling there is a high possibility that a sub-image is marked with 2 or more labels. Therefore all connected labels have to be merged.
    Segmented shoe images
  • The third step is to determine which of the sub-images is of interest; that is picking the right label.
    Choosing an image segment
  • Once the right sub-image is extracted, its orientation is corrected to match a predefined standard, removing differences in the size of the product image, its orientation (the direction the shoe is pointing) and the position of the shoe (sub-image) within the image.
    Single shoe pointing to the right
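
A rough sketch of those steps using off-the-shelf tools (numpy and scipy); the real pipeline is Pixsta’s own work, so treat this purely as an illustration of the idea rather than the implementation:

    import numpy as np
    from scipy import ndimage

    def standardise(foreground_mask, image):
        """Pick the best sub-image and normalise its orientation.

        `foreground_mask` is a boolean array marking product pixels, i.e.
        the output of the background/foreground segmentation step; `image`
        is the corresponding greyscale array.
        """
        # Steps 1 and 2: label connected foreground regions, so each
        # sub-image (each shoe) ends up with a single distinct label.
        labels, n = ndimage.label(foreground_mask)

        # Step 3: pick the label of interest; here simply the region that
        # covers the most pixels (one plausible definition of "best").
        sizes = ndimage.sum(foreground_mask, labels, range(1, n + 1))
        best = int(np.argmax(sizes)) + 1
        ys, xs = np.where(labels == best)

        # Step 4: estimate the orientation from the principal axis of the
        # region's pixels, crop, and rotate towards a standard pose.
        coords = np.stack([xs - xs.mean(), ys - ys.mean()])
        eigvals, eigvecs = np.linalg.eigh(np.cov(coords))
        major = eigvecs[:, np.argmax(eigvals)]
        angle = np.degrees(np.arctan2(major[1], major[0]))

        crop = image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
        # The rotation sign depends on the image coordinate convention;
        # flipping and scaling to the house standard would follow this step.
        return ndimage.rotate(crop, angle, reshape=True)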

All product images (shoes in this case) go through this process before the representative information from the image is extracted for comparison. Now the results for the query image will look like:

Resulting query showing standardised similarity

Generally there are two shoes in an image. But the method can be extended to ‘n’ shoes.


Google Image Similarity first impressions

Tuesday 21st April, 2009

Right in line with my too-obvious-to-be-worth-anything prediction, Google have just released a Labs image similarity feature for Google Images. Others have commented on this already, but obviously this is hugely interesting for me because of my current work on Empora’s exploratory visual search, so I’m going to throw my tuppence into the ring as well.

Below are my first impressions.

Product impact

Google Similar Images (GSI) offers just one piece of functionality, the ability to find images that are similar to your selected image. You may only select images from their chosen set, there’s no dynamic image search capacity yet. Similar images are displayed either as a conventional result set when you click on “similar images”, or as a list of thumbnails in the header when you click through to see the original source.

The aims of this work will be (broadly):

  1. Keeping up with the Joneses. The other major search engines are working on similar functionality and Google can’t be seen to fall behind.
  2. User engagement. The more time you spend exploring on Google, the more their brand is burned into your subconscious.
  3. Later expansion of search monetisation. Adsense and Adwords get a better CTR than untargeted advertising because they adapt to the context of your search. If context can also be established visually there seems like strong potential for revenue.

Getting results

The quality of results for a project like this is always going to be variable, as the compromises between precision, recall, performance, and cost are going to continue to be sketched out in crayon until more mature vocabularies and toolsets are available. That said, Google need to keep users impressed, and they’ve done pretty well.

A few good examples:

A few bad examples:

Under the hood

Once the “qtype=similar” parameter is set in the URL, the only parameter that affects the set of similar images is the “tbnid”, which identifies the query image. The text query parameter does not seem to change the result set, only the accompanying UI. While this doesn’t allow us to draw any dramatic conclusions, it would allow them to pre-compute the results for each image.
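
For the curious, the quick experiment behind that observation looks something like the sketch below. Only the qtype and tbnid parameters come from observation; the base URL and the HTML scraping are my assumptions about the current endpoint, so treat it as a sketch of the method:

    import re
    import requests

    # Assumed endpoint; everything other than qtype and tbnid is a guess.
    BASE = "http://images.google.com/images"

    def similar_ids(text_query, tbnid):
        """Fetch a similar-images page and pull out the result image ids."""
        params = {"q": text_query, "tbnid": tbnid, "qtype": "similar"}
        html = requests.get(BASE, params=params, timeout=10).text
        # Assumes each result link carries a tbnid parameter of its own.
        return set(re.findall(r"tbnid=([\w\-]+)", html))

    # If the text query really doesn't affect the result set, these two calls
    # (same hypothetical query image, different words) should return the same ids:
    # similar_ids("shoes", "SOME_TBNID") == similar_ids("unrelated words", "SOME_TBNID")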

The first clear conclusion is that they’re using metadata. Google have obviously been leveraging their formidable text index, and why not. The image similarity behaviour indicates that the textual metadata associated with images is being used to affect the results. One of the clearest indicators is that they’re capable of recognising the same individual’s face as long as that person’s name is mentioned. Unnamed models don’t benefit from the same functionality.

My second insight is that they’re almost certainly using a structural technique such as Wavelet Decomposition to detect shapes within images. The dead give-away here is that search results are strongly biased towards photographs taken from the same angle.
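
For readers who haven’t met the technique, here’s a toy wavelet signature in the spirit of classic multiresolution image querying. It’s a generic illustration of why same-angle photographs cluster together, not a claim about what Google actually computes:

    import numpy as np
    import pywt
    from PIL import Image

    def wavelet_signature(path, keep=60):
        """Coarse structural signature of an image.

        Takes a 2D Haar decomposition of a small greyscale thumbnail and
        keeps only the positions and signs of the largest coefficients.
        Photographs taken from the same angle share much of their coarse
        structure, so their signatures overlap heavily.
        """
        img = np.asarray(Image.open(path).convert("L").resize((128, 128)), dtype=float)
        coeffs = pywt.wavedec2(img, "haar", level=4)
        flat, _ = pywt.coeffs_to_array(coeffs)
        flat = flat.ravel()
        top = np.argsort(np.abs(flat))[-keep:]
        return {(int(i), int(np.sign(flat[i]))) for i in top}

    def similarity(sig_a, sig_b):
        # Fraction of shared (position, sign) entries between two signatures.
        return len(sig_a & sig_b) / float(len(sig_a))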

I suspect that they’re not yet using a visual fingerprinting technique (such as FAST) to recognise photographs of the same object. If they were doing this already I suspect that they’d have used this method to remove duplicate images. This may well come later.

Finally

All in all my impression is that they’ve implemented this stuff well, but that there’s a lot more yet to come. Namely:

  • Handling of duplicates, i.e. separation between searching for the similar images and instances of the same image
  • A revenue stream

Empora walk-through

Wednesday 8th April, 2009

The first flight is always a little wobbly, and true to form there was a slight hiccup for Empora over the weekend. Still, it’s been live for a week now and is holding up well.

Now that all the excitement of the launch has settled down and we’re back into a routine, I think it’s time for a quick walk through the functionality (which won’t take long, since we haven’t put that much live yet; there’s a lot of interesting functionality still to come).

Hunting vs. gathering

Plenty of people go into a shop armed with a plan. They know what they want, or at least what specific need they want to fill. Others like to browse, look at what there is and what other people are doing, and generally wait for inspiration or recommendation. We’ve tried to serve both of those patterns using the standard “search vs. browse” split, but have tried to improve both sides.

Browse

When you view an item, for example this orange Ghibli bag, we obviously show a picture, description, etc. and link to the retailer. All standard stuff for a shopping aggregator. What we’ve added is that we also show the most visually similar items in our collection, according to three different sets of criteria (a rough sketch of the colour side of this follows the list):

  1. We show the most similar bags by shape, so that anyone who’s interested in a particular style or type of bag can see them straight away.
  2. We show bags in the most similar colours, so anyone who was drawn to that bag because of its colour can see lots of other bags that they may also be interested in.
  3. We show products from other categories in the same colour, in case users want to colour-coordinate.
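
To give a flavour of the colour side of this, here’s a minimal sketch of ranking by colour histogram distance. The production ranking is considerably more involved, so read it as illustrative only:

    import numpy as np
    from PIL import Image

    def colour_histogram(path, bins=8):
        """Coarse RGB histogram, normalised so differently sized images compare fairly."""
        img = np.asarray(Image.open(path).convert("RGB").resize((100, 100)))
        hist, _ = np.histogramdd(img.reshape(-1, 3), bins=(bins,) * 3, range=((0, 256),) * 3)
        return hist.ravel() / hist.sum()

    def colour_distance(a, b):
        # L1 distance between normalised histograms; smaller means more similar colours.
        return float(np.abs(a - b).sum())

    # "Bags in the most similar colours" is then just the bag category sorted
    # by colour_distance to the query bag's histogram.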

Search

In addition to the regular search options you’d expect (category, keywords, etc.) we also allow people to search by the overall colour of the item (from the top right corner of any page). Now in terms of technology I’m not particularly happy with this functionality yet, but I’m a perfectionist. It already performs a lot better visually than the Amazon equivalent*, and I know that we’ve got big improvements in the pipeline.

* To be fair to Amazon their results are better than they look. The products they show are available in the query colour, they just choose to show only the first image, so their results look broken by visual inspection.

Back to the physical shop metaphor

What we’re trying to do is help the searchers search by enabling them to search using visual data, effectively the equivalent of training all the staff in a shop to answer questions like “have you got anything that goes with these shoes?”.

At the same time we’re trying to help the browsers by sorting each department by type and colour, so they always know where they’re going.

Obviously this is fairly fresh territory so there’ll always be wrinkles that need ironing out, but on the whole I think the trend towards smarter indexing is inevitable, and the indexing of visual information is part of that (that’s a whole other post).


Pixsta team launches Empora fashion site

Friday 3rd April, 2009

Last night we finally broke a bottle of champagne against the side of the good ship Empora and watched her slide out of the dock. We’ve been working on the project for the past couple of months, so it’s a pleasure to see it go live.

As well as the usual search functionality you’d expect on a retail site, Empora enables searching and browsing using the content of product images (currently either women’s clothes or men’s clothes). When you view a product you’re also shown items that may relate to it visually, either in terms of shape or colour.

As with any project there are always things I’d change, and things that aren’t done yet, but overall I’m pretty chuffed with what our team has accomplished so far. We’re by no means finished though. Expect big things in the near future.


Twitter search parallels other vertical search domains

Sunday 1st March, 2009

In case you haven’t tried it already, Twitter’s search tool is very well implemented. It’s effective, slick, and very fast.

Being able to quickly and efficiently search through the life streams and conversations of a good proportion of the thought leaders and early adopters in the UK and US seems to me like something with a bit of potential… a stream that’s ripe for news and knowledge management apps like Techmeme, Silobreaker, and Google News. It’s a fair bet that conversation and life-streaming will be a valuable search domain, just like user-uploaded video (apparently YouTube searches outnumber Yahoo’s).

Conventional (i.e. text and metadata-driven) image search is another search domain in which the big search companies seem willing to absorb losses. As I (and many others) have mentioned before, their willingness to do this stems from their desire to occupy user mindshare for the entire search concept, rather than for piecemeal domains or verticals. As we can see from attempts by Google and Microsoft to include content-based image retrieval (CBIR) functionality, that eagerness is not likely to be restricted to textual image search.

While my opinion may obviously be biased, I wouldn’t be that surprised to see “conversation” (Twitter, Friendfeed and life-streaming) and “product” (including price and visual similarity features) tabs integrated into the search boxes of the big three in the relatively near future.


Recognising specific products in images

Wednesday 21st January, 2009

Yesterday Andrew Stromberg pointed me to the excellent iPhone app by image-matching outfit Snaptell.

Snaptell’s application takes an input image (of an album, DVD, or book) supplied by the user and identifies that product, linking to 3rd party services. This is equivalent to the impressive TinEye Music but with a broader scope. As Andrew points out, the app performs very well at recognising these products.

Algorithmically the main problems faced by someone designing a system to do this are occlusions (e.g. someone covering a DVD cover with their thumb as they hold it) and transformations (e.g. a skewed camera angle, or a product that’s rotated in the frame).

There are a number of techniques to solve these problems (e.g. the SIFT and SURF algorithms), most of which involve using repeatable methods to find key points or patterns within images, and then encoding those features in a way that is invariant to rotation (i.e. they will still match when upside-down) and to an acceptable level of distortion. At query time the search algorithm can then find the images with the most relevant clusters of matching keypoints.
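
To make the keypoint idea concrete, here’s a minimal OpenCV sketch of the general technique (ORB keypoints plus a ratio-test match count). It’s a generic illustration, not Snaptell’s actual algorithm:

    import cv2

    def match_score(query_path, candidate_path, ratio=0.75):
        """Count 'good' keypoint matches between two images.

        Keypoints are detected at repeatable locations and described in a
        rotation-invariant way, so a skewed, partly covered photo of a cover
        can still accumulate enough matches against the catalogue image.
        """
        orb = cv2.ORB_create(nfeatures=1000)
        img1 = cv2.imread(query_path, cv2.IMREAD_GRAYSCALE)
        img2 = cv2.imread(candidate_path, cv2.IMREAD_GRAYSCALE)
        kp1, des1 = orb.detectAndCompute(img1, None)
        kp2, des2 = orb.detectAndCompute(img2, None)
        if des1 is None or des2 is None:
            return 0

        matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
        good = 0
        for pair in matcher.knnMatch(des1, des2, k=2):
            # Lowe's ratio test rejects ambiguous matches.
            if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
                good += 1
        return good

    # The catalogue image with the highest match_score (ideally confirmed by a
    # geometric consistency check such as RANSAC) is the recognised product.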

It seems like Snaptell have mastered a version of these techniques. When I tested the app’s behaviour (using my copy of Lucene in Action) I chose an awkward camera angle and obscured around a third of the cover with my hand and it still worked perfectly. Well done Snaptell.


Welcome to the image link revolution

Monday 19th January, 2009

The hyperlink revolution allowed text documents to be joined together. This created usable relationships between data that have enabled one of the biggest technological shifts of the recent age… large scale adoption of the internet. Try to imagine Wikipedia or Google without hyperlinks and you’ll see how critical this technique is to the web.

We’re on the verge of another revolution, this time in computer vision.

Imagine a world where the phone in your pocket could be used to find or create links in the physical world. You could get reviews for a restaurant you were standing outside without even knowing its name, or where you were. You could listen to snippets of an album before you bought it, or find out which nearby shop has the same item for less. You could read about the history of an otherwise unmarked and anonymous building, get visual directions, or use your camera phone as a window into a virtual game in the real world.

A team at the University of Ljubljana (the J is pronounced like a Y, for anyone unfamiliar) have released a compelling video demonstrating their implementation of visual linking. They use techniques that I assume are derived from SIFT to match known buildings in an unconstrained walk through a neighbourhood. These image segments are then converted into links that surface contextually relevant information.

When you combine this with other techniques, such as the contour-based work being done by Jamie Shotton of MSR, you start to see how that future will appear. Bring in the mass adoption of GPS handsets, driven by the iPhone amongst others, and it’s pretty clear there’s going to be a change in the way people create and access information.

The only questions are who, and when.


Incogna monetise pure image search

Monday 12th January, 2009

I must have missed the launch of this feature, but Incogna’s most recent blog post talks about how they’ve implemented visual advertising. The results vary, but overall they’ve implemented it well.

I’ve written about Incogna’s image search before, but there’s more to add: as a user of this tool you have no visibility into the depth or type of data available to you, nor does the app currently give you any control over movement other than via text search and query images.

Establishing context (or, lost in the supermarket)

Any fans of Steve Krug’s usability classic will recognise the metaphor here. If you’re in an aisle in a supermarket you can see both the length of the aisle and the content of the shelves (at least the ones near you). You also know your rough position in the store, and you can see the signage.

Using that input data you can navigate (with a few hiccups) anywhere in the store.

Incogna’s app currently allows you to compare visually, and to search using text, but the depth and type of results remains hidden. As such there’s no real way to effectively navigate within the data set.

I should be clear at this point that this isn’t a criticism of Incogna’s app. This is not a problem with an easy or obvious solution. What I’m suggesting is that there’s still scope for some killer navigation features in this area.

Making money

The monetisation feature on Incogna appears only when their system thinks it can produce a good match between your search and the sponsored products. This is a wise move, since irrelevant ads would ruin the user experience.

It seems like the results use mainly visual comparison data, possibly with some categorisation thrown in. It worked brilliantly with pictures of trucks, but curiously while I was browsing Canon cameras it presented sponsored ads for televisions (both are rectangular I suppose).

Having fun

The main issue standing in the way of Incogna’s revenue stream is that their app is not yet fun to use. As mentioned above there’s no sense of position or direction. You can’t learn anything about the images you find without clicking through to the source site, and you can’t properly refine your search…  you have to start again, which means that there’s no big advantage over Google, or any other text-based image search.

More another time.


Semantic search is not a “Google killer”

Sunday 11th January, 2009

Back in May, Alex Iskold over on ReadWriteWeb kicked off a discussion of how “semantic search” technologies are doing, and where they’re headed. I came across the article again recently and it prompted me to write this.

Semantic search has often been named as the successor to Google. This is a prediction which I think misses two key points.

You don’t have to be a semantic search company to do it

Extracting and presenting structured data from unstructured or partially structured sources is part of the top-down approach to the Semantic Web (aka. Web 3.0, apparently). The basic idea is that using language analysis, machine learning and databases of entities you can understand content, rather than just processing it statistically like 20th century search engines. This gives you the possibility of a richer and tighter search experience, e.g. an initial search for “bush” could then be easily narrowed to only include articles about the Australian bush rather than George W.

While semantically-driven faceted search is still the domain of Grapeshot, Clusty, etc., the underlying technologies are already in use by mainstream search engines. Even image search engines such as Pixsta use semantic technology to extract structured data from unstructured documents (in our case, the documents happen to be images).

Google will not be killed with minor features

When was the last time you had to click through to the second page of search results? In fact, when was the last time you had to scroll past the fold to the 10th result? Maybe some of you have recently, but I’d bet it doesn’t happen often.

What this says is that for the main search engine use case, text-driven statistical search is good enough. Without a killer feature for mainstream users semantic search engines will not be able to tempt them away from the very simple tool they’ve already learned how to use. I agree with Iskold’s point that these companies need to create a very good user interface… although I disagree that this will be enough to win search market share.

It’s not all doom and gloom though. Semantic technology is impressive. If you get a chance to try out a tool like Silobreaker you’ll find some very interesting user interface work and some impressive data analysis happening behind the scenes. In my opinion it’s niches like these (Silobreaker is a semantic tool for news search and political research) where users have enough motivation and specialisation to move away from the top 5 search results on Google/Yahoo/Live.

