As the browser war hots up, Google has Bing in its sights

Monday 11th January, 2010

Google Chrome advertising (via flickr/iainpurdie)

As any self-respecting nerd will have noticed, and others have already noted, Google recently started advertising its Chrome web browser on billboards and in newspapers around the UK. This represents an escalation of the second phase of the browser wars, and one of the few occasions Google has resorted to billboards to advertise a product.

Why bother advertising a free product?

The answer to why Google are advertising Chrome (which is a free download) is unsurprisingly similar to the answer to the bigger question: why bother building and supporting a free product?

Google make money by monetising users’ searches. People are great at finding and using short-cuts, and modern browsers have built-in search bars. In short, more people using your search bar means more money, and Chrome (like Firefox) defaults to searching on Google.

Billboards – dated but still relevant

Let’s face it, it’s not Google’s style to put up great big billboards. It’s not smart, it’s not targeted, it’s not high-tech. However, ironically those attributes are exactly why they work in this situation.

Google’s main competitor in the search space is Microsoft (who have incidentally been advertising their search engine Bing heavily), and Microsoft’s largest user-base is the slow-moving majority who get Internet Explorer bundled with their PC. Because Bing is the default search engine in Internet Explorer, it is also the search engine that same slow-moving majority ends up using.

Since the majority is too big to be worth the extra cost of targeting, the common or garden billboard is a suitable way to get through to them (at the same time as reinforcing the brand with nerds who already know about it).

Google Image Similarity first impressions

Tuesday 21st April, 2009

Right in line with my too-obvious-to-be-worth-anything prediction, Google have just released a Labs image similarity feature for Google Images. Others have commented on this already, but obviously this is hugely interesting for me because of my current work on Empora’s exploratory visual search, so I’m going to throw my tuppence into the ring as well.

Below are my first impressions.

Product impact

Google Similar Images (GSI) offers just one piece of functionality: the ability to find images that are similar to your selected image. You may only select images from their chosen set; there’s no dynamic image search capacity yet. Similar images are displayed either as a conventional result set when you click on “similar images”, or as a list of thumbnails in the header when you click through to see the original source.

The aims of this work will be (broadly):

  1. Keeping up with the Joneses. The other major search engines are working on similar functionality and Google can’t be seen to fall behind.
  2. User engagement. The more time you spend exploring on Google, the more their brand is burned into your subconscious.
  3. Later expansion of search monetisation. Adsense and Adwords get a better CTR than untargeted advertising because they adapt to the context of your search. If context can also be established visually, there seems to be strong potential for revenue.

Getting results

The quality of results for a project like this is always going to be variable, as the compromises between precision, recall, performance, and cost are going to continue to be sketched out in crayon until more mature vocabularies and toolsets are available. That said, Google need to keep users impressed, and they’ve done pretty well.

A few good examples:

A few bad examples:

Under the hood

Once the “qtype=similar” parameter is set in the URL, the only parameter that affects the set of similar images is the “tbnid” which identifies the query image. The text query parameter does not seem to change the result set, only changing the accompanying UI. While this doesn’t allow us to draw any dramatic conclusions it would allow them to pre-compute the results for each image.

The first clear conclusion is metadata. Google have obviously been leveraging their formidable text index, and why not? The image similarity behaviour indicates that the textual metadata associated with images is being used to affect the results. One of the clearest indicators is that they’re capable of recognising the same individual’s face as long as that person’s name is mentioned. Unnamed models don’t benefit from the same functionality.

My second insight is that they’re almost certainly using a structural technique such as Wavelet Decomposition to detect shapes within images. The dead give-away here is that search results are strongly biased towards photographs taken from the same angle.

I suspect that they’re not yet using a visual fingerprinting technique (such as FAST) to recognise photographs of the same object. If they were doing this already I suspect that they’d have used this method to remove duplicate images. This may well come later.

Finally

All in all my impression is that they’ve implemented this stuff well, but that there’s a lot more yet to come. Namely:

  • Handling of duplicates, i.e. separating a search for similar images from a search for instances of the same image
  • A revenue stream

Twitter search parallels other vertical search domains

Sunday 1st March, 2009

In case you haven’t tried it already, Twitter’s search tool is very well implemented. It’s effective, slick, and very fast.

Being able to quickly and efficiently search through the life streams and conversations of a good proportion of the thought leaders and early adopters in the UK and US seems to me like something with a bit of potential… a stream that’s ripe for news and knowledge management apps like Techmeme, Silobreaker, and Google News. It’s a fair bet that conversation and life-streaming will be a valuable search domain, just like user-uploaded video (apparently YouTube searches outnumber Yahoo’s).

Conventional (i.e. text and metadata-driven) image search is another search domain in which the big search companies seem willing to absorb losses. As I (and many others) have mentioned before, their willingness to do this stems from their desire to occupy user mindshare for the entire search concept, rather than piecemeal domains or verticals. As we can see from attempts by Google and Microsoft to include content-based image retrieval (CBIR) functionality, that eagerness is not likely to be restricted to textual image search.

While my opinion may obviously be biased, I wouldn’t be that surprised to see “conversation” (Twitter, Friendfeed and life-streaming) and “product” (including price and visual similarity features) tabs integrated into the search boxes of the big three in the relatively near future.


Semantic search is not a “Google killer”

Sunday 11th January, 2009

Back in May, Alex Iskold over on ReadWriteWeb kicked off a discussion of how “semantic search” technologies are doing, and where they’re headed. I came across the article again recently and it prompted me to write this.

Semantic search has often been named as the successor to Google. This is a prediction which I think misses two key points.

You don’t have to be a semantic search company to do it

Extracting and presenting structured data from unstructured or partially structured sources is part of the top-down approach to the Semantic Web (aka Web 3.0, apparently). The basic idea is that by using language analysis, machine learning and databases of entities you can understand content, rather than just processing it statistically like 20th-century search engines. This gives you the possibility of a richer and tighter search experience, e.g. an initial search for “bush” could then be easily narrowed to only include articles about the Australian bush rather than George W.

While semantically-driven faceted search is still the domain of Grapeshot, Clusty, etc., the underlying technologies are already in use by mainstream search engines. Even image search engines such as Pixsta use semantic technology to extract structured data from unstructured documents (in our case, the documents happen to be images).

Google will not be killed with minor features

When was the last time you had to click through to the second page of search results? In fact, when was the last time you had to scroll past the fold to the 10th result? Maybe some of you have recently, but I’d bet it doesn’t happen often.

What this says is that for the main search engine use case, text-driven statistical search is good enough. Without a killer feature for mainstream users, semantic search engines will not be able to tempt them away from the very simple tool they’ve already learned how to use. I agree with Iskold’s point that these companies need to create a very good user interface… although I disagree that this will be enough to win search market share.

It’s not all doom and gloom though. Semantic technology is impressive. If you get a chance to try out a tool like Silobreaker you’ll find some very interesting user interface work and some impressive data analysis happening behind the scenes. In my opinion it’s niches like these (Silobreaker is a semantic tool for news search and political research) where users have enough motivation and specialisation to move away from the top 5 search results on Google/Yahoo/Live.


The State of Image Search

Wednesday 6th August, 2008

There’s currently a lack of direction in the image search products offered by the leaders in the field. Each offering is quite different, and none have fully realised revenue streams. This is a quick summary of the current state of play.

Text search by any other name

Some image search engines learn about images solely by leveraging image meta-data and nearby text in parent documents. It’s a little like identifying a photograph by the name on the album cover and the writing on the back of the photo. This was an ideal solution for text search engines like Google and Yahoo, who could leverage their existing data and infrastructure.

Getting smarter

Microsoft’s Live Search team have recently started to broaden what mainstream image search can do by adding the capability to analyse the images themselves. For example, they have added the ability for their system to recognise faces.

Playing the name game

The big players in search get revenue from serving up relevant advertising, but so far none of them have successfully monetised image search. Currently image search serves as a loss-leader that exists to support their search brands, a visible sign that they’ve still got chips in the big game.

That doesn’t mean they’re sitting on their hands. Both Microsoft and Google employ researchers in the area of image comparison and classification, so expect big developments from them in 2009.

Pure image search start-ups

There are a few start-ups with an eye on the prize of being the first to monetise image search. Being smaller and more maneuverable than the big players they’ve got off the ground faster, but have yet to build up significant numbers. Start-ups to keep an eye on include Picitup (find similar images, celebrity face comparison), Riya/Like (text-driven image search and product search), and the Toronto-based Idée Inc (copyright monitoring, colour-based search).

These guys are hungry for revenue, so I expect to have fresh news in Q4 this year.

The home team

I work for Pixsta, another image search start-up. We’ve pulled together the basis of a decent team, and should start taking over the world shortly. As for what we’re working on, I’ll write more when I know what’s safe to write about outside NDA :o)


Google Reader as a Social Tool

Tuesday 13th May, 2008

Google Reader has recently added the option to comment on your shared articles. I like Google Reader, and I like the option to share articles with my contacts. I actually found myself wishing for this feature a while back.

Having thought about it, however, I have two problems with the concept of sharing articles via Reader: one is a problem of functionality and the other a problem of scope.

Functionality

Google Reader is not currently a conversational medium, yet sharing articles is a likely starting point for a conversation. Without the ability to converse, users have to shift to another product in order to continue the discussion that was started by the sharing of an article. If that context shift is seen as being more hassle than the comment is worth, then the user making it probably won’t bother, and you’ve lost that social interaction and hence possible value.

Scope

When I share an article via Google Reader, only my contacts can see it. That could be a sensible design choice (after all, some people aren’t interested in broadcasting themselves to the world) if all of my contacts used Google Reader, but they don’t. When I share an article, only the 3 friends that use the product can see it. Nor is it really that appealing to ask people to subscribe to my Google Reader Shared Items Feed (in addition to my blog, photostream, etc.), since I already have enough feeds and in any case I want to be able to comment on and discuss the articles I find.

Suggestion

So what’s needed here is really a medium that can simply share articles, and at the same time allows commentary and conversation. A blog is a much more suitable medium for that conversation, but I already have a blog, I don’t want two. It’d be nice if there was some convergence here without having to use an aggregator like Friendfeed (yet another login).

What I think Google should be working towards in this area is something that combines the ease of sharing an article with the flexibility of blogging. I as a Google user should be able to aggregate all of my content at username.google.com, and allow friends to subscribe to the whole thing or to subcategories. What’s more, I should be able to use open standards to pull data from other services into my aggregated feed, so if I like to use Delicious for bookmarking I could configure that as username.google.com/bookmarks.


Adding Events to Users’ Calendars – Part 2 – Web Calendars

Monday 7th January, 2008

Part 1 of this project was to get downloads in the iCal format working so that people with desktop calendars can add events to them. Next up we want to be able to import into web calendars.

Since we’re talking about web calendars, the sequence of events is a little different from the desktop case. Rather than having a link return a server-generated file that’s picked up by the desktop calendar, we simply point a link directly at the web calendar in question, and when the user clicks on it the data held in querystring parameters on that link is transmitted to the calendar. Sadly, different calendars implement this differently, so you need to write custom formatting code for each calendar that you want to support. For that reason I’m going to stick to the biggest three: Yahoo, MSN, and Google (a possible motivation for big fish not to implement a standard URL format).

Please note that some of the following information is the product of my tinkering and may become unreliable as the companies that run these calendars make changes (if it isn’t unreliable already).

Google Calendar

Google’s instructions for publishing events are already about as simple as they can get, but I’ll summarise the basics here for the sake of completeness.

The components of my URL are:

http://www.google.com/calendar/event? – The location of this API
action=TEMPLATE – Don’t edit this unless you know more than me (not hard). I haven’t found any other interesting values.
text=My Event – Put the title of your event in here
dates=20080101/20080101 – The start and end dates for your event (YYYYMMDD). If you want to specify an exact time instead of just the date then use YYYYMMDDTHHmmSS, e.g. 20080101T143000 would be half past two on the 1st Jan 2008. If you specify times don’t forget to convert them to PST (GMT -8 hours).
sprop=website:www.justgiving.com &sprop=name:Justgiving – The documentation says this is to identify the website or event source. It says it’s required, but excluding it seems to make no difference.
details=Don’t forget to donate at http://www.justgiving.com/richtest001 – This is the description of your event. Don’t forget to URL Encode. Using Carriage Return and Line Feed characters seems to work fine. These encode to %0D and %0A respectively so %0D%0A will create a new line when decoded at Google’s end.

Insert test event into your Google calendar.

Google Calendar doesn’t return you to the original site when it finishes adding the event, so you may want to think about opening this link in a new window, either with window.open(), or preferably an HTML link using the target attribute.

There doesn’t appear to be any provision for adding reminders or recurring events using this URL format.
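The Google Calendar parameters listed above can be assembled in a few lines of code. What follows is only a minimal sketch in Python: the function name and its arguments are my own invention for illustration, it does no time zone conversion (so any conversion is left to the caller), and it assumes the URL format still behaves as described in this post.

```python
from datetime import datetime
from urllib.parse import quote, urlencode

GOOGLE_EVENT_URL = "http://www.google.com/calendar/event"  # endpoint from the list above


def google_calendar_link(title, start, end, details, website=None, source_name=None):
    """Return an add-event link for Google Calendar (hypothetical helper)."""
    stamp = "%Y%m%dT%H%M%S"  # YYYYMMDDTHHmmSS; no time zone conversion is done here
    params = [
        ("action", "TEMPLATE"),
        ("text", title),
        ("dates", start.strftime(stamp) + "/" + end.strftime(stamp)),
        ("details", details),
    ]
    # sprop can appear twice, so params is a list of pairs rather than a dict.
    if website:
        params.append(("sprop", "website:" + website))
    if source_name:
        params.append(("sprop", "name:" + source_name))
    # quote_via=quote encodes spaces as %20 and CR/LF as %0D%0A.
    return GOOGLE_EVENT_URL + "?" + urlencode(params, quote_via=quote)


link = google_calendar_link(
    "My Event",
    datetime(2008, 1, 1, 14, 30),
    datetime(2008, 1, 1, 15, 30),
    "Don't forget to donate\r\nat http://www.justgiving.com/richtest001",
)
print(link)
```

Letting urlencode do the escaping means the title and description can contain spaces, line breaks and URLs without any hand-encoding.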

Yahoo Calendar

While looking for reference material for Yahoo’s calendar I came across a similar article to my own that covers the same ground. Is it a coincidence that the only similar article I’ve found is from someone else in the UK?

Anyway, the components of the URL for inserting an event into a user’s Yahoo calendar are these:

http://calendar.yahoo.com/? – The location of this API
v=60 – Seems to be required. Other values I’ve tried just failed rather than doing anything interesting
DUR=0100 – This is the duration of the event, in HHmm format. If you’re adding an all-day event then don’t bother with this.
TITLE=My Event – Put the title of your event in here
ST=20070201 – Start date/time, use either YYYYMMDD or YYYYMMDDTHHmmSS depending on whether it’s an all-day event
in_loc=My House – Location of the event
DESC – Description of the event
URL – Page to link back to from the calendar

Insert test event into your Yahoo calendar.
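The main wrinkle with the Yahoo parameters above is the split between all-day and timed events. As a rough sketch in Python (the function name and arguments are made up for illustration, and nothing here has been checked against Yahoo beyond the behaviour described in this post), the two cases could be handled like this:

```python
from datetime import datetime
from urllib.parse import quote, urlencode

YAHOO_CALENDAR_URL = "http://calendar.yahoo.com/"  # endpoint from the list above


def yahoo_calendar_link(title, start, duration_minutes=None,
                        location="", description="", url=""):
    """Return an add-event link for Yahoo Calendar (hypothetical helper)."""
    params = {"v": "60", "TITLE": title}
    if duration_minutes is None:
        # All-day event: date only, and DUR is left out entirely.
        params["ST"] = start.strftime("%Y%m%d")
    else:
        params["ST"] = start.strftime("%Y%m%dT%H%M%S")
        hours, minutes = divmod(duration_minutes, 60)
        params["DUR"] = "%02d%02d" % (hours, minutes)  # HHmm
    # Only send the optional fields that were actually supplied.
    optional = {"in_loc": location, "DESC": description, "URL": url}
    params.update({key: value for key, value in optional.items() if value})
    return YAHOO_CALENDAR_URL + "?" + urlencode(params, quote_via=quote)


print(yahoo_calendar_link("My Event", datetime(2007, 2, 1, 19, 0), 60, location="My House"))
print(yahoo_calendar_link("My Event", datetime(2007, 2, 1)))
```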

UPDATE: Ryan McNallie has more information on his blog.

MSN Calendar

This was the most baffling API so far due to a complete absence of documentation, and one that’s not really to my taste since the software choice (ISAPI.dll) is visible in the URL. Not very elegant. On the other hand MSN calendar is the only one I’ve found that implements ID-based de-duplication. Anyway, here’s a working URL structure.

http://calendar.msn.com/calendar/isapi.dll? – The location of this API
pid=5020 – This value seems to be required. Changing it caused an error for me.
pn=Site Name – This is the name of your site.
id=Your Event ID – Required. This is a unique ID to avoid duplication of the same event in the calendar.
n=0 – This doesn’t seem to be required and I haven’t figured out what it is yet. Remove it at your own risk.
rurl=[encoded URL] – Required. This is the URL to return the user to once you’ve finished adding this event to their calendar. Frustratingly it’s required, so you’ll need to handle MSN users differently to everyone else.
s=Event Subject – The name of your event
d=20080113T040000Z/PT2H00M – The date and time details of your event, in the format [start date]T[start time]/[duration]. I’d guess the timezone would be controlled by the Z or PT elements of the format but it seemed to ignore any changes I made so I’m going to leave that for now.
l=[encoded URL] – The URL of the event on your website.
c=0 – No idea. Not required
r=E15 – Not sure. Not required
m=[Event Description] – This is the URL Encoded body text for your event

Add test event to MSN calendar
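The least obvious part of the MSN format is the d parameter, which joins a start timestamp to a duration. Here is a small Python sketch of building that value; the function name is hypothetical, and since I couldn’t get the time zone parts to change anything, it simply reproduces the shape of the example above.

```python
from datetime import datetime


def msn_date_value(start, duration_minutes):
    """Build the d= value: [start date]T[start time]Z/PT[hours]H[minutes]M."""
    hours, minutes = divmod(duration_minutes, 60)
    return "%sZ/PT%dH%02dM" % (start.strftime("%Y%m%dT%H%M%S"), hours, minutes)


# A two hour event starting at 04:00 on 13th January 2008.
print(msn_date_value(datetime(2008, 1, 13, 4, 0), 120))  # 20080113T040000Z/PT2H00M
```

The result still needs to be URL encoded along with the other parameters before it goes into the link.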

[UPDATE – 5th January 2010 – Windows Live Calendar]

MSN Calendar is long dead but redirects to Windows Live Calendar, so I thought I’d update this post with information on that.

You can find more thorough documentation in this Scribd document by Siva Vasanth, but the URL you want to construct is detailed below.

http://calendar.live.com/calendar/calendar.aspx? – The location of this API call
rru=addevent – Don’t edit this unless you know more than me (not hard).
summary=My Event – Put the title of your event in here. Don’t forget to URL Encode.
location=London – Location description. Don’t forget to URL Encode.
dtstart – The start date and time of your event, in the format YYYYMMDDTHHmmSS, e.g. 20080101T143000 would be half past two on the 1st Jan 2008. You may need to convert it to the right time zone for your calendar, I haven’t tested this. You may also be able to specify all-day events using the YYYYMMDD format.
dtend – The end date and time of the event, format as above.
description=Don’t forget to buy a card – This is the description of your event. Don’t forget to URL Encode.

I haven’t found any mention of specifying reminders, repetition, or anything other than basic event details.
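For completeness, here is the same idea sketched in Python for the Windows Live parameters above. As before, the function name and arguments are hypothetical, no time zone conversion is attempted, and all-day events are left out because I haven’t tested them.

```python
from datetime import datetime
from urllib.parse import quote, urlencode

LIVE_CALENDAR_URL = "http://calendar.live.com/calendar/calendar.aspx"  # from the list above


def live_calendar_link(summary, start, end, location="", description=""):
    """Return an add-event link for Windows Live Calendar (hypothetical helper)."""
    stamp = "%Y%m%dT%H%M%S"  # YYYYMMDDTHHmmSS
    params = {
        "rru": "addevent",
        "summary": summary,
        "dtstart": start.strftime(stamp),
        "dtend": end.strftime(stamp),
    }
    if location:
        params["location"] = location
    if description:
        params["description"] = description
    return LIVE_CALENDAR_URL + "?" + urlencode(params, quote_via=quote)


print(live_calendar_link(
    "My Event",
    datetime(2008, 1, 1, 14, 30),
    datetime(2008, 1, 1, 15, 30),
    location="London",
    description="Don't forget to buy a card",
))
```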

Scrybe

Scrybe is still in closed beta, a year after launch. I can’t really justify spending time on it right now. If I can find the time once they’re live I might update this article.