Twitter search parallels other vertical search domains

Sunday 1st March, 2009

In case you haven’t tried it already, Twitter’s search tool is very well implemented. It’s effective, slick, and very fast.

Being able to quickly and efficiently search through the life streams and conversations of a good proportion of the thought leaders and early adopters in the UK and US seems to me like something with a bit of potential… a stream that’s ripe for news and knowledge management apps like Techmeme, Silobreaker, and Google News. It’s a fair bet that conversation and life-streaming will be a valuable search domain, just like user-uploaded video (apparently YouTube searches outnumber Yahoo’s).

Conventional (i.e. text- and metadata-driven) image search is another search domain in which the big search companies seem willing to absorb losses. As I (and many others) have mentioned before, their willingness to do this stems from their desire to occupy user mindshare for the entire search concept, rather than piecemeal domains or verticals. As we can see from attempts by Google and Microsoft to include content-based image retrieval (CBIR) functionality, that eagerness is not likely to be restricted to textual image search.

While my opinion may obviously be biased, I wouldn’t be that surprised to see “conversation” (Twitter, FriendFeed and life-streaming) and “product” (including price and visual similarity features) tabs integrated into the search boxes of the big three in the relatively near future.

Semantic search is not a “Google killer”

Sunday 11th January, 2009

Back in May, Alex Iskold over on ReadWriteWeb kicked off a discussion of how “semantic search” technologies are doing, and where they’re headed. I came across the article again recently and it prompted me to write this.

Semantic search has often been named as the successor to Google. This is a prediction which I think misses two key points.

You don’t have to be a semantic search company to do it

Extracting and presenting structured data from unstructured or partially structured sources is part of the top-down approach to the Semantic Web (a.k.a. Web 3.0, apparently). The basic idea is that by using language analysis, machine learning and databases of entities you can understand content, rather than just processing it statistically like 20th-century search engines. This gives you the possibility of a richer and tighter search experience, e.g. an initial search for “bush” could then be easily narrowed to only include articles about the Australian bush rather than George W.
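To make the “bush” example a little more concrete, here is a rough sketch of entity-based narrowing. Everything in it is made up for illustration: the tiny `ENTITIES` lookup stands in for a real entity database, the four sample documents are invented, and simple phrase matching stands in for proper language analysis. It is not how any particular engine does it.

```python
from collections import Counter

# Hypothetical entity database: surface phrase -> entity label (illustrative only).
ENTITIES = {
    "george w. bush": "George W. Bush (person)",
    "president bush": "George W. Bush (person)",
    "australian bush": "Australian bush (place)",
    "outback": "Australian bush (place)",
}

# Invented sample documents.
DOCS = [
    "President Bush addressed Congress on Tuesday.",
    "Fires spread through the Australian bush near Melbourne.",
    "George W. Bush left office in January.",
    "A bush walk through the outback takes planning.",
]


def annotate(text):
    """Return the set of entity labels whose phrases occur in the text."""
    lowered = text.lower()
    return {label for phrase, label in ENTITIES.items() if phrase in lowered}


def search(query, docs):
    """Plain keyword match, standing in for a conventional text search."""
    return [doc for doc in docs if query.lower() in doc.lower()]


def facets(results):
    """Count how many results mention each entity, to offer as filters."""
    counts = Counter()
    for doc in results:
        counts.update(annotate(doc))
    return counts


def narrow(results, label):
    """Keep only the results annotated with the chosen entity."""
    return [doc for doc in results if label in annotate(doc)]


results = search("bush", DOCS)          # all four documents match the keyword
available = facets(results)             # entity labels to show as facets
place_only = narrow(results, "Australian bush (place)")
```

The keyword search alone can’t tell the two senses apart; it is the entity annotations that let the second step throw away the George W. results.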

While semantically-driven faceted search is still the domain of Grapeshot, Clusty, etc., the underlying technologies are already in use by mainstream search engines. Even image search engines such as Pixsta use semantic technology to extract structured data from unstructured documents (in our case, the documents happen to be images).

Google will not be killed with minor features

When was the last time you had to click through to the second page of search results? In fact, when was the last time you had to scroll past the fold to the 10th result? Maybe some of you have recently, but I’d bet it doesn’t happen often.

What this says is that for the main search engine use case, text-driven statistical search is good enough. Without a killer feature for mainstream users, semantic search engines will not be able to tempt them away from the very simple tool they’ve already learned how to use. I agree with Iskold’s point that these companies need to create a very good user interface… although I disagree that this will be enough to win search market share.

It’s not all doom and gloom though. Semantic technology is impressive. If you get a chance to try out a tool like Silobreaker you’ll find some very interesting user interface work and some impressive data analysis happening behind the scenes. In my opinion it’s niches like these (Silobreaker is a semantic tool for news search and political research) where users have enough motivation and specialisation to move away from the top 5 search results on Google/Yahoo/Live.