Visualising activity on a project

Friday 5th March, 2010

We’ve got a few new faces here and are saying goodbye to some old ones, so now seems like as good a time as any to look back at what we’ve been doing. Below is a clip from YouTube showing activity in our code repository since we started a fresh one in late 2008.

If you’re interested, it was created using Code Swarm and MEncoder.


Automated tests for GSP views in Grails

Friday 26th February, 2010

Test-driven development (TDD) is handy if used sensibly, and we’re feeling the need to make our automated tests a little broader. The Grails site has great documentation on setting up tests for Controllers and Services, but I couldn’t find a decent explanation of how to set up tidy automated tests for GSPs… so without further ado this is what I did.

I want tests to be:

  • Easy to write (and hence read)
  • Low maintenance, i.e. I don’t want to have to update tests whenever I make changes that aren’t important to the test
  • Good at picking up unexpected behaviour rather than changes to HTML structure

Running your GSP from test-app

Grails ships with a handy class called GroovyPagesTestCase, which lets your test render a given GSP file against a defined model, like this:

// Read the GSP source and render it against a known model
def file = new File("grails-app/views/myview.gsp")
def model = [someVariable: 12345]
def htmlString = applyTemplate(file.text, model)

We’re passing the text of the GSP file in as a template, along with a model made up of whatever variables and mock objects the view needs.
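
Because the model is just a map, lightweight stand-ins usually do the job of mock objects. A minimal sketch, assuming a view that expects a user and a list of items (both names are illustrative, not from a real view):

// Hypothetical model: an Expando and plain maps acting as lightweight mocks
def model = [
	user : new Expando(name: "Alice", admin: true),
	items: [[id: 1, title: "First"], [id: 2, title: "Second"]]
]
def htmlString = applyTemplate(file.text, model)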

Now I’ve got a string containing a bunch of HTML. That’s a step in the right direction, but my lazy gene is not satisfied yet.

Note 1: If your template calls other templates then it makes life easier to use absolute template paths, e.g. <g:render template="/myfolder/subtemplate"/> rather than a relative name.

Note 2: This method assumes you’ll specify an explicit model for each sub-view, e.g. <g:render template="/myfolder/subtemplate" model="${[myObj:myObj]}"/>
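
Putting both notes together, the template markup being exercised might look something like this sketch (the template path and model entries are made up for illustration):

<%-- Hypothetical parent GSP: absolute template path, explicit model --%>
<g:each in="${items}" var="item">
	<g:render template="/myfolder/subtemplate" model="${[myObj: item]}"/>
</g:each>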

From a sticky HTML mess to something useful

The easiest way to get from unparsed HTML to a useful searchable structure seems to be Groovy’s XmlSlurper. It parses XML rather than HTML by default, but you can instantiate it with a more HTML-friendly parser like TagSoup, like so:

// XmlSlurper accepts any SAX parser; TagSoup copes with real-world HTML
def slurper = new XmlSlurper( new org.ccil.cowan.tagsoup.Parser() )
def parsedHtml = slurper.parseText(text)

Easy.
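
The result is a GPathResult that you can query with GPath expressions. A couple of illustrative checks (the title text here is made up):

// Hypothetical queries against the parsed document
assert parsedHtml.head.title.text() == "My page title"
// Depth-first search: collect every anchor href, wherever it appears
def links = parsedHtml.'**'.findAll { it.name() == "a" }.collect { it.@href.text() }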

Pulling it all together

import grails.test.GroovyPagesTestCase
import org.ccil.cowan.tagsoup.Parser

class GspXyzTests extends GroovyPagesTestCase
{
	boolean transactional = false
	def slurper = new XmlSlurper( new Parser() )

	// This test looks for a specific thing in the resulting parsed HTML
	void testSomeImportantOutput() {

		// Open the file containing the GSP under test; new File() never
		// returns null, so check the file actually exists instead
		def file = new File("grails-app/views/myfolder/template.gsp")
		assertTrue file.exists()

		//Obtain result of GSP page
		def text = applyTemplate(file.text, [
			pagenumber:123,
			pagesize:10,
			someMockObject:[
				foo:"bar",
				nestedMockObject:[
					[id:12345],
					[id:67890]
				]
			]
		])

		def html = slurper.parseText( text )

		// Test some aspect of the parsed structure; the trick is to make the
		// test resilient to a degree of cosmetic change
		assertEquals 1, html.head.meta.list().findAll{ it.@name?.text().toLowerCase() == "robots" }.size()

	}

}
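
To keep tests like this resilient to cosmetic change, one trick is to search the whole parsed tree for the thing that matters rather than hard-coding its position. A hypothetical assertion in that spirit (the link and its href value are made up, not from a real view):

// Inside a test like the one above: find the link wherever the markup puts it
assertTrue html.'**'.any { it.name() == "a" && it.@href.text() == "/next" }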

Visualising the age of consent

Wednesday 10th February, 2010

In the course of my usual data-immersion session in my RSS reader of choice I came across a short but thought-provoking post by Stephen Law linking to some data on the age of consent.

Being a big fan of data visualisation I decided to have a go at representing the data in a way that can be more easily absorbed. So, armed with the source data, a list of ISO country codes, and the docs for the Google Chart API, I started playing.

The biggest question when visualising data, just like with statistics, is deciding what you’re looking for. This data is complex enough to be difficult to show in its entirety, involving maybe a dozen or so possible pieces of information for each location.

Here I’ve opted to look at the difference between the age of consent for straight couples and gay couples.

Blue indicates larger differences between straight and gay ages of consent (or illegality)
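
For anyone curious about the mechanics, the Google Chart API can draw world maps directly from a URL. A rough Groovy sketch of building one, with made-up country codes and values rather than the real dataset:

// Sketch: build a Google Chart API map URL (data here is illustrative)
def diffs = [GB: 0, DE: 10, US: 40, IR: 100]   // country code -> scaled difference
def url = "http://chart.apis.google.com/chart?cht=t&chtm=world&chs=440x220" +
	"&chld=" + diffs.keySet().join("") +
	"&chd=t:" + diffs.values().join(",") +
	"&chco=ffffff,eeeeff,0000ff"               // white through to blue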

Vilified by Visa

Monday 1st February, 2010

Following on from a previous post about how Verified by Visa and MasterCard SecureCode are training users to give up their identity to anyone who asks for it, some lovely boffins at Cambridge have apparently written a paper on it. (via)


Progress, Trust, and Going to the Pub: London Search Social write-up

Thursday 14th January, 2010

The Elgin on Ladbroke Grove (via flickr/Ewan-M)

There’s a theory that expertise and the associated salaries increase more rapidly in cities than in the country because physical proximity decreases the cost of sharing ideas (I’m desperately trying to dig up the source amongst the noise, which is either ironic or poetic depending on how you look at it).

The interwebs are a different beast. Proximity doesn’t exist in the same way, perhaps instead becoming a cultural rather than a geographic measure of separation. The cost of spreading an idea on the internet is more related to trust than physical distance.

On the internet, that trust must be earned through expertise and clear communication. In the physical world people behave differently: they tend to trust each other more quickly and be more open when they can look each other in the eye, or buy each other a beer.

[Only on a technology blog would you find a justification this obscure for going to the pub.]

On Tuesday we held this month’s London Open Source Search Social at The Elgin in Notting Hill. This was the first time we’d used our shiny new Meetup account to organise the event, so it was nice not to have to send out reminders manually (laziness #ftw).

A few notes from the evening for those whose memories are as bad as mine

There’s plenty missing, and some of this may be fictitious.

Bruno from Jobomix talked about his use of Hadoop to detect duplicate job data, leading to a conversation about Pig and Cascading, and then on to other JVM technologies like Scala. Ben from OneIS brought up the subject of Duby, a Ruby-like-but-tidier language targeting the JVM, and when prompted gave us an outline of his company’s free-text graph store.

We talked about duplicate detection in various fields, the thresholds involved, and the cost of false positives. We touched on human relevance testing; Richard told us he’d found people generally need to be paid to do it, and shouldn’t do it for more than 30 minutes at a time.

Joao from the Royal Library of the Netherlands told us how they digitise and index millions of pre-digital documents per month. Ben told us about a method of querying Xapian from Postgres using an SQL JOIN.


Sad to see Modista die

Monday 11th January, 2010

I should state at this point that these are personal opinions, not those of my employer.

While the first reaction when a potential competitor folds is to celebrate, that reaction is both immature and short-sighted. It’s a sad state of affairs that one company can destroy another using just the costs of the patent infringement process (regardless of whether the patent is valid or the infringement demonstrable), and since competition is the fire that drives progress and innovation, seeing a competitor fold for any reason other than poor products is always disappointing.

Daniel Tunkelang has written a eulogy for Modista over on The Noisy Channel.

Respect to AJ Shankar, Arlo Faria, and any others on the team; you guys did some impressive work.


As the browser war hots up, Google has Bing in its sights

Monday 11th January, 2010

Google Chrome advertising (via flickr/iainpurdie)

As any self-respecting nerd will have noticed, and others have already noted, Google recently started advertising its Chrome web browser on billboards and in newspapers around the UK. This represents an escalation of the second phase of the browser wars, and one of the few occasions on which Google has resorted to billboards to advertise a product.

Why bother advertising a free product?

The answer to why Google are advertising Chrome (which is a free download) is unsurprisingly similar to the answer to the bigger question: why bother building and supporting a free product at all?

Google make money by monetising users’ searches. People are quick to find and use shortcuts, and modern browsers have built-in search bars. In short, more people using your search bar means more money, and Chrome (like Firefox) defaults to searching on Google.

Billboards – dated but still relevant

Let’s face it, it’s not Google’s style to put up great big billboards. It’s not smart, it’s not targeted, it’s not high-tech. Ironically, though, those attributes are exactly why billboards work in this situation.

Google’s main competitor in the search space is Microsoft (who have, incidentally, been advertising their search engine Bing heavily), and Microsoft’s largest user base is the slow-moving majority who get Internet Explorer bundled with their PCs. Thanks to its default status in Internet Explorer, Bing is used by that same slow-moving majority.

Since that majority is too big to be worth the extra cost of targeting, the common-or-garden billboard is a suitable way to get through to them (at the same time as reinforcing the brand with the nerds who already know about it).

