RDF and Freebase’s Graphd
Tuesday 6th May, 2008
I’ve just spent half an hour reading a fascinating article about Graphd written by Scott Meyer. Graphd is the in-house data storage system used by the social database Freebase. It’s either RDF or a custom relative of RDF; Scott doesn’t say which.
Like the majority of application developers, I’ve had my head buried deep in the data storage sand of SQL and RDBMS for the last decade. So it comes as quite a shock to find a world of techniques that bears no relation to the structures I’m used to, and that some people have been working this way all along.
Rather than ignoring the elephant in the kitchen I think it’s best to pack my mosquito net, don my favourite pith helmet, and venture forth into that kitchen to find out what lurks in the biscuit tin of wisdom.
Resource Description Framework (RDF)
RDF is a system for recording data that works by allowing you to record statements about objects and their relationships. This is very different to a relational database because it becomes pretty trivial for you (or even your users) to expand your database structure. The data defines the database structure. RDF is to RDBMS what a folksonomy is to a rigid ontology.
It uses a data structure called a “triple” to record data.
Triples
A triple is a simple statement, the quantum unit of knowledge. You can’t cut it up any further without losing information. It consists of a subject, a predicate, and an object. E.g.:
- “My office (subject) is in (predicate) London (object)”
- “Alice (subject) went into (predicate) the looking glass (object)”
The main difference between a simple list of statements and a proper Triple Store is that ambiguous names such as “Alice” are replaced by unique identifiers, either IDs or URIs.
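To make that concrete, here is a minimal sketch of a triple store in Python. It is only an illustration of the idea, not how Graphd or any real RDF library works, and every URI and name in it (the `example.org` addresses, `Triple`, `match`) is made up for the example.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical URIs standing in for the ambiguous names in the statements above.
OFFICE = "http://example.org/places/my-office"
IS_IN = "http://example.org/relations/isIn"
LONDON = "http://example.org/places/london"
ALICE = "http://example.org/people/alice"
WENT_INTO = "http://example.org/relations/wentInto"
LOOKING_GLASS = "http://example.org/things/looking-glass"

# The "store" is just a set of triples.
store = {
    Triple(OFFICE, IS_IN, LONDON),
    Triple(ALICE, WENT_INTO, LOOKING_GLASS),
}


def match(store, subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None):
    """Return every triple that fits the pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# "Where is my office?" becomes a pattern with the object left open.
where_is_office = match(store, subject=OFFICE, predicate=IS_IN)
```

Because “Alice” is now a URI rather than a bare name, a second Alice would simply get a different identifier and the two would never be confused.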
Huh?
But what does all this mean?
My brain tends to work by example. Concepts are all well and good, but concrete examples are king. RDF is a semantic storage technology, and in that space we have a few players. Interesting examples are people like Freebase (as mentioned above) and MusicBrainz (an open music database). These organisations are not only allowing contributors to enter data, but allowing them to define new types of data. In the land of the RDBMS, this takes something that used to be under the control of the product manager, architect, and DBA, and puts it in the hands of regular users.
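Here is a rough sketch of why that shift is possible when data is stored as triples. It is a toy written for this post, not a description of how Freebase or MusicBrainz actually store anything, and the identifiers (`band:1`, `has_name`, `favourite_biscuit`) are invented for illustration.

```python
# Triples as plain (subject, predicate, object) tuples.
triples = [
    ("band:1", "has_name", "The Example Band"),
    ("band:1", "formed_in", "1998"),
]


def predicates_in_use(triples):
    """The 'schema' is nothing more than the set of predicates found in the data."""
    return sorted({predicate for _, predicate, _ in triples})


before = predicates_in_use(triples)

# A contributor invents a new kind of fact. There is no table to alter
# and no DBA to ask: the new predicate arrives as ordinary data.
triples.append(("band:1", "favourite_biscuit", "custard cream"))

after = predicates_in_use(triples)
```

In an RDBMS the equivalent change would mean adding a column or a new table, which is exactly the step that normally needs someone with schema privileges.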
RDF may have been around for a long time, but I don’t think we’ve seen its full potential yet.
Posted by Richard Marr