I’ve just spent half an hour reading a fascinating article about Graphd written by Scott Meyer. Graphd is the in-house data storage system used by the social database Freebase. It’s either RDF, or a custom relative of RDF, but Scott doesn’t say.
Since (like most application developers) I’ve had my head buried deep in the data storage sand of SQL and RDBMS for the last decade, it comes as quite a shock to find a world of techniques that bears no relation to the structures I’m used to, and to discover that some people have been working this way all along.
Rather than ignoring the elephant in the kitchen, I think it’s best to pack my mosquito net, don my favourite pith helmet, and venture forth into that kitchen to find out what lurks in the biscuit tin of wisdom.
Resource Description Framework (RDF)
RDF is a system for recording data as statements about objects and their relationships. This makes it very different to a relational database: it becomes pretty trivial for you (or even your users) to expand the database structure, because the data defines the structure. RDF is to RDBMS what a folksonomy is to a rigid ontology.
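To make the “data defines the structure” point concrete, here is a minimal Python sketch that assumes each statement is stored as a plain (subject, predicate, object) tuple in an in-memory set. The `record` and `describe` helpers and the “has desk count” fact are hypothetical, invented for illustration; this is not how Graphd or any real RDF store is implemented.

```python
# A minimal sketch, assuming statements are plain (subject, predicate, object)
# tuples held in a Python set. Illustration only, not a real RDF store.

statements = set()

def record(subject, predicate, obj):
    """Record one statement; there is no schema to update first."""
    statements.add((subject, predicate, obj))

def describe(subject):
    """Collect every (predicate, object) pair known about a subject."""
    return {(p, o) for (s, p, o) in statements if s == subject}

# Start with one kind of fact about the office.
record("my office", "is in", "London")

# Later, someone adds a kind of fact nobody planned for (hypothetical value).
# No ALTER TABLE is needed: it is simply another statement.
record("my office", "has desk count", "12")

print(sorted(describe("my office")))
# prints [('has desk count', '12'), ('is in', 'London')]
```

Adding the second kind of fact required no schema change; in a relational design the same change would typically mean an `ALTER TABLE` or a new table.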
It uses a data structure called a “triple” to record data.
A triple is a simple statement, the quantum unit of knowledge: you can’t cut it up any further without losing information. Each triple consists of a subject, a predicate, and an object. For example:
- “My office (subject) is in (predicate) London (object)”
- “Alice (subject) went into (predicate) the looking glass (object)”
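The shape of a triple is small enough to write down directly. A minimal Python sketch using only the standard library; the `Triple` class is a hypothetical name, and its third field is called `obj` only to avoid shadowing Python’s built-in `object`:

```python
from typing import NamedTuple

class Triple(NamedTuple):
    """One statement: subject, predicate, object (hypothetical helper type)."""
    subject: str
    predicate: str
    obj: str

# The two example statements from the list above.
facts = [
    Triple("My office", "is in", "London"),
    Triple("Alice", "went into", "the looking glass"),
]

for t in facts:
    print(f"{t.subject} | {t.predicate} | {t.obj}")
# prints:
# My office | is in | London
# Alice | went into | the looking glass
```

At this stage the parts are still just human-readable strings, which is exactly the ambiguity a proper triple store removes.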
The main difference between a simple list of statements and a proper triple store is that ambiguous names such as “Alice” are replaced by unique identifiers, either IDs or URIs. Vague, non-machine-readable predicates are likewise replaced by defined relationships.
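Here is a minimal sketch of that difference, assuming made-up `http://example.org/` URIs for people, places, and relationships (a real store would mint its own identifiers). The second Alice and her `worksIn` fact are hypothetical, added only to show that two Alices no longer collide. The hypothetical `match` helper treats `None` as a wildcard, loosely in the spirit of the triple patterns used by query languages such as SPARQL:

```python
from typing import List, NamedTuple, Optional

# Hypothetical namespaces, for illustration only.
PEOPLE = "http://example.org/people/"
PLACES = "http://example.org/places/"
REL = "http://example.org/relations/"

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

store = [
    # Each Alice has her own identifier, so the names no longer collide.
    Triple(PEOPLE + "alice-liddell", REL + "wentInto", PLACES + "looking-glass"),
    Triple(PEOPLE + "alice-smith", REL + "worksIn", PLACES + "london"),
]

def match(subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return stored triples matching a pattern; None matches anything."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]

# "Who went into the looking glass?" becomes an unambiguous lookup.
for t in match(predicate=REL + "wentInto", obj=PLACES + "looking-glass"):
    print(t.subject)
# prints http://example.org/people/alice-liddell
```

Because both the names and the relationships are identifiers rather than free text, a machine can answer the question without guessing which Alice, or which sense of “went into”, was meant.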
But what does all this mean?
My brain tends to work by example. Concepts are all well and good, but concrete examples are king. RDF is a semantic storage technology, and that space has a few players. Interesting examples include Freebase (mentioned above) and MusicBrainz (an open music database). These organisations not only allow contributors to enter data, but also let them define new types of data. This takes something that, in the land of the RDBMS, belongs to the product manager, architect, and DBA, and puts it into the hands of regular users.
RDF may have been around for a long time, but I don’t think we’ve seen its full potential yet.