RDF and Freebase’s Graphd

I’ve just spent half an hour reading a fascinating article about Graphd written by Scott Meyer. Graphd is the in-house data storage system used by the social database Freebase. It’s either RDF or a custom relative of RDF; Scott doesn’t say which.

Like the majority of application developers, I’ve had my head buried deep in the data storage sand of SQL and RDBMS for the last decade, so it comes as quite a shock to find a world of techniques that bears no relation to the structures I’m used to, and to learn that some people have been working this way all along.

Rather than ignoring the elephant in the kitchen I think it’s best to pack my mosquito net, don my favourite pith helmet, and venture forth into that kitchen to find out what lurks in the biscuit tin of wisdom.

Resource Description Framework (RDF)

RDF is a system for recording data as statements about objects and their relationships. This is very different to a relational database because it becomes pretty trivial for you (or even your users) to expand your database structure: the data defines the structure. RDF is to RDBMS what a folksonomy is to a rigid ontology.

It uses a data structure called a “triple” to record data.

Triples

A triple is a simple statement, the quantum unit of knowledge. You can’t cut it up any more without losing information. Each triple consists of a subject, a predicate, and an object. E.g.:

  • “My office (subject) is in (predicate) London (object)”
  • “Alice (subject) went into (predicate) the looking glass (object)”
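
To make that concrete, here’s a minimal sketch in plain Python of the two statements above stored as three-part tuples and queried by pattern. It is not Graphd’s API or any RDF library’s; the names Triple and match are made up for illustration, and None is assumed to act as a wildcard.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    """One statement: subject, predicate, object."""
    subject: str
    predicate: str
    object: str

# The two example statements from the list above.
store = [
    Triple("My office", "is in", "London"),
    Triple("Alice", "went into", "the looking glass"),
]

def match(triples, subject=None, predicate=None, obj=None):
    """Return every triple that agrees with the fields that are not None."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    ]

# "What do we know about Alice?"
print(match(store, subject="Alice"))
```

Any of the three parts can be left open, so the same function also answers “what is in London?” with match(store, obj="London").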

The main difference between a simple list of statements and a proper Triple Store is that ambiguous names such as “Alice” are replaced by unique identifiers, either IDs or URIs. Vague non-machine-readable predicates are also replaced by defined relationships.
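
Here’s a rough sketch of what that replacement buys you, again in plain Python. Everything under http://example.org/ is a hypothetical identifier I’ve invented for illustration, as are the helper names; a real triple store would have its own scheme.

```python
EX = "http://example.org/"  # hypothetical namespace, for illustration only

# Two different people who both happen to be called "Alice".
ALICE_1 = EX + "person/1"
ALICE_2 = EX + "person/2"
LOOKING_GLASS = EX + "place/looking-glass"

# Predicates are identifiers too, so software can treat them
# as defined relationships rather than free text.
NAME = EX + "rel/name"
WENT_INTO = EX + "rel/wentInto"

triples = {
    (ALICE_1, NAME, "Alice"),
    (ALICE_2, NAME, "Alice"),
    (ALICE_1, WENT_INTO, LOOKING_GLASS),
}

def subjects_named(name):
    """Every subject whose name is the given string."""
    return {s for (s, p, o) in triples if p == NAME and o == name}

def who_went_into(place):
    """Every subject with a 'went into' relationship to the given place."""
    return {s for (s, p, o) in triples if p == WENT_INTO and o == place}

print(subjects_named("Alice"))       # both Alices share the name
print(who_went_into(LOOKING_GLASS))  # but only one went into the looking glass
```

The human-readable name becomes just another statement about the identifier, which is why two Alices can coexist without confusing the store.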

Huh?

But what does all this mean?

My brain tends to work by example. Concepts are all well and good, but concrete examples are king. RDF is a semantic storage technology, and in that space we have a few players. Interesting examples are organisations like Freebase (as mentioned above) and MusicBrainz (an open music database). They not only allow contributors to enter data, but also allow them to define new types of data. This takes something that (in the land of the RDBMS) belongs to the product manager, architect, and DBA, and puts it into the hands of regular users.
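
A small sketch of what “defining new types of data” might look like when everything is a triple. This is not how Freebase or MusicBrainz actually implement it; the band, the predicates, and the known_predicates helper are all invented for illustration.

```python
# Invented sample data: a couple of facts about a made-up band.
triples = [
    ("band:1", "name", "The Example Band"),
    ("band:1", "formed_in", "London"),
]

def known_predicates(data):
    """The 'schema' is simply the set of predicates currently in use."""
    return sorted({p for (_, p, _) in data})

before = known_predicates(triples)

# A contributor introduces a brand-new kind of fact.
# No ALTER TABLE and no migration: it is just one more statement.
triples.append(("band:1", "favourite_biscuit", "custard cream"))

after = known_predicates(triples)

print(before)
print(after)
```

In an RDBMS the equivalent change would mean adding a column or a table, which is exactly the job that normally sits with the architect and DBA.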

RDF may have been around for a long time, but I don’t think we’ve seen its full potential yet.

3 Responses to RDF and Freebase’s Graphd

  1. Mark Fowler says:

    Have you had a look at CouchDB?

  2. Richard Marr says:

    Briefly, but I don’t understand it well enough to see what makes it valuable. I’m assuming there is something but I haven’t twigged it yet.

  3. Suhail Abbas says:

    You should take a look at Brainwave Platform and their Semantic Database, which is based on the RDF model and comes with built-in tools.
