As promised, I’m finally getting around to posting about the sessions at the DLF Forum that were particularly awesome. The Linked Data: Hands on How-To workshop afforded the opportunity to bring your own data and learn how to link-if-y it. It held the promise of helping me get the Caltech faculty names linked data pilot a bit further along. I didn’t get much further in the process due to some technical glitches. Yet the session was still successful in a couple of ways.
First, I had another “a-ha!!” moment in terms of how Linked Data works. All this time I’ve been preparing our faculty names data with an eye towards exposing it as Linked Data. I realize that this build-it-and-they-will-come approach is somewhat naive, but it’s a baby step in terms of getting ourselves up-to-speed on the process. What I didn’t fully grasp was that data exposed in this fashion is just an endpoint, an object or subject others can point at but not really do much with. If it’s just an endpoint, one can’t follow one’s nose and link on to more information on other servers. Our data will only be truly useful once it can be used to complete the full subject-predicate-object “sentence” of a triple.
In practical terms it means that rather than just putting out a URI associated with a faculty name, we should expose the faculty name URI along with other identity links and relationship links. Let’s use our favorite Caltech faculty name as an example. We mint a URI for Richard Feynman, let’s say http://library.caltech.edu/authorities/feynman. We ensure the URI is “dereferenceable” by HTTP clients, which means the client receives an HTML or RDF/XML document in response to its query. Since we only have the name identifiers in our data set, that’s all the client will receive in the returned HTML or RDF/XML document. In this case we know Feynman has a NAF identifier http://id.loc.gov/authorities/names/n50002729.htm.
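To make “dereferenceable” a little more concrete, here is a rough Python sketch of how a client might ask for the RDF/XML flavor of the document rather than the HTML one. The Feynman URI is the made-up example from above, so the request is only built and inspected, not sent. Whether our server would honor the Accept header is an assumption on my part, not something we have set up.

```python
from urllib.request import Request

# Hypothetical URI minted for the example; it does not resolve today.
FEYNMAN_URI = "http://library.caltech.edu/authorities/feynman"

# A Linked Data client typically states which representation it wants
# through the HTTP Accept header (content negotiation).
request = Request(FEYNMAN_URI, headers={"Accept": "application/rdf+xml"})

print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))

# Passing `request` to urllib.request.urlopen would actually send it;
# that step is left out because the URI is only illustrative.
```

A browser would send an Accept header asking for HTML instead, which is how one URI can serve both people and machines.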
The entity working with this exposed data would have to do all the work to create links if all we exposed was the NAF URI (and really, why wouldn’t somebody just go directly to the NAF?). Our data on this end would be much richer if we could make a few statements about it. We need to expose triples. As an example, we could create a simple triple relating our URI to the NAF URI. We connect the URIs with the “same as” property from OWL, the Web Ontology Language. The triple would look like this:
<http://library.caltech.edu/authorities/feynman.html> <http://www.w3.org/2002/07/owl#sameAs> <http://id.loc.gov/authorities/names/n50002729.htm>
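For anyone who wants to generate statements like this from a script, here is a minimal sketch using only the Python standard library. The helper name `to_ntriple` is something I made up for illustration. The trailing period follows the N-Triples convention for ending a statement.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def to_ntriple(subject: str, predicate: str, obj: str) -> str:
    """Serialize three URIs as a single N-Triples statement."""
    return f"<{subject}> <{predicate}> <{obj}> ."

# URIs taken from the example triple above.
caltech_uri = "http://library.caltech.edu/authorities/feynman.html"
naf_uri = "http://id.loc.gov/authorities/names/n50002729.htm"

triple = to_ntriple(caltech_uri, OWL_SAME_AS, naf_uri)
print(triple)
```

This only handles URIs in all three positions. Literal values such as names or dates would need quoting and escaping that this sketch does not attempt.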
The data we’re exposing is now looking much more like Linked Data. We could go even further and start writing triples like Feynman is the creator of “Surely You’re Joking, Mr. Feynman” using a URI for the Dublin Core vocabulary term creator and a URI for the bibliographic work. The more triples we garner, the more the machines can glean about Feynman. It was a breakthrough for me to figure out that we need full triples in our project rather than simply exposing a name URI.
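Here is a small sketch of what “more triples” might look like in practice, again in plain Python. The work URI under example.org is purely hypothetical, and I am assuming the Dublin Core terms namespace for creator. The `statements_about` helper is my own invention, meant only to show how a machine could gather everything said about one URI.

```python
from typing import List, NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

FEYNMAN = "http://library.caltech.edu/authorities/feynman.html"
NAF_FEYNMAN = "http://id.loc.gov/authorities/names/n50002729.htm"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
DCTERMS_CREATOR = "http://purl.org/dc/terms/creator"
# Hypothetical URI standing in for "Surely You're Joking, Mr. Feynman".
WORK = "http://example.org/works/surely-youre-joking"

triples = [
    Triple(FEYNMAN, OWL_SAME_AS, NAF_FEYNMAN),
    # Dublin Core phrases this as: the work has creator Feynman.
    Triple(WORK, DCTERMS_CREATOR, FEYNMAN),
]

def statements_about(uri: str, data: List[Triple]) -> List[Triple]:
    """Return every triple where the URI is the subject or the object."""
    return [t for t in data if uri in (t.subject, t.obj)]

for t in statements_about(FEYNMAN, triples):
    print(t.subject, t.predicate, t.obj)
```

Notice that Feynman is the subject of one statement and the object of the other, which is exactly the point: a bare name URI can only ever be pointed at, while full triples let a client travel in both directions.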
The second way the hands-on workshop was a success for me was that I had the opportunity to play with Google Refine. Google Refine is spreadsheets on steroids. It allows you to manipulate delimited data in more ways than Excel. Free Your Metadata has some videos which explain the process for using Refine with an RDF extension to prepare subject heading linked data (yes, I’m sneaking a Monday Morning Metadata Movie into this post). I was hoping to take my spreadsheet data of names and LCCNs and get either VIAF or NAF URIs. That would get our faculty names linked data project to the point of having one third of a triple.
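For the NAF half of that wish, a script may be enough. Below is a rough sketch that reads name and LCCN columns from delimited data and builds id.loc.gov URIs. The column names and the one-row sample are made up for illustration, and I am assuming the URI is simply the base path plus the LCCN with its spaces removed, as in the Feynman example. Real LCCN normalization has more rules than this, and VIAF would need an actual lookup.

```python
import csv
import io

NAF_BASE = "http://id.loc.gov/authorities/names/"

def lccn_to_naf_uri(lccn: str) -> str:
    """Build a NAF URI from an LCCN by dropping all whitespace (simplified)."""
    compact = "".join(lccn.split())
    return NAF_BASE + compact

# Stand-in for the exported spreadsheet; a real file would be opened with open().
sample = io.StringIO('name,lccn\n"Feynman, Richard P.",n 50002729\n')

uris = {row["name"]: lccn_to_naf_uri(row["lccn"]) for row in csv.DictReader(sample)}

for name, uri in uris.items():
    print(name, "->", uri)
```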
Unfortunately, I could not get the RDF plug-in installed on my laptop. Some of my colleagues in the hands-on did manage to get it to work. We pulled in some very knowledgeable programmers to troubleshoot, and their conclusion after about an hour of tinkering was that the plug-in was buggy. Back to the drawing board, it seems.
There will be another Linked Data hands-on session at CODE4LIB 2012. I anticipate that it will be as useful as the DLF hands-on. I do plan on attending and I am keeping my fingers crossed that I can make some progress with our project. There is a great list of resources on the web page for the DLF session. There are other tools there besides Google Refine that I hope to play with before CODE4LIB. Plus there are links to other hands-on tutorials. Slowly but surely I’ll get my head wrapped around how to do Linked Data. I’m grateful the DLF Forum furthered my understanding.