DLF Linked Data Hands-on Session & 4M

Posted by laura on December 5, 2011 under Semantic web | 2 Comments to Read

As promised, I’m finally getting around to posting about the sessions at the DLF Forum which were particularly awesome. The Linked Data: Hands-on How-To workshop afforded the opportunity to bring your own data and learn how to link-if-y it. It held the promise of helping me get the Caltech faculty names linked data pilot a bit further along. In the end, technical glitches kept me from getting much further, yet the session was still successful in a couple of ways.

First, I had another “a-ha!!” moment in terms of how Linked Data works. All this time I’ve been preparing our faculty names data with an eye towards exposing it as Linked Data. I realize that this build-it-and-they-will-come approach is somewhat naive, but it’s a baby step in terms of getting ourselves up to speed on the process. What I didn’t fully grasp was that data exposed in this fashion is just an endpoint, an object or subject others can point at but not really do much with. If it’s just an endpoint, you can’t follow your nose and link on to more information on other servers. Our data will only be truly useful once it can be used to complete the full subject-predicate-object “sentence” of a triple.

In practical terms this means that rather than just putting out a URI associated with a faculty name, we should expose the faculty name URI along with other identity links and relationship links. Let’s use our favorite Caltech faculty name as an example. We mint a URI for Richard Feynman, let’s say http://library.caltech.edu/authorities/feynman. We ensure the URI is “dereferenceable” by HTTP clients, which means the client receives an HTML or RDF/XML document in response to its query. Since we only have the name identifiers in our data set, that’s all the client will receive in the returned HTML or RDF/XML document. In this case we know Feynman has a NAF identifier, http://id.loc.gov/authorities/names/n50002729.htm.
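To picture just how bare that response is, the returned RDF might contain little more than a single label triple (the choice of rdfs:label here is purely my illustration, not anything prescribed):

<http://library.caltech.edu/authorities/feynman> <http://www.w3.org/2000/01/rdf-schema#label> "Richard Feynman" .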

The entity working with this exposed data would have to do all the work to create links if all we exposed was the NAF URI (and really, why wouldn’t somebody just go directly to the NAF?). Our data on this end would be much richer if we could make a few statements about it. We need to expose triples. As an example, we could create a simple triple relating our URI to the NAF URI. We connect the URIs with the OWL (Web Ontology Language) “same as” property. The triple would look like this:

<http://library.caltech.edu/authorities/feynman> <http://www.w3.org/2002/07/owl#sameAs> <http://id.loc.gov/authorities/names/n50002729.htm> .

The data we’re exposing is now looking much more like Linked Data. We could go even further and start writing triples like “Feynman is the creator of Surely You’re Joking, Mr. Feynman!” using a URI for the Dublin Core vocabulary term creator and a URI for the bibliographic work. The more triples we garner, the more the machines can glean about Feynman. It was a breakthrough for me to figure out that we need full triples in our project rather than simply exposing a name URI.
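To make that Dublin Core example concrete (the work URI below is purely a placeholder I made up, and note that with dcterms:creator the work is the subject and the creator is the object), such a triple might look something like:

<http://library.caltech.edu/works/surely-youre-joking> <http://purl.org/dc/terms/creator> <http://library.caltech.edu/authorities/feynman> .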

The second way the hands-on workshop was a success for me was that I had the opportunity to play with Google Refine. Google Refine is spreadsheets on steroids. It allows you to manipulate delimited data in more ways than Excel can. Free Your Metadata has some videos which explain the process for using Refine with an RDF extension to prepare subject heading linked data (yes, I’m sneaking a Monday Morning Metadata Movie into this post). I was hoping to take my spreadsheet data of names and LCCNs and get either VIAF or NAF URIs. That would get our faculty names linked data project to the point of having one-third of a triple.
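As an aside, even outside Refine the NAF part of that lookup can be scripted. Here is a rough sketch, assuming a CSV with hypothetical name and lccn columns; the only real trick is that the NAF URI is just the space-stripped LCCN appended to the id.loc.gov names path, and the script simply checks that each URI dereferences.

# Rough sketch: turn a CSV of names and LCCNs into candidate NAF URIs.
# The file name and column names here are hypothetical placeholders.
import csv
import urllib.request

def lccn_to_naf_uri(lccn):
    # Normalize the LCCN by stripping spaces, e.g. "n 50002729" -> "n50002729".
    return "http://id.loc.gov/authorities/names/" + lccn.replace(" ", "")

def dereferences(uri):
    # Follow redirects and report whether the URI resolves without an HTTP error.
    try:
        with urllib.request.urlopen(uri, timeout=10) as response:
            return response.status == 200
    except OSError:
        return False

with open("faculty_names.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        uri = lccn_to_naf_uri(row["lccn"])
        print(row["name"], uri, "ok" if dereferences(uri) else "check manually")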

Unfortunately, I could not get the RDF plug-in installed on my laptop. Some of my colleagues in the hands-on did manage to get it to work. We pulled in some very knowledgeable programmers to troubleshoot, and their conclusion after about an hour of tinkering was that the plug-in was buggy. Back to the drawing board, it seems.

There will be another Linked Data hands-on session at CODE4LIB 2012.  I anticipate that it will be as useful as the DLF hands-on.  I do plan on attending and I am keeping my fingers crossed that I can make some progress with our project.   There is a great list of resources on the web page for the DLF session.  There are other tools there besides Google Refine that I hope to play with before CODE4LIB.  Plus there are links to other hands-on tutorials.  Slowly but surely I’ll get my head wrapped around how to do Linked Data.  I’m grateful the DLF forum furthered my understanding.


  • Free Your Metadata said,

    Hi Laura, Interesting post, thanks a lot!

    Could you provide some details about why you couldn’t get the RDF plugin working? Some of us are not “knowledgeable programmers” (far from it) but still managed to get the thing working. I don’t have the impression that it is buggy – that’s just what IT people say when they can’t get things to work ;)

  • laura said,

    Hi there –
    Sorry for taking so long to respond. I’ve not had time to revisit this project in a while.

    I was working through the steps in this tutorial http://bit.ly/rDZxTZ . For context, I’m running it on WinXP 2002 SP3.

    I think the problem had to do with file paths. For some reason, when we started up Refine and loaded some spreadsheet data, Refine wouldn’t recognize the plugins. I had it installed in a folder on my desktop and put the extensions folder in c:\Documents and Settings\\Local Settings\Application Data\Google\Refine. Then I downloaded and put the RDF extension in \Refine\extensions. We tried putting the extensions in various file paths without success.

    Today I wiped everything Refine related from my laptop and started over reinstalling the application and the RDF plug-in. I managed to open Refine and create a project by importing some Excel data. The RDF and Freebase plug-ins appeared in the top right corner without a hitch this time.

    Unfortunately, when I attempted to re-open the project or create a new project, the entire thing crashed. It would hang with the “working” dialog box.

    So I went directly to DERI to see if there was an update to the extension. There was. The direct link provided in the tutorial was for v.0.5.3 and they’re now up to v.0.7.0. I’ll play around with it and see if the new version works. I’ll keep you posted.
