Lots of little things

Posted by laura on January 6, 2012 under Metadata

Wow.  The year is only a few days old and already there’s tons of activity in the library metadata world.

First, I’m thrilled to say that the CODE4LIB preconference I’ve been involved with is a go. Digging into metadata: context, code, and collaboration will be a forum for coders and catalogers to continue the work begun at the “Catalogers & Coders” CURATEcamp, held at the DLF Fall Forum. As you may recall, one of the next steps which emerged from those discussions was to have future events dedicated to fostering cataloger-coder relationships. Registration for CODE4LIB is full, and there’s a substantial waiting list. There are sure to be other events in the future, however, as CODE4LIB regionals continue to expand and interest groups within LIS organizations develop. Also, we’ll be making all of the CODE4LIB pre-con materials available.

Speaking of making materials available, I’ve finally put my Linked Open Data in Libraries, Archives, and Museums presentation up on SlideShare. Thanks to Emily Nimsakont for letting me borrow a few slides from one of her presentations. Someday I’ll actually create a slidecast of this. I think slides sans context have limited utility. There will be another opportunity to catch me presenting live. If you’re going to ALA Midwinter, I’ll be speaking on a panel regarding future trends in authority work for PCC/NACO. I’ll post more details about that closer to the date.

Speaking of catalogers coding, Shana McDanold, one of my co-conspirators on the CODE4LIB pre-con, has been doing a bang-up job promoting Codecademy’s Code Year within the cataloging community. There are no more excuses for any cataloger wishing to delve into coding. Code Year sends you an interactive lesson per week. You can work along with many other librarians via the Twitter hashtags #catcode and #libcodeyear. There’s also a PBwiki for further collaboration. I’m betting that the #catcode community carries on once the year is done – there’s much for us to do with the continuing evolution of catalogs, new metadata workflows with repositories, etc. I’ve blogged before about the blurriness in defining role boundaries between metadata librarians and programmers. Knowing coding basics can only help us improve our relationships with programmers. And it’s going to lead to better services. We’ll be better able to articulate what we’d like our programmers to do when we’re developing stuff.

Exciting times! I’m very stoked to see the response Shana has received. Over the years I’ve witnessed lots of catalogers who refuse to adapt to the increasingly technical nature of our jobs (not at MPOW, fortunately). It seems the tide is finally changing. I think the best thing we can do as a community is figure out projects to make use of our nascent coding skills. No, I don’t have any ideas yet. I’ll keep you posted on that.


DLF Linked Data Hands-on Session & 4M

Posted by laura on December 5, 2011 under Semantic web

As promised, I’m finally getting around to posting about the sessions at DLF Forum which were particularly awesome.   The Linked Data: Hands on How-To workshop afforded the opportunity to bring your own data and learn how to link-if-y it.  It held the promise of helping me get the Caltech faculty names linked data pilot a bit further along.   I didn’t get much further in the process due to some technical glitches.  Yet the session was still successful in a couple of ways.

First, I had another “a-ha!!” moment in terms of how Linked Data works. All this time I’ve been preparing our faculty names data with an eye towards exposing it as Linked Data. I realize that this build-it-and-they-will-come approach is somewhat naive, but it’s a baby step in terms of getting ourselves up-to-speed on the process. What I didn’t fully grasp was that data exposed in this fashion is just an endpoint, an object or subject others can point at but not really do much with. If it’s just an endpoint, one can’t follow one’s nose and link on to more information on other servers. Our data will only be truly useful once it can be used to complete the full subject-predicate-object “sentence” of a triple.

In practical terms it means rather than just putting out a URI associated with a faculty name, we should expose the faculty name URI along with other identity links and relationship links. Let’s use our favorite Caltech faculty name as an example. We mint a URI for Richard Feynman, let’s say http://library.caltech.edu/authorities/feynman. We ensure the URI is “dereferenceable” by HTTP clients, which means the client receives an HTML or RDF/XML document in response to its query. Since we only have the name identifiers in our data set, that’s all the client will receive in the returned HTML or RDF/XML document. In this case we know Feynman has a NAF identifier http://id.loc.gov/authorities/names/n50002729.htm.
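To make “dereferenceable” a bit more concrete: an HTTP client normally says which representation it wants through the Accept header. Here is a minimal Python sketch of that idea. It assumes the made-up Feynman URI above and a server that does content negotiation, and it only builds the requests without sending them, since nothing actually lives at that address.

```python
from urllib.request import Request

# Illustrative URI from this post; not assumed to resolve to a real server.
FEYNMAN_URI = "http://library.caltech.edu/authorities/feynman"


def build_request(uri, want_rdf):
    """Build (but do not send) a GET request asking for RDF/XML or HTML."""
    accept = "application/rdf+xml" if want_rdf else "text/html"
    return Request(uri, headers={"Accept": accept})


# A browser-style client asks for HTML; a Linked Data client asks for RDF/XML.
browser_style = build_request(FEYNMAN_URI, want_rdf=False)
machine_style = build_request(FEYNMAN_URI, want_rdf=True)

print(browser_style.get_header("Accept"))
print(machine_style.get_header("Accept"))
```

Same URI, two different documents: that is the whole trick behind serving people and machines from one identifier.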

The entity working with this exposed data would have to do all the work to create links if all we exposed was the NAF URI (and really, why wouldn’t somebody just go directly to the NAF?). Our data on this end would be much richer if we could make a few statements about it. We need to expose triples. As an example, we could create a simple triple relating our URI to the NAF URI. We connect the URIs with the “same as” property from OWL, the Web Ontology Language. The triple would look like this:

<http://library.caltech.edu/authorities/feynman.html>  <http://www.w3.org/2002/07/owl#sameAs> <http://id.loc.gov/authorities/names/n50002729.htm>
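For the coders in the audience, the triple above can be written out in the N-Triples style (each URI in angle brackets, the statement closed with a period) with a few lines of Python. This is only a sketch using plain string formatting; the helper name is my own invention, and a real project would more likely lean on an RDF library.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def ntriple(subject, predicate, obj):
    """Serialize three URIs as a single N-Triples line (hypothetical helper)."""
    return "<{}> <{}> <{}> .".format(subject, predicate, obj)


# The Caltech URI is the illustrative one from this post, not a live address.
line = ntriple(
    "http://library.caltech.edu/authorities/feynman.html",
    OWL_SAME_AS,
    "http://id.loc.gov/authorities/names/n50002729.htm",
)
print(line)
```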

The data we’re exposing is now looking much more like Linked Data.  We could go even further and start writing triples like Feynman is the creator of “Surely You’re Joking, Mr. Feynman” using a URI for the Dublin Core vocabulary term creator and a URI for the bibliographic work.   The more triples we garner, the more the machines can glean about Feynman.      It was a breakthrough for me to figure out that we need  full triples in our project rather than simply exposing a name URI.
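Here is a rough Python sketch of why more triples help. It keeps the two statements discussed above in a plain list and pulls out everything that mentions Feynman. The URI for the book is hypothetical (I made it up for illustration), and the creator property is the Dublin Core terms URI.

```python
FEYNMAN = "http://library.caltech.edu/authorities/feynman.html"
NAF = "http://id.loc.gov/authorities/names/n50002729.htm"
# Hypothetical URI for "Surely You're Joking, Mr. Feynman"; not a real address.
WORK = "http://library.caltech.edu/works/surely-youre-joking"

SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
CREATOR = "http://purl.org/dc/terms/creator"

# Each statement is a (subject, predicate, object) tuple.
triples = [
    (FEYNMAN, SAME_AS, NAF),
    (WORK, CREATOR, FEYNMAN),  # the work has Feynman as its creator
]


def statements_about(uri, triples):
    """Return every triple in which the URI appears as subject or object."""
    return [t for t in triples if uri in (t[0], t[2])]


for s, p, o in statements_about(FEYNMAN, triples):
    print(s, p, o)
```

With only the bare name URI there would be nothing to return. With two triples, a client can already hop from our identifier to the NAF record and to a work.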

The second way the hands-on workshop was a success for me was that I had the opportunity to play with Google Refine. Google Refine is spreadsheets on steroids. It allows you to manipulate delimited data in more ways than Excel. Free Your Metadata has some videos which explain the process for using Refine with an RDF extension to prepare subject heading linked data (yes, I’m sneaking a Monday Morning Metadata Movie into this post). I was hoping to take my spreadsheet data of names and LCCNs and get either VIAF or NAF URIs. That would get our faculty names linked data project to the point of having 1/3rd of a triple.
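The names-and-LCCNs step can also be sketched without Refine. The Python below assumes a CSV export with columns called name and lccn (my assumption, as is the one-row sample), and it assumes a NAF URI is just the id.loc.gov names base plus the LCCN with whitespace removed, written here without the .htm suffix used earlier in this post. Real LCCNs with hyphens or other quirks would need fuller normalization.

```python
import csv
import io

NAF_BASE = "http://id.loc.gov/authorities/names/"

# Made-up stand-in for the spreadsheet export; column names are assumptions.
SAMPLE_CSV = 'name,lccn\n"Feynman, Richard",n 50002729\n'


def naf_uri(lccn):
    """Build a NAF URI by dropping whitespace from the LCCN and lowercasing it."""
    return NAF_BASE + "".join(lccn.split()).lower()


def add_uris(csv_text):
    """Read name/LCCN rows and add a naf_uri column to each row."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["naf_uri"] = naf_uri(row["lccn"])
        rows.append(row)
    return rows


for row in add_uris(SAMPLE_CSV):
    print(row["name"], row["naf_uri"])
```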

Unfortunately, I could not get the RDF plug-in installed on my laptop. Some of my colleagues in the hands-on did manage to get it to work. We pulled in some very knowledgeable programmers to troubleshoot and their conclusion after about an hour of tinkering was that the plug-in was buggy. Back to the drawing board it seems.

There will be another Linked Data hands-on session at CODE4LIB 2012.  I anticipate that it will be as useful as the DLF hands-on.  I do plan on attending and I am keeping my fingers crossed that I can make some progress with our project.   There is a great list of resources on the web page for the DLF session.  There are other tools there besides Google Refine that I hope to play with before CODE4LIB.  Plus there are links to other hands-on tutorials.  Slowly but surely I’ll get my head wrapped around how to do Linked Data.  I’m grateful the DLF forum furthered my understanding.


I went to DLF and all I got was…

Posted by laura on November 4, 2011 under Metadata

I’m now home from the DLF 2011 Forum.  This was my 2nd year and it has become my favorite conference.  I met people I wanted to meet both times I have attended.   I liked being able to express my impressed-ness  and talk shop. This year, I made connections from which some collaborations may emerge.   Exciting!

I’m back with some very useful information, stuff I can apply to my day-to-day.  The Linked Data hands-on session was awesome and merits its own blog post.  Best thing: it helped me make a bit of headway with the CIT faculty names linked data work.

CURATEcamp also gets its own blog post. The conversation between catalogers and coders made a great leap.  There are concrete next steps.  See details at the CURATEcamp wiki.  See also the pretty pretty picture of the distribution between catalogers and coders attending.  Catalogers represent!

I laughed so much. This is truly the conference’o’mirth. Dan Chudnov proposed a weekly call-in show where a cataloger and a coder take questions from the field. The names proposed in the Twitter back-stream made it difficult to hold back laughter for fear of disturbing the front-stream speaker. Chuckles aside, this seems to have a strong possibility of actually happening. I’m getting over my instinct that back-chat is rude at conferences. It seems to have a group bonding effect. I can see the value of nurturing professional relationships through shared discussion and commentary. See part above about concrete next steps from CURATEcamp.

I got caught up on Data Management Plans and the eXtensible Catalog project. I asked for & received advice about migrating our archival information system. I asked people about their use of descriptive metadata and best practices for image management. I have new software to evaluate and test. I also got to see a preview of an RDA training workshop which focuses on RDA as data elements (not RDA as records!!!).

My brain hurts because it’s so full.  I’ll need to collect my thoughts.  I absorbed so much that it wouldn’t be fair to blog one big brain dump.  I’ll be able to synthesize it better if I break it down in chunks.  Stay tuned.