Re: my post yesterday saying I was unsure if the VIAF text mining approach to incorporating Wikipedia links within their records was Linked Data. There’s a good little conversation over at the LOD-LAM blog which elucidated the difference for me. They say it better than I can, so go have a look-see. The money quotation: “Linked Data favors ‘factoids’ like date-and-place-of-birth, while statistical text-mining produces (at least in this case) distributions interpretable as ‘relationship strength’.”
There’s been some conversation lately about using Wikipedia in authority work. Jonathan Rochkind recently blogged about the potential of using Wikipedia Miner to add subject authority information to catalog records, reasoning that the context and linkages provided in a Wikipedia article could provide better topical relevance. Then somebody on CODE4LIB asked for help using author information in a catalog record to look up said author in Wikipedia on-the-fly. The various approaches suggested on the list have been interesting, although there hasn’t been an optimal solution. Although I couldn’t necessarily code such an application myself, it’s good to know how a programmer could go about doing such a thing. What I did learn was that Wikipedia has a way of marking up names with author identifiers. The Template:Authority Control gives an example of how to do it.
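For the curious, here is a minimal sketch in Python of what the first step of an on-the-fly lookup might look like. It is not any of the solutions proposed on the list. It assumes the catalog heading follows the usual "Surname, Forename, dates" pattern and only builds a search URL for the public MediaWiki search API; the function names and the sample heading are my own illustrations, and picking the right article out of the search results (the hard part) is left out entirely.

```python
import re
from urllib.parse import urlencode

# Public MediaWiki API endpoint for English Wikipedia.
API = "https://en.wikipedia.org/w/api.php"


def heading_to_search_terms(heading):
    """Turn a 'Surname, Forename, dates' heading into 'Forename Surname'.

    Assumption: the heading is comma-separated in catalog order.
    """
    # Drop parenthetical fuller forms of the name, e.g. "(Richard Phillips)".
    heading = re.sub(r"\([^)]*\)", "", heading)
    parts = [part.strip() for part in heading.split(",")]
    surname = parts[0]
    forename = parts[1] if len(parts) > 1 else ""
    return f"{forename} {surname}".strip()


def wikipedia_search_url(heading, limit=5):
    """Build (but do not send) a full-text search request for the name."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": heading_to_search_terms(heading),
        "srlimit": limit,
        "format": "json",
    }
    return API + "?" + urlencode(params)


url = wikipedia_search_url("Feynman, Richard P. (Richard Phillips), 1918-1988")
print(url)
```

Fetching that URL returns a list of candidate articles as JSON. Deciding which candidate, if any, is really the author in the catalog record is the disambiguation problem the list was wrestling with.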
I haven’t done much authoring or editing at Wikipedia, so the existence of the “template” is news to me. I think it’s pretty nifty, so I just had to blog it. The template gets me thinking. Perhaps we’ll be able to leverage our faculty names Linked Data pilot into some sort of mash-up with Wikipedia, pushing our author identifiers into that space or pulling Wikipedia info into our work. Our group continues to make progress on getting all our current faculty represented in the National Authority File, with an eye to exposing our set of authority records as Linked Data. We haven’t figured out yet precisely what we’re going to do with the Linked Data once we make it available. “Build it and they will come” is nice, but we need a demonstrable benefit (i.e. a cool project) to show the value of the Library’s author services.
VIAF already provides external links to Wikipedia and WorldCat Identities with its display of an author name. Ralph Levan explained how OCLC did it, in general fashion, in the CODE4LIB conversation. Near as I understand it, they do a data dump from Wikipedia, do some text mining, run their disambiguation algorithms over it, then add the Wikipedia page if they get a match. I don’t know if this computational approach is a Linked Data type of thing or not. I need to continue working my way through chapters 5 & 6 of Heath & Bizer’s Linked Data book (LOD-LAM prep!). Nonetheless, it’s a good way of showing how connections can be built between an author identity tool and another data source to enrich the final product. I have a hazy vision of morphing the Open Library’s “one web page for every book ever published” into “one web page for every Caltech author.” More likely it will be “one web page tool for every Caltech author to incorporate into their personal web site,” given the extreme individualism and independence cherished within our institutional culture. But I digress. Yes. “One web page for every Caltech author” would at least give us the (metaphorical) space to build a killer app.
Another librarian has seen the Linked Data light. Mita Williams, the New Jack Librarian, writes about gaining a new appreciation for LOD at the recent Great Lakes THAT camp. Her take-away seems similar to my understanding: librarians already know how to create Linked Data. We need to see the application of the Linked Data in new contexts in order to comprehend the utility of exposing the data. The tricky bit IMHO is that creating applications to use the data requires a SPARQL end point. These SPARQL end points aren’t geared for humans. They are a “machine-friendly interface towards a knowledge base.”
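To make “machine-friendly” a little more concrete, here is a rough sketch in Python of what a program does when it talks to a SPARQL end point: it sends a query as an ordinary web request and asks for the results as JSON. The end point address (example.org) is a placeholder I made up, and the query simply asks for ten things that have a name, using the FOAF vocabulary. The sketch only builds the request; the commented lines show how it might be sent to a real end point.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical end point address; substitute a real SPARQL service.
ENDPOINT = "https://example.org/sparql"

# Ask for up to ten resources that have a foaf:name.
QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name WHERE {
  ?person foaf:name ?name .
} LIMIT 10
"""


def build_sparql_request(endpoint, query):
    """Build an HTTP GET request carrying the query in a 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask the server for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)

# Against a real end point, the request could be sent like this:
#   import json
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       results = json.load(response)
```

Nothing here is meant for human eyes, which is rather the point: the end point hands back rows of data for another program to turn into something a person would want to look at.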
I think the machine application layer of Linked Data is where librarians hit a barrier when getting involved with Linked Open Data (LOD). I don’t have the first clue how to set up a SPARQL end point. My technical expertise isn’t there and I’m sure there are a lot of people in the same boat (CODE4LIBers notwithstanding). Most of the stuff I’ve read about getting libraries more involved in LOD has focused on explaining how RDF is done in subject-predicate-object syntax, then urging libraries to get their metadata transformed into RDF. I’ve seen precious little plain English instruction on building an app with Linked Data. I have seen great demos on nifty things done by people in library-land. I’ll give a shout out here to John Mark Ockerbloom and his use of id.loc.gov to enhance the Online Books Page. John Mark Ockerbloom has a PhD in computer science. How do the rest of us get there?
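For anyone who hasn’t met the subject-predicate-object idea yet, here is a small illustrative sketch in Python. Each statement about an author is a three-part tuple, and a few lines of code write the tuples out in N-Triples, one of the plain-text RDF formats. The author address and name are invented for the example; the two predicates come from the FOAF vocabulary. A real project would more likely use an RDF library than hand-rolled code like this.

```python
from typing import NamedTuple

FOAF = "http://xmlns.com/foaf/0.1/"


class Literal(NamedTuple):
    """A plain text value, as opposed to a web address (URI)."""
    value: str


# Hypothetical identifier for an author; not a real address.
author = "http://example.org/authors/1"

# Each statement is (subject, predicate, object).
triples = [
    (author, FOAF + "name", Literal("Jane Q. Scholar")),
    (author, FOAF + "isPrimaryTopicOf", "https://en.wikipedia.org/wiki/Example"),
]


def term(value):
    """Write a literal in quotes and a URI in angle brackets."""
    if isinstance(value, Literal):
        escaped = value.value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return f"<{value}>"


def to_ntriples(statements):
    """Serialize the statements as N-Triples, one statement per line."""
    return "\n".join(
        f"{term(s)} {term(p)} {term(o)} ." for s, p, o in statements
    )


print(to_ntriples(triples))
```

The first line of output says, in effect, “this author has the name Jane Q. Scholar,” and the second says “this author is the main topic of that Wikipedia page.” That is all a triple is.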
Personally, I’m working with the fine folks here to get our metadata in a ready-to-use Linked Data format. And I’m plowing through the jargon-laden documentation to teach myself next steps. Jon Voss, LOD-LAM summit organizer, has posted a reading list to help and is soliciting contributions. The first title I’m delving into is Heath & Bizer’s Linked Data: Evolving the Web into a Global Data Space, which has a free HTML version available. They include a groovy little diagram which outlines the steps in the process of “getting there.” I’m heartened to see that our 1st step (getting the data ready) reflects the 1st step in the diagram.