It’s Monday. It’s time for a metadata movie. There’s been a lot going on lately re: Library LOD. I’d have been posting on it but, well, you know how it goes. I’m excited about the developments. I’m itching to resume work on our own faculty linked names pilot project. I’m almost caught up with my post-leave in-box and should be returning to that soon. Meanwhile, grab some popcorn and enjoy.
The Maryland Library Association has posted links to all of the presentations from the “Technical Services on the Edge” program held in December, where I spoke about Linked Data in Libraries, Archives, and Museum. The copy of my slides here contains the speakers notes. It may prove more helpful than the slides-only which I posted to slideshare.
Wow. The year is only a few days old and already there’s tons of activity in the library metadata world.
First, I’m thrilled to say that the CODE4LIB preconference I’ve been involved with is a go. Digging into metadata: context, code, and collaboration will be a forum for coders and catalogers to continue the work begun at the “Catalogers & Coders” CURATECamp, held at the DLF Fall Forum. As you may recall, one of the next steps which emerged from those discussions was to have future events dedicated to fostering cataloger-coder relationships. Registration for CODE4LIB is full, and there’s a substantial waiting list. There’s sure to be other events in the future, however, as CODE4LIB regionals continue to expand and interest groups within LIS organizations develop. Also, we’ll be making all of the CODE4LIB pre-con materials available
Speaking of making materials available, I’ve finally put my Linked Open Data in Libraries, Archives, and Museums presentation up on slideshare. Thanks to Emily Nimsakont for letting me borrow a few slides from one of her presentations. Someday I’ll actually create a slidecast of this. I think slides sans context have limited utility. There will be another opportunity to catch me presenting live. If you’re going to ALA mid-winter, I’ll be speaking on a panel regarding future trends in authority work for PCC/NACO. I’ll post more details about that closer to the date.
Speaking of catalogers coding, Shana McDanold, one of my co-conspirators on the CODE4LIB pre-con, has been doing a bang-up job promoting CodeAcademy’s Code Year within the cataloging community. There are no more excuses for any cataloger wishing to delve into coding. Code Year sends you an interactive lesson per week. You can work along with many other librarians via the twitter hashtags #catcode and #libcodeyear. There’s also a PBwiki for further collaboration. I’m betting that the #catcode community carries on once the year is done – there’s much for us to do with the continuing evolution of catalogs, new metadata workflows with repositories, etc. I’ve blogged before about the blurriness in defining role boundaries between metadata librarians and programmers. Knowing coding basics can only help us improve our relationships with programmers. And, it’s going to lead to better services. We’ll be better able to articulate what we’d like our programmers to do when we’re developing stuff.
Exciting times! I’m very stoked to see the response Shana has received. Over the years I’ve witnessed lots of catalogers who are refusing to adapt to the increasingly technical nature of our jobs (not at MPOW, fortunately.) It seems the tide is finally changing. I think the best thing we can do as a community is figure out projects to make use of our nascent coding skills. No, I don’t have any ideas yet. I’ll keep you posted on that.
It’s a sad day in the library development world. Rurik Greenall, the kick-ass Linked Data developer at the Norwegian University of Science & Technology Library, has announced his intention to leave libraryland and work in industry where there’s more hope of doing great things with Linked Open Data. He writes that there is no real need for Linked Data in libraries due to the if-it-ain’t-broke-why-fix-it phenomena.
He’s absolutely correct. I’ve said it before. There’s little reason for most academic libraries to expose traditional bibliographic information as linked data. There really isn’t any reason to use Linked Data within the context of how libraries currently operate. Our systems allow us to do the job of purchasing resources, making them searchable for our customers, and circulating them to people. In harsh economic times, why spend time/energy/money to change if things are working?
He’s also incorrect. There are reasons for librarians to do Linked Data. I suspect Rurik knows this and his tongue is implanted in his cheek due to frustration with the glacial pace of change in the Library systems world. Yes, there’s no reason to change if things are working. But things won’t always work the way they do now. We’re like candle makers after electricity has been harnessed. People still use candles but not as their sole source of light. The candle makers that are still in business pursued other avenues. Other use cases for candles besides “source of light” became prominent. Think of aromatherapy (scented candles), religious worship (votive candles), or decoration. It will be the same for library catalogs. People will always use them, but not as their main source of bibliographic descriptions. The traditional catalog data will be used in other ways. In my opinion, its future job will be as a source of local holdings and shared collection management Linked Data.
It’s quite telling that when Rurik asked, “what are the objectives of linked data in libraries” prior to the LOD-LAM summit and heard the crickets chirping. The cataloging world has failed profoundly at understanding our raison d’être. I think we’ve tied ourselves too much to Panizzi’s & Lubetzky’s purpose of the catalog (explicating the differences between different expressions/manifestations of works) and lost sight of the purpose of providing a catalog in the first place — connecting people with information. Our work should be focused on assisting others in their information seeking & use rather than focused on managing local inventories. The FRBR user tasks (find, identify, select, obtain) don’t cover the full spectrum of information behavior in the 21st century. People want to analyze, synthesize, re-use, re-mix, highlight, compare, correlate, and so on and so forth. Linked Data is the enabling technology which will allow these new types of information behavior. The use case of libraries providing catalogs of descriptive bibliographic records for discrete objects is becoming increasingly marginal.
So I’ll propose an answer to Rurik’s question. The objective of doing Linked Data in libraries is to facilitate unforeseen modes of information use. How does this translate into new use cases for how libraries operate? Perhaps it means creating better systems for information seeking (we’d better hurry though. Google is kicking our ass at this…). Perhaps, as I believe, it means focusing more on helping our customers as producers of information rather than consumers. Putting legacy library bibliographic data into a Linked Data form is but one small first step in the process. Once it’s out there in Linked Data form, it’s more amenable to the analyzing, synthesizing, re-using, re-mixing, highlight, comparing and correlating because we can now sic the machines on it. Putting legacy bibliographic data into Linked Data form is how we’re going to learn how to do Linked Data. Rurik is right that Linked Data in libraries will not work if this is all that we do. We need to take additional steps and figure out how to do Linked Data in a way that makes the most sense for our customers.
Rurik worked in the trenches to bring Linked Data into the library world. I’ve often referred to his work as I struggle, mightily, to teach myself how-to expose our Linked Data and use the Linked Data exposed by others. The library world needs more people who can help librarians bridge the gap between how we currently do business and how we need to do business if we hope to keep our jobs. I begin to feel like we’re on the Titanic when these sailors jump ship. I will seek the life-boat and continue learning the skills I need to help my library’s customers with their information seeking & use.
For the past few months I’ve been writing about our pilot Linked Open Data project to expose name identifiers for Caltech’s current faculty. So far we’ve been working on creating/obtaining the initial data set. I’m very happy to report that we’ve now got 372/412 faculty names in the LC/NAF and, by extension, the VIAF. We expect to complete the set within the next month or so, give or take our other production responsibilities. Meanwhile, I’ve been messing with the metadata so I can figure out what the heck to do next.
We have the data in full MARC21 authority records. From that I’ve made a set of MADS records (thanks MarcEdit XSLT!). I also created a tab delimited .txt file of the name heading and LCCN.
According to the basic linked data publishing pattern, it can be as simple as publishing a web page. We are able to put out the structured data under an open license and in a non-proprietary format and call ourselves done. This is what you would call 3 star linked open data. We’d like to do a bit better than that. In order to achieve 4 star Linked Open Data we need to do stuff like mint URIs and make the data available in forms more readily machine process-able.
This is where I get stuck.
Fortunately, there will be a hands-on linked data workshop at the DLF Forum (#dlfforum) next week. I’m highly looking forward to it. I’ve volunteered to give an overview of linked data to Maryland tech services librarians in December. Getting our data out there will provide some necessary street cred.
I’m back from the International Linked Open Data in Libraries Archives and Museums Summit which was held June 2-3, 2011 San Francisco, CA, USA. My brain is still digesting all that I learned. I’ve posted my rough notes in case anybody else can find them useful. I thought the event was really well done. I learned about several LOD projects which might provide tools we can use here at Caltech. I’ve got to dig in a bit more detail. The organizers will be posting a summit report with all of the action items for next steps. Of particular interest to me – there will be some work on publishing citations as linked data and there will be some materials released which can assist with educating the LAM community about LOD. I’ll probably write about each aspect as more information comes to light and as I wrap my head around it.
Re: my post yesterday saying I was unsure if the VIAF text mining approach to incorporating Wikipedia links within their records was Linked Data. There’s a good little conversation over at the LOD-LAM blog which elucidated the difference for me. They say it better than I can, so go have a look-see. The money quotation, ” Linked Data favors “factoids” like date-and-place-of-birth, while statistical text-mining produces (at least in this case) distributions interpretable as “relationship strength”.”
There’s been some conversation lately about using Wikipedia in authority work. Jonathan Rochkind recently blogged about about the potential of using Wikipedia Miner to do add subject authority information to catalog records, reasoning that the context and linkages provided in a Wikipedia article could provide better topical relevance. Then somebody on CODE4LIB asked for help using author information in a catalog record to look up said author in Wikipedia on-the-fly. The various approaches suggested on the list have been interesting although there hasn’t been an optimal solution. Although I couldn’t necessarily code such an application myself, it’s good to know how a programmer could go about doing such a thing. What I did learn was that Wikipedia has a way of marking up names with author identifiers. The Template:Authority Control gives an example of how to do it.
I haven’t done much authoring or editing at Wikipedia, so the existence of the “template” is news to me. I think it’s pretty nifty, so I just had to blog it. The template gets me thinking. Perhaps we’ll be able to leverage our faculty names Linked Data pilot into some sort of mash-up with Wikipedia, pushing our author identifiers into that space or pulling Wikipedia info into our work. Our group continues to make progress on getting all our current faculty are represented in the National Authority File, with an eye to exposing our set of authority records as Linked Data. We haven’t figured out yet precisely what we’re going to do with the Linked Data once we make it available. Build it and they will come is nice, but we need a demonstrable benefit (i.e. a cool project) to show the value of the Library’s author services.
VIAF already provides external links to Wikipedia and WorldCat Identities with its display of an author name. Ralph Levan explained how OCLC did it, in general fashion, in the CODE4LIB conversation. Near as I understand it, they do a data dump from Wikipedia, do some text mining, run their disambiguation algorithms over it, then add the Wikipedia page if they get a match. I don’t know if this computational approach is a Linked Data type of thing or not. I need to continue working my way through chapter 5 & chapter 6 Heath & Bizer’s Linked Data book (LOD-LAM prep!). Nonetheless, it’s a good way of showing how connections can be built between an author identity tool and another data source which enrich the final product. I have a hazy vision of morphing the Open Library’s “one web page for every book every published” into “one web page for every Caltech author.” More likely it will be “one web page tool for every Caltech author to incorporate into their personal web site,” given the extreme individualism and independence cherished within our institutional culture. But I digress. Yes. “One web page for every Caltech author” would at least give us the (metaphorical) space to build a killer app.
Another librarian has seen the Linked Data light. Mita Williams, the New Jack Librarian, writes about gaining a new appreciation for LOD at the recent Great Lakes THAT camp. Her take-away seems similar to my understanding: librarians already know how to created Linked Data. We need to see the application of the Linked Data in new contexts in order to comprehend the utility of exposing the data. The tricky bit IMHO is that creating applications to use the data requires a SPARQL end point. These SPARQL end points aren’t geared for humans. They are a “machine-friendly interface towards a knowledge base.”
I think the machine application layer of Linked Data is where librarians hit a barrier when getting involved with Linked Open Data (LOD). I don’t have the first clue how to set up a SPARQL end point. My technical expertise isn’t there and I’m sure there are a lot of people in the same boat (CODE4LIBers notwithstanding). Most of the stuff I’ve read about getting libraries more involved in LOD has focused on explaining how RDF is done in subject predicate object syntax then urging libraries to get their metadata transformed into RDF. I’ve seen precious little plain English instruction on building an app with Linked Data. I have seen great demos on nifty things done by people in library-land. I’ll give a shout out here to John Mark Ockerbloom and his use of id.loc.gov to enhance the Online Books Page. John Mark Ockerbloom has a PhD in computer science. How do the rest of us get there?
Personally, I’m working with the fine folks here to get our metadata in a ready to use Linked Data format. And I’m plowing through the jargon laden documentation to teach myself next steps. Jon Voss, LOD-LAM summit organizer, has posted a reading list to help and soliciting contributions. The first title I’m delving into is Heath & Bizer’s Linked Data: Evolving the Web into a Global Data Space which has a free HTML version available. They include a groovy little diagram which outlines the steps in the process of “getting there.” I’m heartened to see that our 1st step (getting the data ready) reflects the 1st step in the diagram.
As promised, here is the link to the LOD-LAM participants group http://lod-lam.net/summit/participants/ . Each of the attendees has a brief biography, including yours truly. I’m even more pleased. Congratulations to the organizers for putting the “international” into International Linked Data summit… The detailed raw data list shows from where everybody hails. It’s heavy on English speaking countries, esp. the U.S. It’s not surprising that the bulk of attendees are North American given the cost of flying over the oceans. It’s great that so many people are there representing speakers of other languages.
Off now to review some authority records. I have a rare day sans meetings which means I get to get a lot of work-work (vs. work about work) done.