There have been a great many updates on the LODLAM (Linked Open Data in Libraries, Archives, & Museums) front. I haven’t blogged about them because those life things continue to kick my a**. My health hasn’t been good, blah blah blah. I highly encourage those of you with an ongoing interest to follow the goings-on at the 2nd International LODLAM Summit. I attended the 1st, which was fabulous btw, and learned tons. Two years later, projects are further along and new ones are being launched. Take a gander at the design patterns repository Richard Urban announced, for example.
Thanks for the inquiries into my well being. Rest assured, things will right themselves eventually. Or not.
If (no, when) I return to writing here, you’ll know it’s better.
Time again for the Monday metadata movie. No muffins this week. Gluten sensitivity has put the brakes on that.
This one made me chuckle. Well played, Roy Tennant, well played.
Cataloging Unchained: http://www.youtube.com/watch?v=IQRHNdw2_yw
ed. note – There’s been a lot going on since my return from leave. Bereavement and other extenuating circumstances continue to preclude regular posting, and I anticipate it’s going to be a while longer before I return to it. Meanwhile, I have some drafts in my pending box. This blog has been bereft of posts, so I’ve begun the slow process of revising and posting them. I wrote the following a year ago. It’s a bit dated due to the Harvard reference, but I think the rest of it still applies.
The outcry regarding the Harvard Library’s restructuring brings to light the vulnerability of traditional technical services librarians.
I received a mandate to turn metadata services into public services when I began my current job. Let’s leave aside the implied notion that metadata services aren’t a public service. We do our jobs so well that people tend to forget that without us the ILS wouldn’t function effectively and we couldn’t serve our customers. But I digress. Let’s re-frame and say we have a mandate to turn metadata services into *more visible* public services.
So what do we mean by that? Obviously, the next-gen metadata services are not the standard metadata services we’ve grown accustomed to in academic libraries (i.e., descriptive cataloging in MARC for our ILS, in DC for our repositories, DACS for archives, authority work, holdings work, etc.). These traditional services are not going away, let’s get that clear. Original cataloging will remain a large part of the work of huge libraries with lots of original monographs. The rest of us will keep doing traditional acquisitions, copy cataloging, etc. We must continue providing these services, but they will play a smaller and smaller role. If you’ve been awake for the past 15 years, you understand the trend towards automating as much as possible in cataloging. Observe academic libraries obtaining more and more electronic resources with record sets we manipulate in batch. This is what the FBI calls a clue. As we automate the old functions, we make room for the new functions of a Metadata Services Group.
So what is a next-gen metadata service? We can look to the likes of MIT, Cornell, and Stanford for guidance. They’ve been at the forefront of revamping cataloging departments and applying metadata skills in new ways. We can also do environmental scanning of the higher education environment, figuring out what our customers are doing and what they’re likely to need in the next few years. Don’t rely on asking them what they want. Remember Ford’s adage that if he’d done what his customers wanted, he’d be manufacturing horse buggies. Sometimes people don’t know what they can use. When we poke around academe, we see that managing the products of scholarly communication will figure largely in our future. We can also do environmental scanning of the technological environment. From that, we know that linking the various products of scholarly communication is important, along with managing the various metrics that are emerging. We can figure out a next-gen metadata service if we don’t panic and apply some inductive reasoning to the situation. Brainstorming a bit, here’s what I’d consider a “metadata service”:
Metadata services offered:
- Search engine optimization, general & academic
- Metadata mark-up (e.g., linked data in EPUB, enhancing the full text in your repositories with things like chemical or mathematical mark-up)
- Ontology development
- Persistent identifier management, especially personal and organizational name identifiers
- Citation metrics, alt-metrics, bibliometrics, webliometrics, scientometrics
- Schema development and storage
- Consulting
These services, with the exception of consulting, seem to be machine-centric back-end things. Yet, they revolve around humans. Context is critical to do the back-end work successfully. Who are you optimizing search results for? Who uses the vocabularies you develop and maintain? People need to be identified, whether personally or organizationally. Perhaps all of these services could be considered sub-services of consulting. One needs to do the market research, user needs assessment, and customer engagement prior to service development. And it’s critical to continue engaging customers to evaluate the effectiveness of services and refine accordingly.
One point of tension I foresee: consulting directly with customers steps on the turf of public service librarians, who, in my experience, are understandably threatened. Do public service folk acquire more metadata/technical prowess, or do metadata/technical people work on their outreach and engagement skills? As with most things, the answer probably lies somewhere in the middle. It’s going to depend on the staffing within the library organization. Not everybody is going to be willing or able to change. That’s ok. Those who can’t do it, however, should realize that library work isn’t for them and move on to other things. A colleague has the following quotation on his office whiteboard, which sums this up perfectly:
“If you don’t like change, you’re going to like irrelevance even less.” – General Eric Shinseki
I’m stoked about the work I’ve been doing lately. I finally get to do some hands-on production work with metadata standards beyond AACR2/RDA/MARC, DC, and EAD. I taught myself XML in the late 90s and have spent about a decade avidly following the progress of METS, MODS, and PREMIS without any practical application. Learning in a vacuum sucks. Getting neck-deep into a project gives me a firmer grasp of concepts.
We’re currently re-vamping our Archives systems architecture. This has involved a year of analyzing workflows and current system functions, scoping out the functional requirements of what we need our systems to do, and evaluating the various solutions available. Our integrated archival management system is a bespoke FileMaker Pro database (well, actually several databases). It ties together patron management, financial management, archival description/EAD generation, digital object management, and the web/interface layer. It worked well for us, but our system is 20 years old and won’t scale to handle more complex digital archiving. Ideally, a new system for Archives would be as “integrated” as our custom-developed system. Unfortunately, no such Integrated Archival System exists.
We decided to go with ArchivesSpace for archival description, Aeon for patron/financial management, and Fedora/Islandora for our digital asset management system (DAMS). A unified interface layer wasn’t a critical component for us in the near-term. We figure we can do something after the other components are implemented. The strategic goal for us was to create an architecture with what I call the four systems virtues: extensibility, interoperability, portability, and scalability. We wanted to future-proof ourselves as much as possible.
We’re implementing Fedora/Islandora in our first phase. We’re starting by migrating our small collection of 10,000 digitized images from FileMaker Pro to Islandora, with the help of the folks from Discovery Garden. We’re in the metadata mapping stage, making decisions on schema structure and indexing/searching/display functions. We’re considering using a modified MODS schema with some local and VRA Core elements. I’ve needed to climb a relatively steep learning curve quickly. First, I don’t have a detailed knowledge of data content standards for cataloging images; it’s not a medium one typically handles in science & engineering. I’ve been reacquainting myself with Cataloging Cultural Objects (CCO) so I can help our archivists with descriptive data entry. Knowledge of data content standards should inform choices of data structure and format (i.e., which elements of MODS and VRA Core to include in our schema). Second, I’m learning the Fedora “digital object” model and how it relates to Islandora functionality. A digital object in this context is not a digital object in the librarian sense, i.e., a label for born-digital or digitized content. Third, I’m simultaneously considering the crosswalk between our legacy records and MODS while specifying our future image descriptive cataloging needs.
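To make the crosswalk step concrete, here is a minimal sketch of mapping one legacy image record to MODS with Python’s standard ElementTree. The legacy field names (`Title`, `Photographer`, `Date`, `Format`) are hypothetical stand-ins for whatever the FileMaker Pro export actually contains, and the sketch covers only core MODS elements, not the local or VRA Core extensions mentioned above.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def q(tag: str) -> str:
    """Qualify an element name with the MODS namespace (Clark notation)."""
    return f"{{{MODS_NS}}}{tag}"


def legacy_to_mods(record: dict) -> ET.Element:
    """Map one legacy image record (hypothetical field names) to a MODS element.

    Empty or missing legacy fields are skipped rather than emitted as empty elements.
    """
    mods = ET.Element(q("mods"))

    if record.get("Title"):
        title_info = ET.SubElement(mods, q("titleInfo"))
        ET.SubElement(title_info, q("title")).text = record["Title"]

    if record.get("Photographer"):
        name = ET.SubElement(mods, q("name"), {"type": "personal"})
        ET.SubElement(name, q("namePart")).text = record["Photographer"]
        role = ET.SubElement(name, q("role"))
        ET.SubElement(role, q("roleTerm"),
                      {"type": "text", "authority": "marcrelator"}).text = "Photographer"

    # Every record in this collection is assumed to be a digitized image.
    ET.SubElement(mods, q("typeOfResource")).text = "still image"

    if record.get("Date"):
        origin = ET.SubElement(mods, q("originInfo"))
        ET.SubElement(origin, q("dateCreated")).text = record["Date"]

    if record.get("Format"):
        phys = ET.SubElement(mods, q("physicalDescription"))
        ET.SubElement(phys, q("form")).text = record["Format"]

    return mods
```

Calling `ET.tostring(legacy_to_mods(row), encoding="unicode")` on each exported row gives a MODS document to review before loading. Writing the mapping as code makes the crosswalk decisions explicit and easy to revise as the schema settles.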
My biggest philosophical brain effort right now is figuring out how to implement best practice in image cataloging within Islandora/Fedora. Per CCO/VRA Core, there should be a clear distinction between the work and the image (analogous to the FRBR work and expression/manifestation). In theory, this can be done by having two records and relating them via a metadata wrapper like METS, or by using “related item” elements within the descriptive metadata schema. In practice, I simply don’t know how Islandora can manage it. Fortunately, Fedora was made for this type of thing, so I assume it’s possible. Obviously I’ll be looking at the work of others to inform our choices in the metadata structure.
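The “related item” option can be sketched in MODS as a single record that describes the image at the top level and nests the depicted work in a `relatedItem` with `type="original"`. This is an illustration of the MODS side only, under the assumption that one combined record is acceptable; it says nothing about how Islandora indexes or displays nested items, and the two-record approach with Fedora relationships remains the alternative. The titles and names in any call are placeholders.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def q(tag: str) -> str:
    """Qualify an element name with the MODS namespace (Clark notation)."""
    return f"{{{MODS_NS}}}{tag}"


def add_title(parent: ET.Element, text: str) -> None:
    """Append <titleInfo><title>text</title></titleInfo> to parent."""
    title_info = ET.SubElement(parent, q("titleInfo"))
    ET.SubElement(title_info, q("title")).text = text


def image_with_work(image_title: str, work_title: str, work_creator: str) -> ET.Element:
    """Describe the image at the top level and the depicted work in a nested
    relatedItem type="original", keeping the two descriptions distinct."""
    mods = ET.Element(q("mods"))
    add_title(mods, image_title)
    ET.SubElement(mods, q("typeOfResource")).text = "still image"

    # The work gets its own title and creator, separate from the image's.
    work = ET.SubElement(mods, q("relatedItem"), {"type": "original"})
    add_title(work, work_title)
    name = ET.SubElement(work, q("name"))
    ET.SubElement(name, q("namePart")).text = work_creator
    return mods
```

The trade-off is that a nested work description is repeated in every image record that depicts the same work, which is exactly what separate, linked work records avoid.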
Fortunately, I consider this fun.
It’s Monday. It’s time for a metadata movie. There’s been a lot going on lately re: Library LOD. I’d have been posting on it but, well, you know how it goes. I’m excited about the developments. I’m itching to resume work on our own faculty linked names pilot project. I’m almost caught up with my post-leave in-box and should be returning to that soon. Meanwhile, grab some popcorn and enjoy.
Milagros Valdes, my beautiful wife of 11 years, lost her battle with a rare and vicious form of breast cancer on April 15, 2012. I want to thank all of my colleagues at Caltech for their understanding and support during this most difficult time. I really do have the best job and coworkers imaginable. I also want to thank my professional friends who sent so many kind messages and care packages. Your support helped me and my family enormously. I surely could not have coped without my community. Many hugs are coming your way next time we see one another at meetings/conferences. I am deeply humbled and grateful.
The Maryland Library Association has posted links to all of the presentations from the “Technical Services on the Edge” program held in December, where I spoke about Linked Data in Libraries, Archives, and Museums. The copy of my slides here contains the speaker notes, so it may prove more helpful than the slides-only version I posted to SlideShare.
Wow. The year is only a few days old and already there’s tons of activity in the library metadata world.
First, I’m thrilled to say that the CODE4LIB preconference I’ve been involved with is a go. Digging into metadata: context, code, and collaboration will be a forum for coders and catalogers to continue the work begun at the “Catalogers & Coders” CURATECamp, held at the DLF Fall Forum. As you may recall, one of the next steps that emerged from those discussions was to have future events dedicated to fostering cataloger-coder relationships. Registration for CODE4LIB is full, and there’s a substantial waiting list. There are sure to be other events in the future, however, as CODE4LIB regionals continue to expand and interest groups within LIS organizations develop. Also, we’ll be making all of the CODE4LIB pre-con materials available.
Speaking of making materials available, I’ve finally put my Linked Open Data in Libraries, Archives, and Museums presentation up on SlideShare. Thanks to Emily Nimsakont for letting me borrow a few slides from one of her presentations. Someday I’ll actually create a slidecast of this; slides sans context have limited utility. There will be another opportunity to catch me presenting live. If you’re going to ALA Midwinter, I’ll be speaking on a panel regarding future trends in authority work for PCC/NACO. I’ll post more details closer to the date.
Speaking of catalogers coding, Shana McDanold, one of my co-conspirators on the CODE4LIB pre-con, has been doing a bang-up job promoting Codecademy’s Code Year within the cataloging community. There are no more excuses for any cataloger wishing to delve into coding. Code Year sends you one interactive lesson per week. You can work along with many other librarians via the Twitter hashtags #catcode and #libcodeyear. There’s also a PBwiki for further collaboration. I’m betting that the #catcode community carries on once the year is done – there’s much for us to do with the continuing evolution of catalogs, new metadata workflows with repositories, etc. I’ve blogged before about the blurriness in defining role boundaries between metadata librarians and programmers. Knowing coding basics can only help us improve our relationships with programmers. And it’s going to lead to better services: we’ll be better able to articulate what we’d like our programmers to do when we’re developing stuff.
Exciting times! I’m very stoked to see the response Shana has received. Over the years I’ve witnessed lots of catalogers refusing to adapt to the increasingly technical nature of our jobs (not at MPOW, fortunately). It seems the tide is finally turning. I think the best thing we can do as a community is figure out projects that make use of our nascent coding skills. No, I don’t have any ideas yet. I’ll keep you posted on that.
As promised, I’m finally getting around to posting about the sessions at DLF Forum which were particularly awesome. The Linked Data: Hands on How-To workshop afforded the opportunity to bring your own data and learn how to link-if-y it. It held the promise of helping me get the Caltech faculty names linked data pilot a bit further along. I didn’t get much further in the process due to some technical glitches. Yet the session was still successful in a couple of ways.
First, I had another “a-ha!!” moment in terms of how Linked Data works. All this time I’ve been preparing our faculty names data with an eye towards exposing it as Linked Data. I realize that this build-it-and-they-will-come approach is somewhat naive, but it’s a baby step in terms of getting ourselves up to speed on the process. What I didn’t fully grasp was that data exposed in this fashion is just an endpoint, an object or subject others can point at but not really do much with. If it’s just an endpoint, one can’t follow one’s nose and link on to more information on other servers. Our data will only be truly useful once it can be used to complete the full subject-predicate-object “sentence” of a triple.
In practical terms, it means that rather than just putting out a URI associated with a faculty name, we should expose the faculty name URI along with other identity links and relationship links. Let’s use our favorite Caltech faculty name as an example. We mint a URI for Richard Feynman, let’s say http://library.caltech.edu/authorities/feynman. We ensure the URI is “dereferenceable” by HTTP clients, which means the client receives an HTML or RDF/XML document in response to its request. Since we only have the name identifiers in our data set, that’s all the client will receive in the returned HTML or RDF/XML document. In this case, we know Feynman has a NAF identifier, http://id.loc.gov/authorities/names/n50002729.
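One common way to make a person URI dereferenceable is the 303 pattern described in the W3C note “Cool URIs for the Semantic Web”: the server answers a request for the person’s URI with a 303 See Other redirect to an HTML or RDF/XML document, chosen from the client’s Accept header. Below is a minimal sketch of just that decision, with no web framework; the `.html`/`.rdf` document suffixes are an assumption, and the Accept parsing is simplified (no media-type parameters other than `q`).

```python
from typing import List, Optional, Tuple

# Documents the server is assumed to publish for each entity URI.
# HTML is listed first so it wins ties (e.g. for "*/*").
REPRESENTATIONS = {"text/html": ".html", "application/rdf+xml": ".rdf"}


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media range, q) pairs."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs.append((media_range, q))
    return prefs


def quality(media_type: str, prefs: List[Tuple[str, float]]) -> float:
    """q-value of the most specific range matching media_type (0.0 if none)."""
    main_type = media_type.split("/")[0]
    best = None  # (specificity, q)
    for media_range, q in prefs:
        if media_range == media_type:
            specificity = 3
        elif media_range == main_type + "/*":
            specificity = 2
        elif media_range == "*/*":
            specificity = 1
        else:
            continue
        if best is None or specificity > best[0]:
            best = (specificity, q)
    return best[1] if best else 0.0


def see_other(entity_uri: str, accept_header: Optional[str]) -> Tuple[int, Optional[str]]:
    """Return (status, Location): 303 to the best document, or 406 if none fits."""
    prefs = parse_accept(accept_header or "*/*")
    best_type, best_q = None, 0.0
    for media_type in REPRESENTATIONS:
        q = quality(media_type, prefs)
        if q > best_q:
            best_type, best_q = media_type, q
    if best_type is None:
        return 406, None
    return 303, entity_uri + REPRESENTATIONS[best_type]
```

A browser sending `text/html,...` ends up at the `.html` page, while a Linked Data client asking for `application/rdf+xml` is sent to the `.rdf` document. The person URI itself keeps identifying Feynman rather than either document.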
The entity working with this exposed data would have to do all the work to create links if all we exposed was the NAF URI (and really, why wouldn’t somebody just go directly to the NAF?). Our data on this end would be much richer if we could make a few statements about it. We need to expose triples. As an example, we could create a simple triple relating our URI to the NAF URI, connecting the two with the OWL Web Ontology Language’s sameAs property. The triple would look like this:
<http://library.caltech.edu/authorities/feynman> <http://www.w3.org/2002/07/owl#sameAs> <http://id.loc.gov/authorities/names/n50002729> .
The data we’re exposing is now looking much more like Linked Data. We could go even further and start writing triples like Feynman is the creator of “Surely You’re Joking, Mr. Feynman” using a URI for the Dublin Core vocabulary term creator and a URI for the bibliographic work. The more triples we garner, the more the machines can glean about Feynman. It was a breakthrough for me to figure out that we need full triples in our project rather than simply exposing a name URI.
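Here is a small sketch that writes both kinds of statement as N-Triples using only the standard library: the sameAs link, plus a creator statement. Note the direction of `dcterms:creator`, which points from the work to its creator, so the work is the subject. The work URI `http://library.caltech.edu/works/surely-youre-joking` is hypothetical, and the escaping covers only the common cases (backslash, quote, line breaks) rather than the full N-Triples grammar.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
DCT_CREATOR = "http://purl.org/dc/terms/creator"
DCT_TITLE = "http://purl.org/dc/terms/title"

FEYNMAN = "http://library.caltech.edu/authorities/feynman"      # example URI from the post
NAF_FEYNMAN = "http://id.loc.gov/authorities/names/n50002729"
WORK = "http://library.caltech.edu/works/surely-youre-joking"   # hypothetical work URI


def iri(value: str) -> str:
    """Wrap an absolute IRI in angle brackets, rejecting characters N-Triples forbids."""
    if any(ord(c) <= 0x20 or c in '<>"{}|^`\\' for c in value):
        raise ValueError(f"not usable as an N-Triples IRI: {value!r}")
    return f"<{value}>"


def literal(text: str) -> str:
    """Quote a plain string literal, escaping backslash, double quote, and line breaks."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def build_ntriples() -> str:
    triples = [
        # Our name URI and the NAF URI identify the same person.
        (iri(FEYNMAN), iri(OWL_SAME_AS), iri(NAF_FEYNMAN)),
        # dcterms:creator points from the work to the person who created it.
        (iri(WORK), iri(DCT_CREATOR), iri(FEYNMAN)),
        (iri(WORK), iri(DCT_TITLE), literal("Surely You're Joking, Mr. Feynman!")),
    ]
    # Each N-Triples statement is subject, predicate, object, then " ." on one line.
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)
```

Each added line gives a client one more path to follow. For anything beyond a pilot, an RDF library would handle serialization and escaping more robustly than hand-built strings.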
The second way the hands-on workshop was a success for me was that I had the opportunity to play with Google Refine. Google Refine is spreadsheets on steroids. It allows you to manipulate delimited data in more ways than Excel. Free Your Metadata has some videos that explain the process for using Refine with an RDF extension to prepare subject heading linked data (yes, I’m sneaking a Monday Morning Metadata Movie into this post). I was hoping to take my spreadsheet data of names and LCCNs and get either VIAF or NAF URIs. That would get our faculty names linked data project to the point of having one third of a triple.
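For the NAF half of that goal, there may be a shortcut that doesn’t depend on the Refine plug-in. Name authority URIs on id.loc.gov take the form `http://id.loc.gov/authorities/names/` plus the normalized LCCN, so a URI column can be derived from an LCCN column directly. The sketch below follows the Library of Congress LCCN normalization rules: remove blanks, drop anything from a slash onward, and remove a hyphen while left-padding the serial part to six digits. The `lccn` column name is a hypothetical placeholder, and VIAF identifiers cannot be derived this way; they still require a lookup.

```python
NAF_BASE = "http://id.loc.gov/authorities/names/"


def normalize_lccn(raw: str) -> str:
    """Normalize an LCCN per the Library of Congress LCCN normalization rules."""
    s = "".join(raw.split())          # 1. remove all blanks
    s = s.split("/", 1)[0]            # 2. drop a slash and everything after it
    if "-" in s:                      # 3. remove the hyphen; left-pad the serial to 6 digits
        prefix, _, serial = s.partition("-")
        if not serial.isdigit() or len(serial) > 6:
            raise ValueError(f"unexpected LCCN: {raw!r}")
        s = prefix + serial.zfill(6)
    return s


def naf_uri(raw_lccn: str) -> str:
    """Build an id.loc.gov name authority URI from a name-authority LCCN."""
    return NAF_BASE + normalize_lccn(raw_lccn)


def add_naf_uris(rows, lccn_column="lccn"):
    """Yield copies of spreadsheet rows with a 'naf_uri' column; blank LCCNs stay blank."""
    for row in rows:
        out = dict(row)
        lccn = (row.get(lccn_column) or "").strip()
        out["naf_uri"] = naf_uri(lccn) if lccn else ""
        yield out
```

Rows from `csv.DictReader` can be passed straight to `add_naf_uris`. It’s still worth spot-checking a few generated URIs in a browser, since a mistyped LCCN yields a well-formed but wrong URI.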
Unfortunately, I could not get the RDF plug-in installed on my laptop. Some of my colleagues in the hands-on did manage to get it to work. We pulled in some very knowledgeable programmers to troubleshoot, and their conclusion after about an hour of tinkering was that the plug-in was buggy. Back to the drawing board, it seems.
There will be another Linked Data hands-on session at CODE4LIB 2012. I anticipate that it will be as useful as the DLF hands-on. I do plan on attending and I am keeping my fingers crossed that I can make some progress with our project. There is a great list of resources on the web page for the DLF session. There are other tools there besides Google Refine that I hope to play with before CODE4LIB. Plus there are links to other hands-on tutorials. Slowly but surely I’ll get my head wrapped around how to do Linked Data. I’m grateful the DLF forum furthered my understanding.
It’s a sad day in the library development world. Rurik Greenall, the kick-ass Linked Data developer at the Norwegian University of Science & Technology Library, has announced his intention to leave libraryland and work in industry, where there’s more hope of doing great things with Linked Open Data. He writes that there is no real need for Linked Data in libraries due to the if-it-ain’t-broke-why-fix-it phenomenon.
He’s absolutely correct. I’ve said it before. There’s little reason for most academic libraries to expose traditional bibliographic information as linked data. There really isn’t any reason to use Linked Data within the context of how libraries currently operate. Our systems allow us to do the job of purchasing resources, making them searchable for our customers, and circulating them to people. In harsh economic times, why spend time/energy/money to change if things are working?
He’s also incorrect. There are reasons for librarians to do Linked Data. I suspect Rurik knows this and his tongue is implanted in his cheek due to frustration with the glacial pace of change in the Library systems world. Yes, there’s no reason to change if things are working. But things won’t always work the way they do now. We’re like candle makers after electricity has been harnessed. People still use candles but not as their sole source of light. The candle makers that are still in business pursued other avenues. Other use cases for candles besides “source of light” became prominent. Think of aromatherapy (scented candles), religious worship (votive candles), or decoration. It will be the same for library catalogs. People will always use them, but not as their main source of bibliographic descriptions. The traditional catalog data will be used in other ways. In my opinion, its future job will be as a source of local holdings and shared collection management Linked Data.
It’s quite telling that when Rurik asked, “What are the objectives of linked data in libraries?” prior to the LOD-LAM summit, he heard crickets chirping. The cataloging world has failed profoundly at understanding our raison d’être. I think we’ve tied ourselves too much to Panizzi’s & Lubetzky’s purpose of the catalog (explicating the differences between different expressions/manifestations of works) and lost sight of the purpose of providing a catalog in the first place: connecting people with information. Our work should be focused on assisting others in their information seeking & use rather than on managing local inventories. The FRBR user tasks (find, identify, select, obtain) don’t cover the full spectrum of information behavior in the 21st century. People want to analyze, synthesize, re-use, re-mix, highlight, compare, correlate, and so on and so forth. Linked Data is the enabling technology that will allow these new types of information behavior. The use case of libraries providing catalogs of descriptive bibliographic records for discrete objects is becoming increasingly marginal.
So I’ll propose an answer to Rurik’s question. The objective of doing Linked Data in libraries is to facilitate unforeseen modes of information use. How does this translate into new use cases for how libraries operate? Perhaps it means creating better systems for information seeking (we’d better hurry, though. Google is kicking our ass at this…). Perhaps, as I believe, it means focusing more on helping our customers as producers of information rather than consumers. Putting legacy library bibliographic data into a Linked Data form is but one small first step in the process. Once it’s out there in Linked Data form, it’s more amenable to analyzing, synthesizing, re-using, re-mixing, highlighting, comparing, and correlating, because we can now sic the machines on it. Putting legacy bibliographic data into Linked Data form is how we’re going to learn how to do Linked Data. Rurik is right that Linked Data in libraries will not work if this is all that we do. We need to take additional steps and figure out how to do Linked Data in a way that makes the most sense for our customers.
Rurik worked in the trenches to bring Linked Data into the library world. I’ve often referred to his work as I struggle, mightily, to teach myself how to expose our Linked Data and use the Linked Data exposed by others. The library world needs more people who can help librarians bridge the gap between how we currently do business and how we need to do business if we hope to keep our jobs. I begin to feel like we’re on the Titanic when these sailors jump ship. I will seek the lifeboat and continue learning the skills I need to help my library’s customers with their information seeking & use.