ed. note – There’s been a lot going on since my return from leave. Bereavement and other extenuating circumstances continue to preclude regular posting. I anticipate it’s going to be a while longer before I return to it. Meanwhile, I have some drafts in my pending box. This blog has been bereft of posts so I’ve begun the slow process of revision and posting. I wrote the following a year ago. It’s a bit dated, due to the Harvard reference. I think the rest of it continues to apply.
The outcry regarding the Harvard Library’s restructuring brings to light the vulnerability of traditional technical services librarians.
I received a mandate to turn metadata services into public services when I began my current job. Let’s leave aside the implied notion that metadata services isn’t a public service. We do our jobs so well that people tend to forget that without us the ILS wouldn’t function effectively and we couldn’t serve our customers. But I digress. Let’s re-frame and say we have a mandate to turn metadata services into *more visible* public services.
So what do we mean by that? Obviously, the next-gen metadata services are not the standard metadata services we’ve grown accustomed to in academic libraries (i.e., descriptive cataloging in MARC for our ILS, in DC for our repositories, DACS for archives, authority work, holdings work, etc.). These traditional services are not going away, let’s get that clear. Original cataloging will remain a large part of the work of huge libraries with lots of original monographs. The rest of us will keep doing traditional acquisitions, copy cataloging, etc. We must continue providing these services but they will take a lesser and lesser role. You understand the trend towards automating as much as possible in cataloging if you’ve been awake for the past 15 years. Observe academic libraries obtaining more and more electronic resources with record sets we manipulate in batch. This is what the FBI calls a clue. As we automate the old functions, we make room for doing the new functions of a Metadata Services Group.
So what is a next-gen metadata service? We can look to the likes of MIT and Cornell and Stanford for guidance. They’ve been at the forefront of revamping cataloging departments and applying metadata skills in new ways. We can also do environmental scanning of the higher education environment, figuring out what our customers are doing and what they’re likely to need in the next few years. Don’t rely on asking them what they want. Remember Ford’s adage that if he’d done what his customers wanted, he’d be manufacturing horse buggies. Sometimes people don’t know what they can use. When we poke around academe, we know that managing the products of scholarly communication will figure largely in our future. We can also do environmental scanning of the technological environment. From that, we know that linking the various products of scholarly communication is important, along with managing the various metrics which are emerging. We can figure out a next-gen metadata service if we don’t panic and apply some inductive reasoning to the situation. Brainstorming a bit, here’s what I’d consider a “metadata service”:
Metadata services offered:
- Search engine optimization, general & academic
- Metadata mark-up (e.g., linked data in ePub, enhancing the full text in your repositories with things like chemical or mathematical mark-up)
- Ontology development
- Persistent identifier management, especially personal and organizational name identifiers
- Citation metrics, alt-metrics, bibliometrics, webometrics, scientometrics
- Schema development and storage
These services, with the exception of consulting, seem to be machine-centric back-end things. Yet, they revolve around humans. Context is critical to do the back-end work successfully. Who are you optimizing search results for? Who uses the vocabularies you develop and maintain? People need to be identified, whether personally or organizationally. Perhaps all of these services could be considered sub-services of consulting. One needs to do the market research, user needs assessment, and customer engagement prior to service development. And it’s critical to continue engaging customers to evaluate the effectiveness of services and refine accordingly.
One point of tension I foresee: consulting directly with customers steps on the turf of public service librarians, who, in my experience, are understandably threatened. Do public service folk garner more metadata/technical prowess or do metadata/technical people work on their outreach and engagement skills? As with most things, the answer probably lies somewhere in the middle. It’s going to depend on the staffing within the library organization. Not everybody is going to be willing or able to change. That’s ok. Those who can’t do it, however, should realize that library work isn’t for them and move on to other things. A colleague has the following quotation on his office white board, which summarizes this perfectly:
“If you don’t like change, you’re going to like irrelevance even less” - General Eric Shinseki
Wow. The year is only a few days old and already there’s tons of activity in the library metadata world.
First, I’m thrilled to say that the CODE4LIB preconference I’ve been involved with is a go. “Digging into metadata: context, code, and collaboration” will be a forum for coders and catalogers to continue the work begun at the “Catalogers & Coders” CURATECamp, held at the DLF Fall Forum. As you may recall, one of the next steps which emerged from those discussions was to have future events dedicated to fostering cataloger-coder relationships. Registration for CODE4LIB is full, and there’s a substantial waiting list. There are sure to be other events in the future, however, as CODE4LIB regionals continue to expand and interest groups within LIS organizations develop. Also, we’ll be making all of the CODE4LIB pre-con materials available.
Speaking of making materials available, I’ve finally put my Linked Open Data in Libraries, Archives, and Museums presentation up on slideshare. Thanks to Emily Nimsakont for letting me borrow a few slides from one of her presentations. Someday I’ll actually create a slidecast of this. I think slides sans context have limited utility. There will be another opportunity to catch me presenting live. If you’re going to ALA mid-winter, I’ll be speaking on a panel regarding future trends in authority work for PCC/NACO. I’ll post more details about that closer to the date.
Speaking of catalogers coding, Shana McDanold, one of my co-conspirators on the CODE4LIB pre-con, has been doing a bang-up job promoting Codecademy’s Code Year within the cataloging community. There are no more excuses for any cataloger wishing to delve into coding. Code Year sends you an interactive lesson per week. You can work along with many other librarians via the twitter hashtags #catcode and #libcodeyear. There’s also a PBwiki for further collaboration. I’m betting that the #catcode community carries on once the year is done – there’s much for us to do with the continuing evolution of catalogs, new metadata workflows with repositories, etc. I’ve blogged before about the blurriness in defining role boundaries between metadata librarians and programmers. Knowing coding basics can only help us improve our relationships with programmers. And, it’s going to lead to better services. We’ll be better able to articulate what we’d like our programmers to do when we’re developing stuff.
Exciting times! I’m very stoked to see the response Shana has received. Over the years I’ve witnessed lots of catalogers who are refusing to adapt to the increasingly technical nature of our jobs (not at MPOW, fortunately). It seems the tide is finally turning. I think the best thing we can do as a community is figure out projects to make use of our nascent coding skills. No, I don’t have any ideas yet. I’ll keep you posted on that.
I’m now home from the DLF 2011 Forum. This was my 2nd year and it has become my favorite conference. I met people I wanted to meet both times I have attended. I liked being able to express my impressed-ness and talk shop. This year, I made connections from which some collaborations may emerge. Exciting!
I’m back with some very useful information, stuff I can apply to my day-to-day. The Linked Data hands-on session was awesome and merits its own blog post. Best thing: it helped me make a bit of headway with the CIT faculty names linked data work.
CURATEcamp also gets its own blog post. The conversation between catalogers and coders made a great leap. There are concrete next steps. See details at the CURATEcamp wiki. See also the pretty pretty picture of the distribution between catalogers and coders attending. Catalogers represent!
I laughed so much. This is truly the conference o’ mirth. Dan Chudnov proposed a weekly call-in show where a cataloger and a coder take questions from the field. The names proposed in the twitter back-stream made it difficult to hold back laughter for fear of disturbing the front-stream speaker. Chuckles aside, this seems to have a strong possibility of actually happening. I’m getting over my instinct that back-chat is rude at conferences. It seems to have a group bonding effect. I can see the value of nurturing professional relationships through shared discussion and commentary. See part above about concrete next steps from CURATEcamp.
I got caught up on Data Management Plans and the eXtensible Catalog project. I asked for & received advice about migrating our archival information system. I asked people about their use of descriptive metadata and best practices for image management. I have new software to evaluate and test. I also got to see a preview of an RDA training workshop which focuses on RDA as data elements (not RDA as records!!!).
My brain hurts because it’s so full. I’ll need to collect my thoughts. I absorbed so much that it wouldn’t be fair to blog one big brain dump. I’ll be able to synthesize it better if I break it down in chunks. Stay tuned.
I had the pleasure of attending an advanced MarcEdit workshop this past Friday taught by none other than Mr. MarcEdit himself, Terry Reese. I learned quite a few tips and tricks. Most important for me was learning how the regular expression engine functions and the extensions Terry included (not too many, yay for sticking to the .NET framework!). The portion on programming MarcEdit from the command line was a bit beyond my ken but it was cool to see it and file under must-learn-someday.
Terry has quite a few YouTube videos demonstrating how to make the most of the program. They have been available for a couple of years but they’re worth reminding folks about. And hey, it’s Monday, it’s a movie, and it’s metadata related.
I’ve mentioned that we want to get authority records for all current Caltech faculty into the National Authority File and by extension into the VIAF. The 1st step is to ensure that we have a current and comprehensive list of all faculty working here. I’m happy to learn that I can easily obtain the information in a manipulate-able form. I was expecting that I would need to ask somebody in academic records and plead our case. Lists can be tightly guarded by powers-that-be. I just figured out that you can convert HTML tables to Excel via Internet Explorer. That’s probably old news to most of you. I’ve done .xls to html conversion, I’ve just never had the need to go in the opposite direction. Plus I don’t use Internet Explorer.
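For anyone who would rather not go through a browser, the same HTML-table-to-spreadsheet step can be sketched in a few lines of Python using only the standard library. This is an illustration, not the route described above: the `TableExtractor` class, the `html_table_to_csv` function, and the sample table (with made-up names) are all hypothetical, and a real directory page would likely need extra handling for nested tables or links.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, one list per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Hypothetical sample data, standing in for a directory search result.
SAMPLE = """
<table>
  <tr><th>Name</th><th>Division</th></tr>
  <tr><td>Jane Q. Example</td><td>Chemistry</td></tr>
  <tr><td>John (Jack) Sample</td><td>Physics</td></tr>
</table>
"""

if __name__ == "__main__":
    print(html_table_to_csv(SAMPLE))
```

The resulting CSV opens directly in Excel, which keeps the rest of the workflow the same.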
I was able to create a spreadsheet of the necessary data by doing a directory search limited to faculty and running the conversion. Sweet! Now we can divvy up the work and get cracking. Getting the info is a small thing. But it’s these little victories which make my days brighter. I played around with the delimited text to MARC translator in MarcEdit to auto-generate records from the spreadsheet. It worked like a charm. Unfortunately the name info in the spreadsheet is collated within a single cell. Also it’s in first name surname order without any normalization of middle initials, middle names, or nicknames in parens. A text-to-MARC transform can only work with the data it is given. A bunch of records with 100 fields in the wrong order isn’t so helpful. I messed about with the text-to-columns tool in Excel in order to parse the name data more finely, to no avail. It worked but would require much post-split intervention to ensure the data is correct. Might as well do that work within Connexion.
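To make the name problem concrete, here is a minimal sketch of the kind of split involved, written in Python rather than Excel. It is deliberately naive and rests on assumptions: the `invert_name` function and the sample names are hypothetical, the last word is treated as the surname, and a parenthesized nickname is pulled out separately. Compound surnames, suffixes, and the like would come out wrong, which is exactly the post-split intervention mentioned above.

```python
import re

# A nickname in parentheses, e.g. "John (Jack) Sample".
NICKNAME = re.compile(r"\s*\(([^)]*)\)")


def invert_name(display_name):
    """Turn 'Forename [Middle] Surname' into 'Surname, Forename [Middle]'.

    Returns a (heading, nickname) tuple; nickname is None when absent.
    Assumption: the final whitespace-separated word is the surname.
    """
    match = NICKNAME.search(display_name)
    nickname = match.group(1).strip() if match else None
    cleaned = NICKNAME.sub("", display_name)
    parts = cleaned.split()
    if len(parts) < 2:
        # Single-word names cannot be inverted; hand them back unchanged.
        return cleaned.strip(), nickname
    surname, forenames = parts[-1], parts[:-1]
    return f"{surname}, {' '.join(forenames)}", nickname


if __name__ == "__main__":
    for name in ["Jane Q. Example", "John (Jack) Sample"]:
        print(invert_name(name))
```

Every heading produced this way would still need a human check before it went anywhere near an authority record.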
In fact, I’m ok with creating the authority records from scratch since we’re training to be NACO contributors. People need the practice. In my experience, it’s easier to do original cataloging vs. using derived records. Editing requires a finer eye and original work can be helped along with constant data and/or macros. Regardless, it was fun to play with the transform and teach myself something new. And it’s very exciting to take a step towards meeting our goal of authority/identity information/identifiers for our constituents.
When I was a newbie manager I experimented with more regular staff meetings for the people in the Metadata Services Group. I wanted to incorporate shared learning and group discussion into our meetings to make training more fun and relevant. So I added metadata videos to our Monday morning agenda. I would bring homemade vegan muffins to encourage attendance and participation since we met early and it was Monday after all (pix available via Flickr!). We called it the 4M: Monday morning metadata movies & muffins. You can pronounce that Mmmm.
We eventually abandoned that experiment. Folks liked the videos, but wanted to watch them on their own time. Since then, my periodic sharing of links for metadata-related videos with the folks on my team has dwindled. I was reminded of this practice when a dear friend recently asked me for the links to the videos. I was also reminded of this when Mod Librarian started posting a Metadata Monday series on her blog. Great minds and all that. I’ve finally managed to post the link to the YouTube play list for the late-lamented (at least by me) experiment. Drum roll please…for your viewing pleasure:
The 4M: Monday Morning Metadata Movies play list.
Caveat: the movies we watched were not always strictly about metadata, but they were on topics relevant to metadata management within academic libraries. They were intended for an audience of paraprofessionals & professionals. And sometimes they were more fun than educational.
Some past 4M videos which weren’t on the YouTube play list:
I’m inspired now to resume my quest for videos relevant to metadata workers in academic libraries. Perhaps I’ll even post them each Monday. Or at least on some Mondays. And I don’t promise to bake vegan muffins on Sunday nights.
I spent a fun day at the regional OCLC Good Practices, Great Outcomes event yesterday, where I was an invited speaker. It’s always a treat to hear Roy Tennant give a keynote. I was very impressed with the efficiencies Helen Heinrich implemented at CSU Northridge and the big dent Sharon Benamou made in the cataloging back log at UCLA. Holly Tomren did a fabulous job summing up the major themes which emerged. Video and slides from the event will be made available on the OCLC web site soon. I promise to share the links. Meanwhile, I’ve put my slides up on slideshare.
It was great to give a talk again. I haven’t presented professionally in several years. I used to do it frequently but fell out of the habit when I switched career streams from public to technical services. Partially it was due to lack of time. I was busy learning the intricacies of MARC and volunteering my time on CC:DA during the development of RDA. Partially it was due to major illness. I spent a good chunk of 2009 on medical leave. And partially it was due to self-doubt. As a new metadata maven I wanted to have something useful to discuss before I began speaking about my work.
The best part of doing the talk was figuring out those things we’re doing at Caltech which may be useful for other tech services librarians. Reflecting back on my four years here, I realize we’ve accomplished a great deal.
- We’re adding more bibliographic records to our ILS despite a reduction in staff, on the order of tens of thousands more. That’s the beauty of batch loading and purchasing record sets.
- We’re more efficient at our batch loading because we’ve tapped into regular expressions (shout out to Terry Reese). MarcEdit has been the major player in making us more efficient.
- We’ve learned how to apply business process analysis techniques to review our work flows and improve them, freeing up time for training and developing next generation metadata services.
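To illustrate the kind of batch edit that regular expressions make possible, here is a minimal sketch in Python. It is not one of the actual expressions used in the batch loads described above, and several things are assumptions: the records are taken to be in MarcEdit’s mnemonic text form (lines such as `=856  40$u...`), the proxy prefix is a made-up placeholder, and the `add_proxy` function is hypothetical. MarcEdit itself uses the .NET regular expression engine, whose replacement syntax differs slightly from Python’s.

```python
import re

# Hypothetical proxy prefix; a real one would come from the local proxy setup.
PROXY_PREFIX = "https://proxy.example.edu/login?url="

# An 856 line in mnemonic form: tag, two spaces, two indicators, then $u + URL.
# The URL runs up to the next subfield delimiter or the end of the line.
URL_IN_856 = re.compile(r"^(=856  .{2}\$u)(https?://[^$\r\n]+)", re.MULTILINE)


def add_proxy(mnemonic_text):
    """Prefix every 856 $u URL with the proxy string, skipping ones already done."""

    def repl(match):
        url = match.group(2)
        if url.startswith(PROXY_PREFIX):
            return match.group(0)  # already proxied, leave untouched
        return match.group(1) + PROXY_PREFIX + url

    return URL_IN_856.sub(repl, mnemonic_text)


if __name__ == "__main__":
    sample = (
        "=245  10$aExample journal\n"
        "=856  40$uhttp://journal.example.org/toc$zFull text\n"
    )
    print(add_proxy(sample))
```

Because already-proxied URLs are skipped, running the edit twice over the same file leaves it unchanged, which matters when record sets get reloaded.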
I have to give credit where credit is due. The Metadata Services Group team has really stepped up to the plate and wholeheartedly embraced the changes we’ve made. I’m so proud of them. It was easier for me to stand up and talk to a hundred or so people because I could share their success.
I’m stoked. I’ve been accepted to the International Linked Open Data in Libraries, Archives and Museums Summit. According to the about page, the summit will convene leaders in their respective areas of expertise from the humanities and sciences to catalyze practical, actionable approaches to publishing Linked Open Data, specifically:
- Identify the tools and techniques for publishing and working with Linked Open Data
- Draft precedents and policy for licensing and copyright considerations regarding the publishing of library, archive, and museum metadata
- Publish definitions and promote use cases that will give LAM staff the tools they need to advocate for Linked Open Data in their institutions
It’s exciting because of its potential to spark real progress for library linked data. I’m keen to be involved with projects where I can get my hands dirty. I’m pretty much done with librarian conferences like ALA. IMHO, ALA is an echo chamber of how-we-done-it-good presentations and yet-another-survey research. I went to an ERM presentation at the mid-winter meeting and heard a speaker discuss work flows that I’ve seen implemented in libraries for the past 13 years. Seriously. ALA is good for networking with fellow librarians to be sure but it isn’t the place to get bleeding edge information. I’m ready to give my time and effort to breaking new ground. I’m very fortunate that my boss is incredibly supportive of my LOD-LAM participation.
We want to do a linked data project with author identifiers for our faculty. We’re a small institution. We’ve got roughly 300 current faculty members which is a small enough number for us to create a complete set of records within a reasonable amount of time. Our goal is to contribute our metadata to the commons and to share our experience as a use case. I’m quite honored to be invited. I’ve been following the work of some members of the organizing committee for years and I’m very much looking forward to finally meeting them.
There are plenty of communities that manage metadata besides libraries. I frequently see job postings for data curators here at MPOW. I’m considering starting a metadata or data curation interest group on campus. I think librarians need to be proactive about the types of metadata services we can provide to our customers. Some of our peer libraries do a great job making metadata services a public service. See how MIT, Cornell, Indiana University, and University of Wisconsin promote their metadata expertise.
At this point in time, I think our library can manage taking on a consulting type of role. Most of the people managing metadata on campus are specialists with advanced degrees in the discipline. There’s a reason for that. Their “collections” require subject expertise in order to properly create descriptive metadata. The experts don’t necessarily have training in creating metadata or doing digital preservation, however. And they probably run into the typical issues in managing metadata that libraries do. At the very least it would be useful to network with people that share common interests. It can only help us figure out how the library fits in with the emerging paradigms of scholarly communication.
I’m looking at incentives for making our serials holdings MARC standard compliant. The MARC Format for Holdings Data (MFHD), pronounced “muffed” I’m told, isn’t supported very well within our ILS. MFHD is held within check-in records. It makes sense to a degree. One needs coverage ranges when checking in journals. The data is buried, however, in a place where most people using the ILS will not see it. Customers or staff. We would love to get it current, correct, and usable.
The biggest reason for standardizing is to make interlibrary loan work smoother. We get requests for “titles-not-owned” when OCLC indicates we own a journal but we don’t have a specific issue. This brings down our fulfillment rate. That makes us naughty players in the shared resources game. But what are the consequences of that? I’m not quite sure at the moment. Patrons beyond Caltech are important to us, absolutely. Yet they fall lower in our priority queue than Caltech faculty, staff, and students. When resources are limited we focus on projects with the biggest payoffs for our primary user group.
There are other good reasons for standardizing. Machines manipulate standardized data better. It’s a metadata truism. Let’s ignore the real-world issues with interoperability that have been demonstrated over the years. Those are really a result of human factors. We all know that standardized data is not truly standardized. See Naomi Dushay and Diane Hillmann’s excellent identification of problems encountered in sharing Dublin Core records. But let’s live in an ideal world for a minute and say that we did get our data nice and clean and in a standardized format. All of a sudden we would have the means to re-use our data outside of our ILS. Theoretically at least. Much depends on the export capacity of our ILS.
It would be lovely if we could better automate maintenance of coverage ranges within our OpenURL resolver, for example. I’m sure there are more rationales for holdings standardization that I haven’t thought about. I’ve begun reviewing the literature. We can’t make a decision to do a large conversion project based on all of these feel-good reasons, however. The business case relies upon multiple factors: the state of our current data, the capacities of our ILS, the interoperability of our ILS and OCLC, and our staffing and budgetary resources. All of these need thorough analysis. So we’re holding on holdings at present while we gather information and ask hard questions. Ultimately it comes down to answering the question, will the payoff be worth the investment? Stay tuned.