It’s the little things

Posted by laura on April 1, 2011 under Metadata, Standards | 4 Comments to Read

I’ve mentioned that we want to get authority records for all current Caltech faculty into the National Authority File and, by extension, into the VIAF. The first step is to ensure that we have a current and comprehensive list of all faculty working here. I’m happy to learn that I can easily obtain the information in a form I can manipulate. I was expecting that I would need to ask somebody in academic records and plead our case. Lists can be tightly guarded by the powers-that-be. I just figured out that you can convert HTML tables to Excel via Internet Explorer. That’s probably old news to most of you. I’ve done .xls to HTML conversion; I’ve just never had the need to go in the opposite direction. Plus I don’t use Internet Explorer.
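For anyone who, like me, doesn’t use Internet Explorer, the same HTML-table-to-spreadsheet step can be roughed out in a few lines of Python. This is only a sketch and not what I actually ran: the `TableRows` class and `html_table_to_csv` function are names made up for illustration, the sample markup is invented, and it assumes the directory results are a single plain table with no nested tables. It uses nothing beyond the standard library and writes CSV, which Excel opens directly.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being read, if any
        self._cell = None  # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace so each cell is one tidy string.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Return the rows of a simple HTML table as CSV text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


if __name__ == "__main__":
    # Invented sample markup, standing in for a saved directory results page.
    sample = (
        "<table><tr><th>Name</th><th>Division</th></tr>"
        "<tr><td>Jane Q. Doe</td><td>Physics</td></tr></table>"
    )
    print(html_table_to_csv(sample), end="")
```

Save the output with a .csv extension and it opens as a spreadsheet. A real directory page may well need more handling than this, so treat it as a starting point.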

I was able to create a spreadsheet of the necessary data by doing a directory search limited to faculty and running the conversion. Sweet! Now we can divvy up the work and get cracking. Getting the info is a small thing. But it’s these little victories which make my days brighter. I played around with the delimited text to MARC translator in MarcEdit to auto-generate records from the spreadsheet. It worked like a charm. Unfortunately the name info in the spreadsheet is collated within a single cell. Also, it’s in first name, surname order without any normalization of middle initials, middle names, or nicknames in parens. A text-to-MARC transform can only work with the data it is given. A bunch of records with 100 fields in the wrong order isn’t so helpful. I messed about with the text-to-columns tool in Excel in order to parse the name data more finely, to little avail. The split worked, but it would require much post-split intervention to ensure the data is correct. Might as well do that work within Connexion.
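To show why the split needs so much hand-holding, here is a rough Python sketch of the kind of name parsing involved. It is not what I did in Excel or MarcEdit. The `invert_name` function, the sample names, and the review rules are all invented for illustration, and it naively assumes the last word is the surname. It pulls out nicknames in parens, flips the name into surname-first order, and flags anything a cataloger should check by hand.

```python
import re

# Matches a parenthesized nickname such as "(Bob)".
NICKNAME = re.compile(r"\(([^)]*)\)")

# Trailing tokens that are not surnames (an illustrative list, not exhaustive).
SUFFIXES = {"jr", "jr.", "sr", "sr.", "ii", "iii", "iv"}


def invert_name(display_name):
    """Turn 'First Middle (Nick) Surname' into a 'Surname, First Middle' guess.

    Returns a dict with the guessed heading, any nicknames found, and a
    needs_review flag for names the naive rule is likely to get wrong.
    """
    nicknames = NICKNAME.findall(display_name)
    tokens = NICKNAME.sub(" ", display_name).split()

    if len(tokens) < 2:
        # Nothing to invert: an empty cell or a single-word name.
        return {
            "heading": " ".join(tokens),
            "nicknames": nicknames,
            "needs_review": True,
        }

    surname = tokens[-1]
    forenames = " ".join(tokens[:-1])

    needs_review = (
        bool(nicknames)                 # which form is the predominant usage?
        or len(tokens) > 3              # possible compound surname
        or surname.lower() in SUFFIXES  # last word is "Jr." or similar
    )
    return {
        "heading": f"{surname}, {forenames}",
        "nicknames": nicknames,
        "needs_review": needs_review,
    }


if __name__ == "__main__":
    for name in ["Jane Q. Doe", "Robert (Bob) Smith", "Juan de la Cruz"]:
        print(invert_name(name))
```

The last sample name comes out as "Cruz, Juan de la", which is exactly the sort of result that has to be caught and fixed by a person. The flag only narrows down where to look.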

In fact, I’m OK with creating the authority records from scratch since we’re training to be NACO contributors. People need the practice. In my experience, it’s easier to do original cataloging than to use derived records. Editing requires a finer eye, and original work can be helped along with constant data and/or macros. Regardless, it was fun to play with the transform and teach myself something new. And it’s very exciting to take a step towards meeting our goal of authority/identity information/identifiers for our constituents.

  • Bryan said,

    Though I have no plans to work on this kind of project any time soon, I do appreciate that you outlined how one might do it. Maybe I’ll experiment with it at home. Thanks.

  • Metadata and Authority Files | Celeripedean said,

    […] since I found out about Laura Smart’s blog, Managing Metadata, I’ve seen some really exciting and useful posts. Her latest one on adding Caltech faculty to […]

  • Jonathan Rochkind said,

    How does such an automated process avoid creating a duplicate authority record for a CalTech faculty member who already has an authority in NAF? (surely some of them do). Will your process avoid creating duplicate authority records for the same person if they change their name?

  • laura said,

    Hi Jonathan – lovely to “see” you here. This process doesn’t avoid dupes. It wasn’t intended to be completely automated. It was more of an experiment to see how much automation could be done and an exercise in improving my skills with the text-to-MARC translator in MarcEdit. We’re going to be doing our records individually from scratch since the folks in my unit are still in training for NACO independence. It’s also necessary to do them by hand due to the lack of normalization in the name column of our .xls data.

    We will be doing an initial run-through of the list to remove any names which are already in the NAF. This type of thing can be delegated to staff or students since it’s a simple look-up. Then we’ll move on to creating the records.

    This type of work is feasible for us because we’re so small. We’ve got 300 active faculty. It wouldn’t be as do-able for a larger institution – another reason for trying to figure out ways to automate the process as much as possible. I’d like to help others deal with faculty name identifiers.

    It would be ideal if the NAF or VIAF had batch searching & batch loading mechanisms. I imagine that would make dupe detection a little easier.

    Re: your comment on a faculty member changing their name, it’s pretty rare that one would change the main heading in an authority record. The main heading is selected based on predominant usage, i.e. name most frequently used in publications (or author preference if known). Changing it means a lot of file maintenance. OCLC has to update the bibliographic records with the name change. Local ILS need new authority records and global updates of their bib records. Generally when there’s a minor name change the new name is added to the record as a cross-reference. If the change is major enough then a new record would be created with a cross-reference between records to indicate earlier/later usage.
