We are pleased to announce that George Porter will move into the newly formed role of Repository Coordinator at Caltech Library. As the Engineering & Applied Science Librarian, George has worked on our repositories since they were created. However, the Repository Coordinator role formalizes and funds the full-time work required to support and maintain CaltechAUTHORS as a campus service. We took this opportunity to ask George some questions about his new role and Caltech repositories in general.
Congratulations on your new position as Repository Coordinator, George! Could you give us an introduction to what repositories are and do?
Thank you. I’m delighted to take on this new role. Repositories are collections of material. In the library world, they fall into four primary types along two axes: subject-based or institutional, and textual or data. PubMed Central is an example of a subject textual repository. WormBase is a subject data repository, hosted at Caltech but not limited to Caltech discoveries. The Caltech Library has been collecting, highlighting, and disseminating the Institute’s scholarly output, primarily textual, through CaltechAUTHORS since 2001. CaltechDATA has been the Institute’s data repository since 2014, providing a home for datasets for which there is no appropriate disciplinary option.
What created the need for this new position?
CaltechAUTHORS has been a victim of its own success. Over the course of more than 20 years, the Library has operated the repository with a small team of developers, programmers, and library staff contributing effort as priorities allowed. CaltechAUTHORS has grown to more than 100,000 records and has been the source of more than 25 million global file downloads. With the repository now the largest trove of scholarly output from any single university, the time had come to provide formal, dedicated oversight for the operation. Fortunately, when the job was posted, I happened to have more than 20 years of relevant experience to offer.
What will a day in the life of the Repository Coordinator look like?
The next three to six months will see CaltechAUTHORS migrate from the EPrints platform, in continuous use since 2001, to InvenioRDM, the CERN-backed platform that underlies CaltechDATA. With such a large database, there are a great many issues to consider and fields to map as we prepare to migrate. As with many services that have developed organically, the metadata standards and collection policies have evolved over time. With a full-time position, it will be possible to focus on establishing a baseline for the metadata, while looking for opportunities to automate ingest and update processes.
What will you be able to do with the repositories now that you can give them your full attention?
The initial impetus for CaltechAUTHORS was to provide free, legally compliant access to as much Caltech scholarship as possible. In 2009, the Faculty Board refocused the Library’s effort to include documenting all of the Institute’s scholarly output, even if legal restrictions prevented the Library from distributing publisher-formatted articles. Fully addressing the Institute’s research output, current and stretching back to Caltech’s beginnings, will require multiple strategies to identify all of the material.
What is your most interesting repository story?
My favorite story is still the celebration the Library hosted in 2016 to commemorate Jack Roberts’s Internet sensation, Basic Principles of Organic Chemistry, 2nd ed. Shortly after Jack’s 98th birthday that June, his landmark organic chemistry textbook surpassed 500,000 file downloads from CaltechAUTHORS. The book has continued to be widely read online and has now been downloaded more than 1.25 million times. His co-author, the late Marjorie Caserio, gave a presentation on the organic chemistry textbook as an Internet phenomenon a few years later at an American Chemical Society meeting in San Francisco.
What are your predictions for the future of Caltech repositories?
This being Caltech, the future is bright and the horizons are virtually limitless. Caltech’s mission—"to expand human knowledge and benefit society through research integrated with education"—dovetails perfectly with an effort to collect that expansive research and to share it as widely as possible.