
Caltech Library News

Introducing the InvenioRDM GitHub Archiver (IGA)

by Chris Daley on 2023-05-31T15:17:00-07:00 in Library News

The InvenioRDM GitHub Archiver (IGA) is a new software tool created by the Caltech Library. InvenioRDM is the basis for many institutional repositories, such as CaltechDATA, that enable users to preserve software and data sets in a long-term archive. Though such repositories are critical resources, creating detailed records and uploading assets by hand can be tedious and error-prone. This is where our new tool comes in: IGA automatically creates metadata records and sends software releases from GitHub to an InvenioRDM-based repository server. The metadata in a deposit's record is critical to making that record widely discoverable by other people.

InvenioRDM is a research data management (RDM) repository platform based on the Invenio Framework and Zenodo. Of particular interest to software developers is that a repository like CaltechDATA offers the means to preserve software projects in a long-term archive managed by their institution. 

Here are some of IGA’s other notable features:

  • Automatic metadata extraction from GitHub releases, repositories, and codemeta.json and CITATION.cff files
  • Thorough coverage of InvenioRDM record metadata using painstaking procedures
  • Recognition of identifiers that appear in CodeMeta and CFF files, including ORCID, ROR, DOI, arXiv, and PMCID
  • Automatic lookup of publication data in DOI.org, PubMed, Google Books, and other sources if needed
  • Automatic lookup of organization names in ROR (assuming ROR IDs are provided)
  • Automatic lookup of human names in ORCID.org if needed (assuming ORCID IDs are provided)
  • Automatic splitting of human names into family and given names using ML-based methods if necessary
  • Support for InvenioRDM communities
  • Support for overriding the metadata record it creates, for complete control if you need it
  • Ability to use the GitHub API without a GitHub access token in many cases
  • Extensive use of logging so you can see what’s going on under the hood
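Recognizing identifiers means telling a real ORCID iD or DOI apart from a string that merely resembles one. The sketch below is an illustration only, not IGA's actual code. It validates ORCID iDs using the ISO 7064 MOD 11-2 check character that ORCID documents, and it matches DOIs with a loose pattern adapted from Crossref's suggested regular expression. The function names and the restriction to just these two schemes are assumptions made for the example.

```python
import re

# An ORCID iD is four groups of four characters; the final character is a
# check digit that may be "X". CodeMeta files often give it as a URL.
ORCID_RE = re.compile(
    r"(?:https?://orcid\.org/)?(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$", re.IGNORECASE
)

# Loose DOI pattern adapted from Crossref's suggested regex for modern DOIs,
# with an optional doi.org (or dx.doi.org) URL prefix.
DOI_RE = re.compile(
    r"(?:https?://(?:dx\.)?doi\.org/)?(10\.\d{4,9}/[-._;()/:A-Z0-9]+)$", re.IGNORECASE
)


def orcid_check_digit(base_digits: str) -> str:
    """Return the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def classify_identifier(value: str):
    """Return (scheme, normalized identifier), or None if not recognized."""
    value = value.strip()
    m = ORCID_RE.match(value)
    if m:
        orcid = m.group(1).upper()
        digits = orcid.replace("-", "")
        # Reject strings that have the right shape but a wrong check digit.
        if orcid_check_digit(digits[:15]) == digits[15]:
            return ("orcid", orcid)
        return None
    m = DOI_RE.match(value)
    if m:
        return ("doi", m.group(1))
    return None


print(classify_identifier("https://orcid.org/0000-0002-1825-0097"))
# prints ('orcid', '0000-0002-1825-0097')
```

The ORCID iD used here is the sample identifier ORCID publishes for its fictional test researcher, Josiah Carberry. A pattern check like this only establishes that a string is well formed; looking the identifier up in ORCID.org or DOI.org is what confirms it exists.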

Data and software archived in a repository need to be described thoroughly and richly cross-referenced in order to be widely discoverable by other people. As described in our detailed documentation, IGA by default constructs a metadata record using information it gathers from the software release, the GitHub repository, the GitHub API, and various other APIs as needed. 
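To make the idea of constructing a record more concrete, here is a rough sketch of one small piece of that process: mapping a few CodeMeta fields (name, version, description, author) onto a simplified subset of InvenioRDM record metadata. The function names, the sample codemeta.json content, and the choice of fields are assumptions for illustration. IGA itself draws on more sources (the release, the repository, CITATION.cff, and external APIs) and fills in many more fields. Field names such as creators, person_or_org, and resource_type follow the InvenioRDM metadata schema as we understand it and should be checked against the InvenioRDM documentation.

```python
import json

ORCID_PREFIX = "https://orcid.org/"


def person_from_codemeta(author: dict) -> dict:
    """Convert one CodeMeta author (a Person) into an InvenioRDM-style creator."""
    person = {
        "type": "personal",
        "given_name": author.get("givenName", ""),
        "family_name": author.get("familyName", ""),
    }
    # CodeMeta commonly records an ORCID iD as the author's @id URL.
    author_id = author.get("@id", "")
    if author_id.startswith(ORCID_PREFIX):
        person["identifiers"] = [
            {"scheme": "orcid", "identifier": author_id[len(ORCID_PREFIX):]}
        ]
    return {"person_or_org": person}


def record_from_codemeta(codemeta_text: str, release_date: str) -> dict:
    """Build a simplified InvenioRDM-style record from codemeta.json text."""
    cm = json.loads(codemeta_text)
    authors = cm.get("author", [])
    if isinstance(authors, dict):  # a single author may be given as an object
        authors = [authors]
    metadata = {
        "resource_type": {"id": "software"},
        "title": cm.get("name", ""),
        "publication_date": release_date,
        "creators": [person_from_codemeta(a) for a in authors],
    }
    # Copy optional fields only when the CodeMeta file provides them.
    for field in ("version", "description"):
        if field in cm:
            metadata[field] = cm[field]
    return {"metadata": metadata}


# Hypothetical codemeta.json content, used only for this example.
example_codemeta = """{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "name": "example-tool",
  "version": "1.2.0",
  "description": "A hypothetical package used only for illustration.",
  "author": [
    {
      "@type": "Person",
      "givenName": "Josiah",
      "familyName": "Carberry",
      "@id": "https://orcid.org/0000-0002-1825-0097"
    }
  ]
}"""

record = record_from_codemeta(example_codemeta, "2023-05-31")
print(json.dumps(record, indent=2))
```

In a real pipeline the release date would come from the GitHub release itself (for example, the published_at field that the GitHub REST API returns for a release), trimmed to a plain date.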

