Research Data Support

The library helps campus research labs and centers manage and publish their research data. We've also worked with Information Management Systems & Services (IMSS) to put together a centralized Research Data FAQ.

CaltechDATA and DOIs

Caltech Library offers a free managed data storage and sharing service at https://data.caltech.edu. Find out more information or read the CaltechDATA FAQ.

CaltechDATA offers standard data preservation and DOI (permanent identifier) services. We also offer services (at an additional cost) for preserving large volumes of data (> 500 GB). Contact us at data@library.caltech.edu to discuss the options.

Caltech Library also manages custom DOIs for groups on campus. For more information, see our DOI page.

Data Management Plans

Funding agencies such as the National Science Foundation and the National Institutes of Health have specific requirements and templates for Data Management Plans (DMP). Funder templates and further guidance are available at the DMPTool.

Caltech Library has created a number of Caltech-specific DMP resources that are available on this webpage, including standard language, a DMP checklist, and an example DMP.

Data Consultations and Training

Library staff can help you or your research group work through challenges associated with research data. We can provide:

  • Data management plan (DMP) assistance
  • Data management guidance
  • Capacity planning
  • Storage technology recommendations
  • Data processing suggestions
  • Interactive visualization hosting
  • Long-term archival and data sharing

Contact us at data@library.caltech.edu to schedule a consultation appointment.

Library staff offer workshops and training in the following data-related areas:

  • Data management
  • Data visualization
  • Software and Data Carpentry

Contact us at library@caltech.edu to schedule a session for your research group, class, or organization.

NIH Policy for Data Management and Sharing

The new NIH Policy for Data Management and Sharing (NOT-OD-21-013) goes into effect on 2023-01-25. The policy requires that:

  • All NIH grants submit a 2-page data management and sharing plan
  • Researchers "maximize the appropriate sharing of scientific data"

There are several specific details to note about the new NIH policy:

  • Data should be shared in a repository.
  • Sharing will happen sooner: "shared scientific data should be made accessible as soon as possible, and no later than the time of an associated publication, or the end of performance period, whichever comes first."
  • Researchers can ask for money to pre-pay long-term storage costs and for data management activities. More information on allowable costs is available here: https://grants.nih.gov/grants/guide/notice-files/NOT-OD-21-015.html 
  • If a DMP needs to be revised, it should be updated and reviewed during regular progress reports.
  • The new policy takes a more nuanced approach with respect to the sharing of data on human subjects. The policy outlines specific considerations for this data:
    • "Researchers proposing to generate scientific data derived from human participants should outline in their Plans how privacy, rights, and confidentiality of human research participants will be protected"
    • "NIH strongly encourages researchers to plan for how data management and sharing will be addressed in the informed consent process"
    • "Researchers should consider whether access to scientific data derived from humans, even if de-identified and lacking explicit limitations on subsequent use, should be controlled."

NIH has a dedicated webpage, https://sharing.nih.gov/, containing information about the policy and a list of NIH-supported data repositories. If you have questions about the policy or where to share your research data, please contact library@caltech.edu.

Research Data FAQ

I’m going to be collecting research data. Where should I put it?

Caltech IMSS provides Box.com cloud storage, with 50 GB of storage for each community member and 1 TB per group. Additional storage is available at extra cost. Box manages the storage, but you manage access to files. However, Box may not be fast or efficient enough for large amounts of data, and it has a maximum single file size of 15 GB. If your data management needs are too large for Box, you may want to purchase local storage hardware such as a Network Attached Storage device or storage array. IMSS and the library can help you decide which option is best for your needs (go to help.caltech.edu and select IMSS/Data Storage & Backup, or email data@caltech.edu).
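If you are not sure whether an existing dataset fits within the Box limits described above, a quick size check can help before you contact IMSS or the library. The following is a minimal Python sketch, not an official tool: the function names are hypothetical, the limits are copied from the figures quoted on this page, and it assumes decimal gigabytes (1 GB = 10^9 bytes), which this page does not specify.

```python
import os

GB = 10**9  # assumption: decimal gigabytes; the page does not state the unit

# Limits as quoted on this page; check with IMSS for current values.
BOX_MAX_FILE_BYTES = 15 * GB        # maximum single file size
BOX_PERSONAL_QUOTA_BYTES = 50 * GB  # allocation per community member


def summarize_dataset(root, max_file_bytes=BOX_MAX_FILE_BYTES):
    """Walk `root` and return (total_bytes, oversized_paths).

    `oversized_paths` lists files larger than `max_file_bytes`.
    Symbolic links are skipped so files are not counted twice.
    """
    total = 0
    oversized = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            size = os.path.getsize(path)
            total += size
            if size > max_file_bytes:
                oversized.append(path)
    return total, oversized


def fits_in_box(total_bytes, oversized, quota_bytes=BOX_PERSONAL_QUOTA_BYTES):
    """Return True if the dataset is within the quota and has no oversized files."""
    return total_bytes <= quota_bytes and not oversized
```

For example, calling `summarize_dataset("path/to/your/data")` and passing the two results to `fits_in_box` gives a rough yes or no answer for a personal allocation; pass `quota_bytes=1000 * GB` to check against the 1 TB group allocation instead. The check only covers capacity, not transfer speed, so a dataset that fits may still be impractical to work with in Box.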

I need to analyze research data or run simulations.

The new Resnick High Performance Computing Center (HPC) cluster is an excellent option. Your calculations will run on a state-of-the-art resource at Caltech with local support.  Your research group leader has to set up an account (hpc.caltech.edu/documentation/getting-started), and there is a charge depending on how much computing time you use.  Groups get up to 30 TB of free data storage, although this storage is not backed up, so groups must store primary data elsewhere.  National (off-campus) computing resources like XSEDE (https://www.xsede.org/) are also available by application and can provide additional computing resources at no charge.

I want to ensure that my data remains available for a long time (like a publication).

You can deposit your files in CaltechDATA (data.caltech.edu), the library-run repository. CaltechDATA accepts files of any type and size, although you should email data@caltech.edu if you’re planning to upload more than 500 GB of data. Caltech Library is responsible for maintaining access to the files, and all data records are assigned a Digital Object Identifier (DOI) to provide permanent linking and simplify citation. You can make your files public immediately or after an embargo period.

I’m developing software and want to make sure it remains available for a long time.

The CaltechDATA repository (data.caltech.edu) can accept software and even has an integration with GitHub to automatically preserve software releases.  Contact us at data@caltech.edu with any questions on configuring the integration.

I want to share data with collaborators or reviewers.

To share research data files, you can use the file sharing options in Box.com, which also allow you to set a custom password for the files. Box.com is a complete cloud file service, so you can add collaborators who can access files with Box.com credentials. Unlike with services such as Dropbox, collaborators can store files in a shared folder using your institutional Box storage allocation.

I’m collecting data on human subjects.

Talk to the Institutional Review Board (IRB) about all data collection and storage plans for your project (irb.caltech.edu/). Box.com, SharePoint, and OneDrive are certified by IMSS for personal data covered by HIPAA or FERPA regulations.