Library Instruction: Software / Data Carpentry

Descriptions and resources for workshops offered by the Caltech Library. To register for a class, click on the class name. All classes are held online via Zoom.

Carpentry @ Caltech Library

The Caltech Library is proud to be a member of The Carpentries, an initiative that organizes hands-on workshops to teach researchers essential computing skills. Open-source platforms are taught wherever possible, including OpenRefine, Git/GitHub, Python, and R. To date, we have five instructors trained to offer Software and Data Carpentry workshops. Here, you can find out more about who we are and what we teach. We periodically offer workshops hosted at the Library. Check our schedule to see if one is coming up, or sign up for our mailing list to hear about workshops before everyone else!

The Caltech Library is also happy to partner with campus groups to design and host Carpentry workshops for their target audiences. Contact us if you would like more information on co-sponsoring a workshop for your group!


Recent Workshops:

Text as Data: An Introduction to Natural Language Processing
May 5, 19 & 24, 2023

Data Visualization 101, and Interactive Data Visualizations in Python
May 10-12, 2023

Geospatial Data with Python
April 26-27, 2023

The Unix Shell
April 24-25, 2023

Introduction to Machine Learning
February 13-15, 2023

Programming with Python, Part 2
February 6-8, 2023

Version Control with Git, Part 2
January 31 & February 2, 2023

The Unix Shell, Part 2
January 24 & 26, 2023

Managing Data with Pandas
December 2, 2022

Databases and SQL
November 29 & December 1, 2022

Programming with Python
November 14-17, 2022

Version Control with Git
November 8 & 10, 2022

The Unix Shell
November 1 & 3, 2022

Software Carpentry

Lessons Offered:

The Unix Shell: Covers the basics of file systems and the shell, which are fundamental to using a range of other tools and computing resources.

Version Control with Git: Covers the basics of using the Git version control system from the command line.

Programming with Python: Teaches basic programming concepts using Python.

Programming with R: Teaches basic programming concepts using R.

R for Reproducible Scientific Analysis: Teaches the fundamentals of using R to write modular code, and covers best practices for using R for data analysis.

Using Databases and SQL: Covers the basics of using a database to explore experimental data.

Data Carpentry

Lessons Offered:

Data Organization in Spreadsheets: Covers good data entry practices, formatting data tables in spreadsheets, avoiding common formatting mistakes, handling dates in spreadsheets, basic quality control and data manipulation, and exporting data.

Data Cleaning with OpenRefine: Teaches how to use OpenRefine to effectively clean and format data and automatically track any changes that you make.

Data Analysis and Visualization in Python: Covers basic Python syntax, the Jupyter notebook interface, importing CSV files, working with data frames and calculating summary information from them using the pandas package, and a brief introduction to plotting.

Data Analysis and Visualization in R: Covers basic R syntax, the RStudio interface, importing CSV files, the structure of data frames and calculating summary statistics from them, factors, and a brief introduction to plotting.

Data Management with SQL: Covers what relational databases are, how to load data into them, and how to query databases to extract just the information that you need.

Meet Our Instructors!

Stephen Davison, Tommy Keswick, Tom Morrell, Donna Wrublewski

Using Databases and SQL (SWC)

Data Management with SQL (DC)

The Unix Shell (SWC)

Version Control with Git (SWC)

The Unix Shell (SWC)

Version Control with Git (SWC)

Programming with Python (SWC)

Data Analysis and Visualization in Python (DC)

Author Carpentry (Multiple Lessons)

Programming with R (SWC)

R for Reproducible Scientific Analysis (SWC)

Data Analysis and Visualization in R (DC)