The Caltech Library is proud to be a member of The Carpentries, an initiative that organizes hands-on workshops to teach researchers essential computing skills. We teach open-source tools wherever possible, including OpenRefine, Git/GitHub, Python, and R. To date, we have five instructors trained to offer Software Carpentry and Data Carpentry workshops. Here you can find out more about who we are and what we teach. We periodically offer workshops hosted at the Library. Check our schedule to see if one is coming up, or sign up for our mailing list to hear about workshops before everyone else!
The Caltech Library is also happy to partner with campus groups to design and host Carpentry workshops for their target audiences. Contact us if you would like more information on co-sponsoring a workshop for your group!
Recent Workshops:
Text as Data: An Introduction to Natural Language Processing
May 5, 19 & 24, 2023
Data Visualization 101, and Interactive Data Visualizations in Python
May 10-12, 2023
Geospatial Data with Python
April 26-27, 2023
The Unix Shell
April 24-25, 2023
Introduction to Machine Learning
February 13-15, 2023
Programming with Python, Part 2
February 6-8, 2023
Version Control with Git, Part 2
January 31 & February 2, 2023
The Unix Shell, Part 2
January 24 & 26, 2023
Managing Data with Pandas
December 2, 2022
Databases and SQL
November 29 & December 1, 2022
Programming with Python
November 14-17, 2022
Version Control with Git
November 8 & 10, 2022
The Unix Shell
November 1 & 3, 2022
Software Carpentry Lessons Offered:
The Unix Shell: Covers the basics of file systems and the shell, which are fundamental to using a range of other tools and computing resources.
Version Control with Git: Covers the basics of using the Git version control environment from the command line.
Programming with Python: Teaches basic programming concepts using Python.
Programming with R: Teaches basic programming concepts using R.
R for Reproducible Scientific Analysis: Teaches the fundamentals of using R to write modular code, and covers best practices for using R for data analysis.
Using Databases and SQL: Covers the basics of using a database to explore experimental data.
Data Carpentry Lessons Offered:
Data Organization in Spreadsheets: Covers good data entry practices, formatting data tables in spreadsheets, avoiding common formatting mistakes, handling dates in spreadsheets, basic quality control and data manipulation, and exporting data.
Data Cleaning with OpenRefine: Teaches how to use OpenRefine to effectively clean and format data and automatically track any changes that you make.
Data Analysis and Visualization in Python: Covers basic Python syntax, the Jupyter notebook interface, importing CSV files, using the pandas package to work with data frames and calculating summary information from them, and a brief introduction to plotting.
Data Analysis and Visualization in R: Covers basic R syntax, the RStudio interface, importing CSV files, the structure of data frames and calculating summary statistics from them, factors, and a brief introduction to plotting.
Data Management with SQL: Covers what relational databases are, how to load data into them, and how to query databases to extract just the information that you need.
Instructors (SWC = Software Carpentry, DC = Data Carpentry):
Stephen Davison: Using Databases and SQL (SWC), Data Management with SQL (DC)
Tommy Keswick: The Unix Shell (SWC), Version Control with Git (SWC)
Tom Morrell: The Unix Shell (SWC), Version Control with Git (SWC), Programming with Python (SWC), Data Analysis and Visualization in Python (DC), Author Carpentry (Multiple Lessons)
Donna Wrublewski: Programming with R (SWC), R for Reproducible Scientific Analysis (SWC), Data Analysis and Visualization in R (DC)