Joanna Kang
Joanna is the Data Science and Marketing Coordinator at UCSF Library.

What can Git, GitHub, and Automated Testing Tools Offer Researchers, Librarians, and University Staff?

There’s been a common refrain at university tech conferences lately: “a lot of researchers didn’t realize they were going to have to become programmers.”

If you work at a university or research organization, this might resonate with you. When researchers chose a major or enrolled in graduate school, they generally didn’t realize they were also committing to learning Python, R, SQL, Unix, databases, and visualization tools. In response to this growing need, the UCSF Library has partnered with ICHS and Software Carpentry to offer support, training, informal workshops, formal curriculum-based courses, and office hours that help researchers with programming and data analysis.

Once you start writing software, you may also realize that software is rarely something you write once and forget. If you write scripts, manage data, or maintain software for a lab, you have probably noticed that it changes constantly: bugs are fixed, new features are added, data arrives in new formats and from new sources, and collaborators from other institutions propose changes. In addition to becoming programmers, a lot of researchers didn’t realize they were going to have to become archivists, quality assurance engineers, and version and change control managers as well!

You may have already experienced one of these increasingly common scenarios for research labs:

A researcher wants to change the code used for a calculation, but isn’t sure how widely that code is used or what side effects the change will introduce into the code base. Is it safe to make the change?

A research lab obtains a better source of data that should be used for all future analysis, but wants to make sure the old data remains available so that earlier research results stay repeatable. Can researchers archive a particular data file along with a previous version of the code?

A lab publishes code under an open source license, and researchers from other organizations propose changes. How can the lab test and validate these contributions before integrating them into the code base?
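Automated tests are one way to answer the first and third questions. As a minimal sketch (the function `normalize_counts` and its behavior are hypothetical, chosen only for illustration), a lab could write tests with Python’s built-in `unittest` module that pin down what a calculation is expected to return. If someone later changes the code, or an outside collaborator proposes a change, rerunning the tests shows whether the expected results still hold.

```python
import unittest


def normalize_counts(counts):
    """Hypothetical lab calculation: scale raw counts so they sum to 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("counts must not sum to zero")
    return [c / total for c in counts]


class TestNormalizeCounts(unittest.TestCase):
    """Tests that record the expected behavior before anyone changes the code."""

    def test_sums_to_one(self):
        # The normalized values should always add up to 1 (within rounding).
        result = normalize_counts([2, 3, 5])
        self.assertAlmostEqual(sum(result), 1.0)

    def test_known_values(self):
        # Expected values worked out by hand: 2/10, 3/10, 5/10.
        self.assertEqual(normalize_counts([2, 3, 5]), [0.2, 0.3, 0.5])

    def test_rejects_zero_total(self):
        # Dividing by a zero total is an error, not a silent result.
        with self.assertRaises(ValueError):
            normalize_counts([0, 0])
```

Saved in a file whose name starts with `test_`, these tests can be run with `python -m unittest`. A continuous integration service connected to a GitHub repository can run the same tests automatically on every proposed change.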

Git and GitHub provide tools to archive, version, and collaborate on code, data files, and other documents. Combined with tools like Jupyter notebooks, GitHub also offers a way to publish lab notebooks on the web that combine code, formatted text, mathematical notation, graphs, charts, and other material.


The UCSF Library is currently looking for ways to help researchers and other staff better manage code and other digital artifacts related to research. Starting in 2018, the Library will host a quarterly “Intro To GitHub” workshop, with the next session on Feb 20. See the class description and register.

We’re also working on a pilot project with NCBI to investigate ways to archive and cite code, set up automated integration and unit testing, establish and document consistent build processes, and collaborate with other labs and research institutions.

If you’re interested in learning more, please contact the Data Science Initiative at the UCSF Library or consider signing up for a workshop, office hour, or programming and pizza event. To find out about upcoming classes, sign up for the DSI newsletter or follow us @ucsflibdatasci.