This announcement is authored by Alex Duryee, COVID Tracking Project Archive Lead.
The UCSF Library Archives and Special Collections is pleased to announce that the COVID Tracking Project (CTP) records are available for research. The CTP is a crowdsourced digital archive that was managed by a group of journalists at The Atlantic and approximately 500 volunteers. This committed group gathered, cataloged, and published state-level COVID-19 data over the first fifteen months of the pandemic. “The COVID Tracking Project was a remarkable and influential initiative — part citizen science, part journalism, part crisis response. I’m thrilled that UCSF Archives has acquired, processed, and made available the digital records of this unique organization,” said Amanda French, a digital archivist and key leader of the CTP at The Atlantic.
In addition to the CTP’s data products, this collection includes its data creation and quality records, organizational records, correspondence, and code repositories. Data from the collection has been cited in over 2,100 academic articles and by federal agencies like the Centers for Disease Control and Prevention.
Open records available
The finding aid on the Online Archive of California describes the entirety of the collection and includes all of the CTP records held by UCSF. Records include data processing infrastructure and documentation, correspondence with state and territorial health departments, original COVID-19 data captures, and Slack discussions in channels like #gratitude and #emoji-march-madness. A significant portion of the collection is restricted until 2102 to protect the privacy of CTP members. However, the open records are available digitally and on-site by appointment in the UCSF Library Archives and Special Collections reading room.
The final data products from the CTP are available on Dryad, in accordance with FAIR principles.
In addition to the final data sets, UCSF developed a tool for viewing the data as it changed over time. COVID-19 data was never static. Often reporting schedules were inconsistent around weekends and holidays, and data was either reported late or updated long after the initial release. Another factor was that states continuously changed their data definitions throughout the pandemic. UCSF’s Data Explorer lets researchers view CTP’s data as it was updated, thus providing a more profound view of the topline numbers. Data Explorer includes references to original data sources (generally screenshots of websites and data files) and daily Slack discussions for each reporting source (available on-site at UCSF).
Oral histories and open source tools
Along with the collection’s files and data, the CTP records include oral histories created by the CTP as it came to a close in 2021. These oral histories provide a human-centered perspective on the data, the organization, and the pandemic in the United States. With permission from the interviewees, the oral histories are available via Calisphere.
The UCSF Archives and Special Collections also developed several open-source tools to aid in acquisition, preservation, and access to the CTP materials. CTP used platforms like GitHub, Instagram, and Twitter for public and internal communication. These platforms do not always provide accessible tools for preserving data; thus, UCSF created tools to download posts and private messages and generate access versions in PDF. These tools are available on GitHub for use in and development of digital archives.
Inspiring future research and education
This collection was designed in adherence to UCSF Library’s Archives as Data initiative and the broader Collections as Data movement. UCSF Archives and Special Collections developed multiple platforms and pathways to approach the collection so that researchers across disciplines can discover and use the records in their work.
Whether viewed through an epidemiological, social science, or data science lens, the collection affords powerful insights, as CTP Lead Archivist Alexander Duryee acknowledges: “We believe that this collection will provide key context for the story of the pandemic and that researchers across disciplines will find it illuminating.” By cross-linking between the archival collection, oral histories, and data sets, the collection encourages deep exploration of the “whats” and “hows” of the CTP and its data.
The collection serves as the foundation of the Data Journalism Course-in-a-Box (DJCB) project, which is building a data science curriculum around the CTP records to support journalism education. The collection includes a comprehensive view of the data, from its initial publication on agency web pages through quality control and publication. Investigative reporter Tyler Dukes is developing the DJCB with the help of the UCSF team. The curriculum uses CTP data to illustrate to journalists how to work with and analyze real-world public health data and how to communicate complex topics to a broad audience.
Project team members
- Tyler Dukes, data journalism consultant
- Alexander Duryee, COVID Tracking Project Archive lead
- Edith Escobedo, UCSF Library project archivist
- Polina Ilieva, UCSF Associate University Librarian for Collections and archivist
- Charlie Macquarie, former UCSF Library digital archivist
- Kevin Miller, former COVID Tracking Project Archive lead
In addition, the team would like to thank the many collaborators across the University of California system and advisory board members for their contributions to this project.
Funding for The COVID Tracking Project Archive was provided by the Alfred P. Sloan Foundation (Sloan grant G-2022-17133).