We are excited to announce the publication of the UCSF COVID-19 Response Web Archive. UCSF has historically been a “first responder” to a wide variety of public health emergencies. At the outset of the COVID-19 pandemic, UCSF archivists recognized that the evolving UCSF response to the situation would contain valuable information about this important, tragic, and devastating historical moment. Documenting that response as it grew and changed would be a powerful historical record. And we were able to act quickly, because so much of the record is on the web.
Archives and Special Collections has been archiving websites for years. Our oldest captures date back to 2007, which feels like another epoch in web-time (you can see all of our web-archives here: https://archive-it.org/organizations/986). To archive the web, we use specialized tools to take “captures” or “snapshots” of a certain web-page at a certain time. This includes regularly returning to the site and taking a new capture at regular intervals. By utilizing this technique, web-archives are a valuable way to watch any website evolve and change, and thus documents an event (like a rapidly-evolving response to a global pandemic) effectively.
We had to work faster and in greater volume than we are accustomed to document the UCSF response to COVID-19. As you likely remember, during the height of the early days of the pandemic both the UCSF and the nationwide response was changing daily based on information that was rapidly shifting. Archives usually captures web-pages every 3 months or every 6 months, but upon embarking on this collection we realized that we needed to begin capturing certain websites every day. Additionally, at any given time UCSF has as many as 1,000 different official websites (something with ucsf.edu at the base domain). It was difficult to know which sites contained COVID information and should be captured. To remedy this problem, archivists set up GoogleAlerts to notify us anytime something was published to a ucsf.edu domain which mentioned certain key words that we identified as likely COVID-related.
Capturing external data
We also wanted to document outside coverage of UCSF activities. This includes information appearing on news websites, blogs, and occasionally social media, though the latter is persistently difficult to capture. (In the spirit of timely collection, we highly recommend that Twitter account holders download their Twitter archives.) We were able to use GoogleAlerts in a similar way to alert us to external reporting sites. Even more importantly, we benefited from the assistance by the amazing Anirvan Chatterjee, Director of Data Strategy at the Clinical & Translational Science Institute. Anirvan reached out early in the pandemic with a list of sites he had collected that contained documentation of UCSF’s role in the pandemic response. His human-curated list was immensely helpful. The proliferation of digital information makes human curation and metadata creation increasingly difficult in archival repositories. Having someone like Anirvan to devote time to collecting this information significantly improved the collection.
An important feature of this collection is its accessibility to both human browsing and computers performing computational research. We plan to use these materials to expand our work in digital health humanities as well as collections. This data will aid our colleague Kathryn Stine in her role coordinating these programs.
Connect with Archives and Special Collections
Do you have questions about the COVID-19 web-archive collection? Would you like to to use it in a computational project? Just love the archive? Contact the UCSF Archives and Special Collections.