Web Archives Collecting Policy

The UCSF Archives and Special Collections derives its collecting mandate from UCOP policies BFB-RMP-1 and BFB-RMP-2, and serves as the repository for the archival records, including websites, generated by or about UCSF, including the Schools of Medicine, Nursing, Dentistry, Pharmacy, the Graduate Division, and the UCSF Medical Center.

It is the responsibility of the Archives to identify, collect, arrange, describe, make available, and preserve records of permanent administrative, legal, fiscal, and historical value. These records are preserved as an asset for the UCSF community and other researchers. The transition to digital information for most University business means that many such relevant records now exist as online resources.

The web archiving activities of the Archives document the following areas:

  • Primary functions of teaching and research
  • Development of health care education and health sciences research
  • Leadership in the community at large
  • Activities of the student body and alumni
  • The development of the physical plant and grounds

To fulfill these goals, the Archives collects websites of:

  • Administrative offices
  • Academic departments
  • Faculty, administrative, and student committees
  • Faculty and student clubs
  • University and student publications
  • Laboratories and research facilities

Additionally, outside of UCSF websites, the Archives collects more broadly in the areas of:

  • AIDS History
  • Anesthesiology
  • Biotechnology and biomedical research
  • Tobacco control and regulation (maintained in collaboration with the Industry Documents Library)
  • Global Health Sciences
  • Neuroscience
  • Computational Medicine

The Archives may make exceptions to the above criteria on a case-by-case basis.

Web Archives program guidelines and responsibilities

The Archives uses the Internet Archive's Archive-It service to capture websites. No active participation is required from UCSF content owners and creators, but several steps may be taken to help ensure that websites are preserved as completely as possible.

Responsibilities and best practices for web archiving at UCSF are outlined below:

The UCSF Archives and Special Collections will:

  • Identify, appraise, and select websites that reflect the mission and collecting interests of the Archives and Special Collections as outlined in the Collections Policy
  • Organize and manage archived websites to complement current holdings in the UCSF Library
  • Provide descriptions and contextual information for materials
  • Mediate access (via metadata, catalog records, and an access interface) to facilitate the search and retrieval of content
  • Respect the intellectual property rights of owners and ensure compliance with all applicable laws and policies:
    • Distinguish ‘archived’ sites from ‘live’ content with a prominent banner and statement at the top of each preserved web page
    • Suppress content from public view or refrain from website preservation at the request of content owners
    • Not capture any content which requires a password to access or which may contain protected health information or other restricted data
  • Reach out to webmasters when website design or configurations pose issues for the accurate capture of content

The Internet Archive’s Archive-It service and team will:

  • Maintain the web crawler, a computer program (or robot) that browses websites and saves a copy of all the content and hypertext links it encounters. By default, Archive-It’s crawler will not degrade website performance

  • Store archived content in a digital preservation repository at one of the Internet Archive's facilities

Content creators and owners will be able to:

  • Rely upon the UCSF Archives and Special Collections to identify, preserve, and provide access to multiple versions of select websites over time
  • Allow the Archive-It web crawler to preserve websites by including the following exception in the site’s robots.txt file:
    • User-Agent: archive.org_bot
  • Inform the Archives if a website is scheduled to go online, be decommissioned, or undergo significant changes
  • Request capture of a website, and specify the frequency with which the capture should occur (one time only, weekly, monthly, quarterly, or annually). Default capture for UCSF content is quarterly
  • View captured pages in our Archive-It collection
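As an illustration, the robots.txt exception above can be combined with a site's existing crawler rules. The following is a minimal sketch; the disallowed path is hypothetical and any real file should reflect the site's actual configuration:

```
# Permit the Archive-It crawler to capture the site
User-agent: archive.org_bot
Disallow:

# Existing rules for all other crawlers (example path only)
User-agent: *
Disallow: /private/
```

An empty Disallow line means no part of the site is off-limits to that crawler; more specific user-agent records take precedence over the wildcard record.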

Please note: The Archives may not be able to preserve the exact form, functionality, and content of sites as they appear on the live web. The following types of content present significant issues for capture and/or display:

  • Dynamic content generated by scripts or plugins such as JavaScript or Adobe Flash
  • Streaming media players with video or audio content
  • Password-protected material (we do not collect any web content which requires a password to access)
  • Forms or database-driven content that requires interaction with the site
  • Exclusions specified in robots.txt files

Request capture of a UCSF website

UCSF affiliates are encouraged to request capture of the websites they maintain using the UCSF Website Capture Request Form.

Questions or comments?

Contact the UCSF Digital Archivist at