Fourth Workshop on Scientific Archives: Program of Events

Day one: June 5, 2024

8:30 – 9:30 Breakfast

9:30 – 9:50 Welcome and opening remarks with Dr. Catherine R. Lucey, MD, MACP, Executive Vice Chancellor and Provost at the University of California, San Francisco (UCSF)

Note that some presentations will be recorded and shared after the workshop.

Session one: Open Archives

9:50 – 11:40

Copyright inhibiting access: a novel collaboration with scientists to develop a new pathway to open archives at Cold Spring Harbor Laboratory Archives
Ludmila Pollock and Stephanie Satalino, Cold Spring Harbor Laboratory Archives, United States
In the scientific world, meetings are hugely important as major facilitators of communication and the exchange of ideas among scientists. College and university research departments, as well as scientific institutions, hold thousands of international scientific meetings every year. Abstracts and/or videos from these meetings normally become collections in the hosting institution’s archives. Cold Spring Harbor Laboratory (CSHL) is recognized as one of the world’s premier hubs of activity in biology and genetics. CSHL hosts the esteemed annual Symposia on Quantitative Biology series, which began in 1933. In addition to the Symposia, we hold 60 additional annual meetings on different topics, which bring more than 10,000 scientists from around the world to the CSHL campus. In 1953, James D. Watson presented the structure of DNA publicly for the first time at the Cold Spring Harbor Symposium. The idea for the Innocence Project, co-founded by Barry Scheck and Peter Neufeld, came from CSHL’s 1988 Banbury meeting. The roots of the Human Genome Project date back to a 1989 CSHL meeting. Many successful projects were first presented at CSHL meetings and have since garnered international recognition, including Lasker Awards, Breakthrough Prizes in Science, and Nobel Prizes. For the past thirty years, CSHL meetings have been recorded and stored in various formats in the CSHL Archives, which also hold a print collection of annual meeting abstract books. These video and print collections document the development of molecular biology and demonstrate the enormous impact science has had on prolonging human life. Both collections are underutilized because of copyright restrictions. If open access were implemented, the collections would interest students and educators as well as scholars.
To facilitate broader access in the meantime, a prominent CSHL researcher proposed creating a small committee of two laboratory executives, a historian, an archivist, and a few researchers. The committee reviews abstracts and recorded talks and selects, according to defined criteria, what can be made publicly available. If this approach continues to succeed, we plan to expand it. We are taking a risk-based approach to copyright, but we will also work to mitigate future risk by having presenters sign copyright release forms. We intend to share our newly developed project with the scientific archives community and look forward to comments and suggestions.

‘Armchair Archivists’: Enhancing Crowdsourced Historical Manuscript Transcription with Handwritten Text Recognition (HTR) Tools
John R. Schaefer, University of Cambridge, United Kingdom
The growing use of Handwritten Text Recognition (HTR) tools across the cultural heritage sector allows archivists and researchers to conveniently convert digital images of handwritten documents into machine-readable text. In addition to making archival content more accessible, this process exposes manuscript corpora to a variety of digital research methods, from topic modelling to social network analysis. This paper proposes an open-source framework for HTR-assisted manuscript transcription in academic and cultural heritage sector projects that rely on limited volunteer resources. How can archives best use these platforms to create public HTR models? What does integrating HTR tools mean for the unpaid volunteers who form the backbone of many transcription projects in a challenging funding climate? How might the use of HTR in these projects be situated within the broader embrace of openness and linked data in the social sciences and humanities? Focusing on the popular Transkribus and eScriptorium Virtual Research Environments (VREs), this project examined the practical and theoretical implications of a novel HTR-based crowdsourcing workflow integrating both platforms. The workflow was developed for a six-month volunteer initiative in collaboration with the Joseph Hooker Correspondence Project at Kew Gardens and the University of Cambridge, resulting in a public open-source HTR model for the handwriting of Joseph Dalton Hooker (1817–1911). These findings demonstrate that integrating HTR tools into similar crowdsourcing workflows is a cost-effective strategy for smaller transcription projects to generate reusable HTR models within a limited timeframe. Although potentially problematic as a replacement for human expertise, open-source HTR tools help enhance the accessibility and relevance of archival data for a global audience.
When carefully implemented in the context of volunteer crowdsourcing, HTR tools can facilitate deeper academic and public engagement with scientific correspondence and natural history collections, shedding new light, for example, on their extractive colonial histories. This paper also addresses fundamental issues of transparency and openness in archival transcription by showcasing the benefits of open-source software platforms and data management practices, alongside the expansion of Linked Open Data (LOD) frameworks in the transcription and annotation of correspondence, to enhance interoperability across archival collections. Finally, I evaluate the practical implementation of HTR tools using survey and interview data gathered from a diverse pool of volunteer transcribers, who voiced complex sentiments toward HTR technologies and automation in the archives.

Establishing a new paradigm in collaborations between archives and biological collections: creating the Carlquist Extended Specimen Network
Ana Niño, Jason H. Best, Krishna Shenoy, and Sam Ekberg, Botanical Research Institute of Texas, United States
Mare Nazaire and Sara Dave, California Botanic Garden, United States
New initiatives in the digitization of scientific archives and biological collections have given rise to the Extended Specimen Network (ESN), a conceptual framework that places biological specimens at its center and connects them to disparate types of archival materials, such as field photographs, field notes, and maps, to build a vast network of digital resources. The Herbarium at California Botanic Garden and the Library at the Botanical Research Institute of Texas have partnered to preserve and provide access to the collections of Sherwin J. Carlquist (1930–2021), an American botanist and photographer well known for his contributions to island biogeography, evolutionary ecology, and wood anatomy. The goal of the Carlquist ESN is to link Carlquist’s biological collections housed at California Botanic Garden with his archival collection housed at the Botanical Research Institute of Texas. The Carlquist collection comprises more than 190,000 objects, including slides, photographs, negatives, field notebooks, microscope slides, herbarium vouchers, wood specimens, and specimen material preserved in spirits. This project aims to curate the comprehensive Carlquist collection for long-term preservation; to digitize and mobilize the collection’s data; and to link transcribed collection information, images, and metadata across all biological collections and archival objects, publishing them through four open-access platforms. The Carlquist ESN has the potential to provide “additional physical preparations and digital resources” (Lendemer et al., 2020), facilitating immediate application to a suite of research areas, including systematics, wood anatomy, ecology, biogeography, island biology, phenology, plant conservation, history of science, plant humanities, digital humanities, and archival theory and practice.
The Sherwin Carlquist Collection ESN project represents a unique opportunity for two different but interconnected disciplines to collaborate and usher in a new paradigm in how the biological and archival communities envision, implement, and use ESNs within and between their collections. This talk will also highlight outreach opportunities, challenges, and lessons learned in forging this ESN.

11:45 – 12:45 Lunch

Session two: Collections as Data

12:45 – 14:05

Presentation withdrawn
~~The Perrey Archive: a Tool for Historical Seismology~~
~~Corinna Guerra, Ca’ Foscari University of Venice, Italy~~
The Perrey Archive, a unique historical collection of documents on natural disasters of the past, is preserved in the Società Napoletana di Storia Patria library. This collection is the result of the tireless research of French scientist Alexis Perrey (1807-1882), who is regarded as one of the founders of seismology. I will reconstruct how it was collected and the events that brought it to Italy.

Linking Analog Archival Data Across Anthropology’s Four Fields
Amanda H. Sorensen, Diana E. Marsh, Katrina Fenlon, Nikki Wise, University of Maryland, United States
Celia Emmelhainz, Smithsonian National Anthropological Archives, United States
As scientific evidence, anthropological archives are unique and irreplaceable. For the past 150 years, anthropologists worldwide have documented complex lifeways, languages, embodied cultures, and communities in flux, in far greater detail than has ever been available in published form (Silverman 1995: 1). While contemporary research often generates born-digital archives (Sanjek & Tratner 2016), most of anthropology’s evidential record survives as textual, image, and audio evidence located in brick-and-mortar institutions or in scattered offices and labs (Ruwell 1995; Zeitlyn 2022). Yet despite efforts to build data repositories and born-digital archives (Cliggett 2013), the data embedded in anthropology’s physical archives are still largely disconnected from the networked systems of discovery and access that support scientific research and reuse. This presentation discusses how archives can more effectively make primary sources reusable as data to support active research in scientific and social scientific disciplines. In part, our study aims to show how we might balance Indigenous data sovereignty and cultural sensitivity with networked discovery and access (“Building a sustainable future for anthropology’s archives: Researching primary source data lifecycles, infrastructures, and reuse,” PI Diana Marsh, NSF CA-SR award #2314762). This study builds on two strands of prior work: first, “Recovering and Reusing Archival Data for Science” (USDA, PI Katrina Fenlon), which explores scientific data recovery and the reuse of historical data from archives and legacy research materials; and second, a collaboration between the Council on the Preservation of Anthropological Records (CoPAR) and the Smithsonian’s National Anthropological Archives (NAA) aimed at improving the visibility of scientific archival collections using Wikidata. We seek ways to transform data platforms to integrate Indigenous stewardship and community-based knowledge, or what we term reparative linked data.
Building on this framework, we discuss what we have learned from our current NSF project, which asks how anthropology’s analog sources, such as fieldnotes, photographs, and audiovisual materials, can become more findable, reusable data sources for contemporary research questions. We discuss our progress in identifying best practices for scientific archival information infrastructure, evaluating test collections in cross-disciplinary open-access platforms, and developing training modules for anthropologists and data curators. Through dialogue, we hope to facilitate discussion about 1) how historical archives are currently being used by scientists and social scientists, and 2) how these projects can be connected with others in archival infrastructure or practice for data reuse, especially in the global social sciences.

Use and Usefulness: Documenting Archival Data
Anjali Ramachandran and S. Prashant Kumar, Archives at NCBS, National Centre for Biological Sciences-Tata Institute of Fundamental Research, India
Using two collections held by the Archives at NCBS, the first belonging to the early space scientist T.S.G. Sastry and the second to the mathematician B.S. Madhava Rao, we report on a collaboration between a historian of mathematics and an archivist of science. The Archives at NCBS is the only public collecting centre for the history of science in contemporary India. Archiving the history of science in India presents unique challenges. The archives of scientific institutions in the post-independence period are rare, often poorly conserved, and largely uncatalogued, meaning that procedures for appraisal, arrangement, and description have to be worked out from scratch. Although Sastry and Madhava Rao were adjacent to major figures and scientific personalities, they were not themselves well known. Without both historical and scientific context-setting, these collections offer only a blinkered view of science in practice. Because India lacks the large-scale preservation of scientific papers that is the norm in the global north, setting context means that what are usually simple archival tasks often require specialist or subject-matter expertise. For example, because the Madhava Rao collection was donated posthumously, its author had imposed no original order on the papers. Collaboration made it possible to identify cases where relevant context had been preserved. Where it had not, subject-matter knowledge was required to follow mathematical proofs and calculations and to reestablish relations between disaggregated materials. In both collections, several appraisal and arrangement choices were made to preserve context that shows relationships between concurrent strands of scientific thinking. We argue for distinguishing between the preservation of data still of use to present-day science and the preservation of apparently orphaned or damaged data.
Microfilm data showing variations in the Earth’s magnetic field might be of use to scientists, but damaged slides that may be of no use to science still have historical value for what they may reveal about larger scientific networks. Such an appraisal standard requires subject-matter expertise, knowledge of the relevant historiography, and an archivist’s sense of the limitations and purview of the archive itself. Collections and archives may well be data, but there is an important and subtle difference between the appraisal and preservation of scientific data and of historical data, one that requires equitable collaboration between archivists, historians, and subject-matter experts.

Session three: Scientific Museums and Archives

14:05 – 15:05

Scientific Archives and Museum Collections: an Integrated Vision
Patrícia Costa, Instituto Superior de Engenharia do Porto, P. Porto, Portugal
Milena Carvalho and Susana Martins, Instituto Superior de Contabilidade e Administração do Porto, Instituto Politécnico do Porto, CITCEM, Portugal
The Museum of the Instituto Superior de Engenharia do Porto, Polytechnic of Porto, houses an important collection of scientific instruments of great scientific and pedagogical value. The collection reflects the circulation of new ideas and techniques that allowed a country like Portugal to develop industrial education and experiential training modeled on the most technically developed countries, such as France, England, and Germany. Because of its characteristics and underlying history, this museum collection is of considerable historical and scientific interest for studying the different teaching methods practiced in industrial education in the city of Porto, even when compared to today’s standards. However, it cannot be dissociated from an important set of documents that records all of the school’s administrative activity, as well as the scientific activity carried out in the different subjects taught at the school between 1852 and 1974: the Historical Archive. In our view, all these elements should be seen in an integrated way, forming the basis of an information system (IS) focused on the potential of digital information and creating a digital service that contributes to the efficient management of all the heritage information produced in a museum context. All information resources are important insofar as they all, with their commonalities or specificities, generate knowledge about collections and heritage. Although the core of a museum’s IS is centered on its collection, it must also include the processing and availability of information associated with the historical archive and documentation center, as well as other relevant data on the institution, allowing for seamless and transparent articulation among the three components.
In this way, we consider the museological object to be a document, since it is part of a chain of contexts, links, and connections, and is an informational envelope. The work presented here is an explanatory study, aiming to provide in-depth knowledge of the subject based on a literature review and documentary research. We conclude that an integrated approach allows for broader and deeper knowledge of the activities of producing organizations and their information objects, whether analog or digital. This work is funded by national funds through FCT – Foundation for Science and Technology, within the scope of project UIDB/04059/2020.

Collaboration between archivists and scientists at the American Museum of Natural History
Maya Naunton and Kiana Clark, American Museum of Natural History, United States
The archive of the Vertebrate Paleontology Department of the American Museum of Natural History has grown organically over the department’s 130-year existence. Although there have been past efforts to organize and process the papers, most of the archive remains unprocessed. An IMLS grant received by the department in 2020 supports the effort to process the papers and make them accessible to researchers. Our presentation addresses several themes outlined in the Workshop announcement:
1. The close collaboration required between archivists and members of the scientific staff. This collaboration helps the archivists understand the material and organize it in a way that will benefit future researchers. In turn, it helps the scientific staff understand what the archive contains and how it can support their work.
2. Assessing the barriers to creating open archives. The grant stipulates that the results of the work be available to the public, and the finding aids are posted to the publicly accessible archive catalog. However, making all of the material open to the public raises some issues. Besides the usual privacy considerations, in some instances the department has concerns about making fossil find spots widely known. This requires extensive dialogue between the archivists and the science staff to create an acceptable policy on restrictions.
3. The information in the archives is not currently correlated with the collection database. We are working to make it possible to connect specific specimens to material in the archive, opening up a new way to analyze the collection and mine it for new insights into the history of collecting, paleoecology, and relationships between specimens that are not currently obvious.

15:05 – 15:30 Coffee/tea break

Session four: Medical Archives: Ethics of Preservation and Access

15:30 – 16:30

Beyond the Walls and Changing the Space of Archive Through the Lens of The Adler Museum of Medicine in South Africa
Samuel Umoh Uwem, University of Hradec Kralove, Czech Republic
Adetola Elizabeth Umoh, Autonomous University of Barcelona, Spain
Governments seeking to make their records widely accessible to citizens through open data and open knowledge are prioritizing digitization projects. In South Africa, digitization efforts are directed in part at the large and expanding volumes of medical records frequently kept in museums, archives, and other institutions. Notably, memory institutions such as South Africa’s Adler Museum of Medicine have been engaged in preserving the history of the health sciences throughout Southern Africa, with particular reference to Gauteng. The Adler Museum of Medicine, founded in 1962, is situated in the South African Institute for Medical Research, Johannesburg, and contains a Library section and a Collections Center. The latter houses collections on the history of medicine, dentistry, and pharmacy, while the former houses an archive of biographical information on several medical and allied health professions, which is available to students, researchers, and members of the public. The Museum augments the educational activities of the University, especially in the Health Sciences, through its collections, research, teaching, exhibitions, and publications. Drawing on the Adler Museum of Medicine, this paper briefly provides an overview of the Center’s functions and of the legislation that governs the archives housed at the museum. The paper examines how archival practice through digitization strengthens or undermines provenance and original order. It analyses the migration of objects to digital form; the procedures, conditions, and reasons that influence the choice to include or exclude medical archives; and how these insights are incorporated into the design, functionality, and affordances of digital archives. It highlights how medical museums, through the activities of the separate Gallery, Library, Archive, and Museum units, approach and foster open access to medical archives.
Drawing on oral history, archival sources, documentaries, and interviews in South Africa, this paper argues that since COVID-19, the medical museum’s archive has been trying simultaneously to digitize its holdings and to maintain the physical originals. However, Archives and Records Management (ARM) practitioners are concerned about a coming digital dark age, in which a significant amount of important cultural heritage is lost because of both the rapid digitization of information and the lack of reliable long-term preservation methods.

Curating the Clinical: 20th Century Photographs and 21st Century Methods in Cape Town’s Medical Archive
Michaela Clark, University of Manchester; UCT Pathology Learning Centre, United Kingdom and South Africa
This paper offers an overview of a doctoral project conducted at the University of Cape Town’s Pathology Learning Centre. The project endeavored to assess, sort, and study a collection of almost 6,000 disused clinical photographs created by South Africa’s first medical school between 1920 and 1967. These records depict the bodies of local patients in various representational modes: frontal and profile images of heads, veiled limbs and torsos, disembodied organs, and X-rays, all in black and white, mounted on cardboard backings, and annotated with details of both persons and signs of disease. The photographs were produced in urban hospitals in the temperate Cape, and both their depictive practices and the diseases they feature echo those of the metropole, despite the highly unequal socio-political and clinical context of South Africa at the time. When the project began in 2019, little information existed regarding the making, use, or original order of this collection. It was, for all intents and purposes, ‘orphaned’: it lacked contextualising data, its origins remained uncertain, and its custody was fundamentally precarious. The collection thus had what Diane Vogt-O’Connor (2006:115) calls a ‘high maintenance mortgage’: it posed too many problems and required too many resources to be viable for research purposes. In addition, its uncertain provenance raised ethical questions about clinical consent (in the past) and research access (in the present). To grapple with these difficulties, the project applied a method of archival treatment that systematically ‘curated’ the collection’s contents (Voss 2012; Voss & Kane 2012). By framing the photographs not only as visual but also as material culture (Edwards & Hart 2004; Caraffa 2018; Bärnighausen et al 2020), the project simultaneously archived them (assessing, organising, and describing them) and subjected them to provenance research.
This paper outlines the theoretical and methodological concerns and solutions involved in transforming these decontextualized settler-colonial clinical photographs into a historical resource. Managing the collection as objects and tracing its ‘social life’ (Appadurai 1986) allowed its historical roots and contemporary use to be unearthed. However, this process also provided glimpses of the fraught, racialized context of its production. In addition to outlining a means of rendering orphaned photographic collections of medicine useful to historians, this paper thus demonstrates the value of such material for grappling with the representational politics and ethical complexities posed by 20th-century clinical photographs in the settler-colonial context.

16:30 – 17:30 Tour of UCSF Library and Archives and Special Collections Exhibits and Collections

17:30 – 19:00 Social hour at Fireside Bar, 603 Irving St., San Francisco, CA 94122

Day two: June 6, 2024

8:00 – 9:00 Breakfast

Session five: Appraisal of Scientific Records

9:00 – 10:45

AI-Enhanced Appraisal and Processing of a Hybrid Collection
Margaret Cribbs, Michael Hucka, Tommy Keswick, Ian Roberts, Mariella Soprano, and Richard Thai, California Institute of Technology (Caltech), United States
The Caltech Archives has recently accessioned materials from Robert H. Grubbs, a Caltech faculty member and Nobel laureate in Chemistry. This hybrid collection consists of over 100 carton-size boxes of papers and thousands of digital files and emails, some of which are stored on obsolete media. The collection’s large size and diverse formats are forcing us to find more efficient and effective methods of describing it in ways that intellectually combine the digital and analog materials while reducing their physical footprint. The aim of this feasibility study is to find novel ways to integrate the physical and digital materials, enhance their description and organization, and eliminate duplication. To achieve this, we plan to use modern text recognition and machine learning technologies. Our plan is to create an iterative workflow that begins by eliminating physical copies of materials that are duplicated in the set of digital materials. While processing the collection, archivists will use smartphone automation to photograph at least the first page of documents to be retained, storing the images in a meaningful digital folder structure. Using software to extract text from the photographs, we can then search for each document within the textual content of the digital documents. Once duplicate copies have been identified, we can safely remove the duplicate physical documents from the collection. The process will also involve developing policies and procedures for integrating pointers to the digital copies within the ArchivesSpace finding aid. Finally, another phase of the project will use machine learning algorithms to categorize the digital and photographed documents into an intellectually unified finding aid. We expect to learn from and improve the process as it is undertaken and will report our findings.
The workflow will certainly be iterated upon as we find ineffective steps or other problems with the plan. We hope to achieve the dual goals of minimizing the physical space required for the collection and minimizing the processing work required for multitudes of miscellaneous digital files.
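The abstract does not specify how the extracted text will be compared with the digital files. One possible approach is sketched below in Python, assuming the OCR step has already produced plain text for each photographed page: each page is scored by the fraction of its three-word sequences (shingles) that also appear in a digital document. The function names, shingle size, 0.8 threshold, and sample data are hypothetical illustrations, not part of the Caltech workflow.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase the text and keep only alphanumeric word tokens, so that
    OCR noise in punctuation and spacing does not affect matching."""
    return re.findall(r"[a-z0-9]+", text.lower())


def shingles(tokens: list[str], k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences (word shingles).
    Texts shorter than k words yield a single shorter shingle."""
    if len(tokens) < k:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def containment(page: set, document: set) -> float:
    """Fraction of the photographed page's shingles that also occur in the
    digital document. Unlike Jaccard similarity, this is not penalized when
    the digital file is much longer than the single photographed page."""
    if not page:
        return 0.0
    return len(page & document) / len(page)


def find_duplicates(ocr_pages: dict[str, str],
                    digital_docs: dict[str, str],
                    threshold: float = 0.8,
                    k: int = 3) -> dict[str, tuple[str, float]]:
    """Map each photographed page ID to its best-matching digital document,
    keeping only matches whose containment score meets the threshold."""
    # Shingle every digital document once, up front.
    digital = {doc_id: shingles(normalize(text), k)
               for doc_id, text in digital_docs.items()}
    matches = {}
    for page_id, page_text in ocr_pages.items():
        page_sh = shingles(normalize(page_text), k)
        best_id, best_score = None, 0.0
        for doc_id, doc_sh in digital.items():
            score = containment(page_sh, doc_sh)
            if score > best_score:
                best_id, best_score = doc_id, score
        # Pages scoring below the threshold are not reported as duplicates.
        if best_id is not None and best_score >= threshold:
            matches[page_id] = (best_id, best_score)
    return matches


# Hypothetical sample data: OCR text from photographed first pages, and
# text extracted from born-digital files.
ocr = {
    "box12_folder3_p1": "Dear colleague, thank you for sending the revised manuscript.",
    "box12_folder4_p1": "Agenda for the seminar series, spring term.",
}
digital_files = {
    "email_0412.txt": "Dear colleague, thank you for sending the revised "
                      "manuscript. Comments will follow.",
}
result = find_duplicates(ocr, digital_files)
print(result)  # {'box12_folder3_p1': ('email_0412.txt', 1.0)}
```

Containment is used here rather than a symmetric measure such as Jaccard similarity because only the first page is photographed, so a matching digital file will usually contain much more text than the page. In practice, the threshold would need tuning against real OCR error rates, and borderline matches would still call for review by an archivist.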

Reappraising “Appraising the Records of Modern Science and Technology”
Jordon Steele, Johns Hopkins Applied Physics Laboratory, United States
Bethany G. Anderson, University of Illinois Urbana-Champaign, United States
Polina Ilieva, University of California San Francisco, United States
In 1985, Joan K. Haas, Helen Willa Samuels, and Barbara Trippel Simmons published authoritative guidelines for archivists working with science and technology records. Appraising the Records of Modern Science and Technology: A Guide was at the time a much-needed volume for archivists grappling with the exponential rate at which postwar science generated records of its activities. The Guide built on decades of work by archivists to develop appraisal guidelines for twentieth-century science’s voluminous records, but it was broader in scope, taking into account personal and professional activities, the administration of science, and research and development in both academia and industry. It also sought to instill in archivists the importance of identifying and thinking critically about the functions and activities resulting from scientific research and processes, linking this analysis to broader theoretical appraisal discussions about functional analysis and documentation strategy associated with Samuels’s work. For decades the Guide has been a mainstay for archivists, given its clear and comprehensive counsel. Yet the scientific enterprise has evolved in significant ways since 1985, including the emergence and evolution of big (digital) data, the open science movement, data management, citizen science, and #sciencetwitter and scientific exchange on social media. These changes have raised new questions for archivists about what to keep, and what not to keep, of contemporary science. In 2022, the presenters launched a research project to reexamine the Guide in light of these developments.
Representing three distinct types of institutions that manage science and technology records (a research center, a health sciences university, and a science-intensive land-grant university), the archivist presenters have to date conducted a gap analysis of the Guide to identify new directions and ways to build on this critical work in the twenty-first century; conducted structured interviews with scientists, engineers, and medical professionals at their respective institutions (University of Illinois, JHU/APL, and UCSF) to gain deeper insight into what scientific research looks like in the contemporary era and what records are created in support of that activity; and begun comparing these findings with the initial gap analysis. The team will present its research to date and facilitate a discussion with attendees who research the history of science and technology, seeking their reactions to our efforts to improve the documentation of science, technology, and engineering archives.

The Appraisal of Scientific Documentation/Information as a Methodological Operation: Application Criteria and Parameters
Fernanda Ribeiro and Armando Malheiro da Silva, University of Porto, Faculty of Arts and Humanities / CITCEM, Portugal
Evaluation in the context of information services has been carried out essentially in three distinct areas: the evaluation of services, the evaluation of information retrieval, and the evaluation (appraisal) of information flow. This last area, applied especially to archives, aims to decide the destination of information after a few years of current use, seeking eliminations that considerably free up information storage space. In this work, evaluation is approached not as a practical or merely technical procedure, but as a methodological operation applicable to information in any context of production and use, within the framework of Information Science, an area in which we include Archivistics as an applied discipline. Appraisal therefore has no end in itself and no isolated use disconnected from a broader Method that associates the comprehensive and explanatory (scientific) aspect with the applicational or interventional (technical) aspect, as is appropriate in a discipline such as Information Science, naturally included in the field of the applied Social Sciences. Appraisal is also presented as an alternative adjusted to the demands of qualitative research, constituting a global approach based on a constructive critique of positivism and relativism informed by systemic and complex thinking. Starting from the appraisal model developed at the University of Porto, which takes a different, alternative perspective to the still-dominant one derived from the proposal of the American archivist T. R. Schellenberg, and which has already been widely applied in archives of different kinds, we set out criteria and parameters for objective practical application, taking into account the life cycle of information, the renewal and obsolescence of knowledge, and the importance of memory for the long-term preservation of information products.
The model can be applied to archives and other information services that deal with scientific information, since the assumptions on which it is based were defined from a perspective that seeks to make it applicable, in a generalized way, to any type of information, regardless of the context in which it is generated, used, and preserved.

The Robotics Project: Collecting Complex Science at Carnegie Mellon
Julia Corrin and Kathleen Donahoe, Carnegie Mellon University, United States
In this presentation, we will discuss the work of Carnegie Mellon's Robotics Project, a collaboration between the University Archives, the School of Computer Science, and, more specifically, the Robotics Institute to develop a model for archiving the work of roboticists. The field of robotics is a uniquely challenging area to document. Does archiving robotics mean archiving robots? Research outputs might be a combination of physical objects, paper materials, code, and AV documentation, many of which present challenges to traditional archival practice. A single robot can also be used for different research carried out in different disciplines, by different labs, sometimes at the same time, which challenges traditional definitions of provenance. Finally, because robotics is an iterative science, building robots often involves adapting or destroying previous robots, raising questions about authenticity and how we can define the version of a robot. In addition to a high-level overview of the project, our presentation will focus on two main areas. The first is Multimodal Archives: A Toolkit for Collecting Robotics and Other Complex Material in a Research Ecosystem. This toolkit attempts to answer the thorny questions outlined above and offers suggestions for those approaching similar work. The toolkit embraces interdisciplinary methods, including pre-custodial data collection borrowing from sociological practices, and post-custodial collecting borrowing from community archives. Second, we will look at the practical outcomes of the project, particularly the collections we've acquired and the types of material they've tended to include. After that, we will briefly review the Digital Robotics Archive and how the needs of roboticists have influenced some of its features. Finally, we will look at the next steps for the project.
While this presentation will cover a great deal of ground in a very specific scientific field, we believe the findings of this project may be broadly useful to others working to document and archive different scientific endeavors, particularly as scientific research becomes increasingly complex.

10:45 – 11:00 Coffee/tea break

Session six: Silences and Gaps in Scientific Archives

11:00 – 12:00

Drawing Women into the Science Archive
Deepika S and Anjali Ramachandran, Archives at NCBS, National Centre for Biological Sciences-Tata Institute of Fundamental Research (NCBS-TIFR), India
Archives make real-time choices about whose material to bring in, but what if those choices end up reproducing structural inequalities in science, particularly when it comes to gender and caste? And how do we recognise and rectify this in order to ensure equity in representation in the archive? The Archives at the National Centre for Biological Sciences (NCBS), located in Bangalore, India, is a public collecting centre for the history of science in contemporary India. As the Archives at NCBS grows each year with archival donations from individuals, communities and organisations, its team has felt the need for an interdisciplinary archiving framework in order to ensure an inclusive approach to archiving, particularly when it comes to women and other marginalised groups. In our experience, the volume of archival material that we receive from these groups is far smaller than, and not representative of, their participation in and contribution to science. We are working to expand our understanding of the archival record itself and of how archival value is determined, and to reconsider conventional approaches to archiving that have not been sufficient to include marginalised groups that have played a vital role in science. This work is relevant not just to Diversity, Equity and Inclusion (DEI) initiatives in India, but to science archives across the world, given the international and collaborative nature of contemporary science. This presentation will explore the experiences of the authors and their colleagues at the Archives at NCBS in their efforts to spend a year accessioning material from women and underrepresented groups. This effort began with a conversation with our funders about accession targets, after several intense months in which we acquired over 18 new collections, only one of which came from a woman scientist.
To change the way that we approached accessions, we developed a template for a sourcing matrix, a weighted index with criteria for gender, caste, race, or other axes of diversity, that fits into our archives' accessions workflow. The presentation will also cover conversations that we have had with donors of archival material, and collaborations with scholars of Feminist Science Studies and Women's Studies, as well as the impact that methodological tools borrowed from these disciplines have had on our accessions workflow.

Connections, Collaborations, Collections: Using Data Visualization to Rethink Access to Scientific Archives
David Ragnar Nelson, American Philosophical Society and Serenity Sutherland, SUNY Oswego, United States
Archives are curated. Scientific archives are no different. Papers often come from a single scientist's estate, and the archive chooses to keep the papers based on that scientist's contributions to science. However, by reading through the archive, researchers can access a broader field of actors, including fellow scientists, contingent faculty, research assistants, administrators, research subjects, and technologies. Through this process, researchers can sometimes capture the gaps and silences present in archival materials. Traditional finding aids and subject guides, which often only (re)present the collection's physical arrangement, are ill-suited to help researchers navigate this complicated field; they can reinscribe myths of scientific genius and mask the diversity contained in these collections. Additionally, looking at a scientific collection in isolation risks amplifying the gaps and silences it already contains. At the Library of the American Philosophical Society (APS), which houses rich collections in the biomedical, anthropological, physical, and mathematical sciences, initiatives are currently underway to rethink how scientific archives are presented. These initiatives seek not only to make the APS's scientific collections more accessible to general audiences, but also to rethink how researchers get information about collections. As one example, this paper will discuss a recent effort, Visualizing Women in Science. This project uses Digital Humanities methods to mine the papers of five women scientists, including Nobel laureate Barbara McClintock, for lesser-known women researchers. Using a network visualization, the project contextualizes the achievements of these famous scientists within a larger social web of scientific activity and shows connections between diverse parts of the archive.
The project was featured in the 2023 exhibition at the APS Museum and provides fertile ground for thinking not only about how to represent archival collections differently, but also how to use these collections to tell new, accessible narratives about the history of science. The second part of this presentation will consider efforts to use the lessons learned from Visualizing Women in Science to create a new subject guide for the history of science at the APS. By using data visualization methods from the Digital Humanities and Linked Open Data, the APS hopes to enrich its information about these collections and facilitate use of the collections as a whole, rather than as the papers of individual scientists. Such efforts may also pave the way for greater integration of archival collections at other institutions.

12:00 – 13:15 Lunch and poster session

Transcribing Genius: Crafting Scientific Oral History Narratives at the Caltech Heritage Project
David Zierler, California Institute of Technology, United States
Sara Baum, Sharp Copy Transcription, United States
In this presentation, David Zierler, Director of the Caltech Heritage Project, and Sara Baum, Founder of Sharp Copy Transcription, will illuminate the valuable lessons learned from our unique collaboration between a historian of science and a transcription organization specializing in scientific oral history. Our collaborative efforts have been instrumental in supporting the mission of the Caltech Heritage Project, which is charged with curating historical materials; publishing oral history interviews with members of the Caltech community in partnership with the Caltech Archives; and producing a repository of institutional stories and presentations that elucidate the impact that Caltech and its people have had on society and the world and inform the Institute's operations and steps forward. The session will delve into the challenges and opportunities inherent in transcribing scientific discourse, with its distinctive nature. We will discuss efficient workflows, strategies, and collaborative efforts aimed at enhancing the transcription process. Emphasizing the nuanced skill set required for the task, we'll highlight the significance of familiarity with scientific history, terminology, technical dialogue, theories and experiments, rendering of equations, linguistic and cultural backgrounds of narrators, and larger historical context in producing expert renderings of scientific oral narratives. Central to our discussion will be the exploration of transcription workflows tailored specifically to scientific oral histories. We will emphasize strategies for managing large volumes of data and content while prioritizing accuracy, coherence, and ethical practices. We will discuss our joint efforts with narrators to ensure that final transcripts authentically capture both the essence of conversations and the ways in which scientists from around the world wish to be represented in the oral history format.
David Zierler will provide insights into the multifaceted utilization of Caltech Heritage Project oral history recordings and transcripts within Caltech. This discussion will highlight the increasingly significant role of Caltech Heritage Project recordings and transcripts in institutional communications, programming, initiatives, and events. Through our presentation, we aim to offer listeners a window into the intricacies of our collaborative effort at the intersection of oral history, transcription, and scientific archives. We hope to provide valuable insights and lessons for individuals engaged in collecting, preserving, and utilizing the rich stories of scientific discovery.

Bringing archival considerations to a machine-actionable Data Management Plan standard
Jennifer Cuffe, Library and Archives Canada, Canada
Research institutions and science-based government institutions may need to manage datasets, and be responsible for those datasets, for a very long time, much longer than any individual researcher's career. Furthermore, the sheer number of datasets within large, long-standing institutions may encourage a systematic and automated approach to data management activities. Such an approach may distance datasets from individuals familiar with the original research program. Robust documentation is required for science data management in such contexts. Machine-actionable data management plans (maDMPs) hold promise as tools contributing to effective science data management at scale, and over a long data lifecycle. A Data Management Plan (DMP) consists of documentation outlining how research data will be managed during and after the research project. DMPs may take many forms, including, for instance, simple free-form prose. In a maDMP, information is recorded in a structured way, according to a specification (metadata application profile), which enables the automated exchange of information recorded in the DMP. As such, maDMPs hold potential as data management tools consistent with open science principles and the pursuit of FAIR(ER) scientific data management done with CARE. This poster presents reflections on bringing archival considerations to an extension of the international maDMP common standard. The presentation draws from my personal experience as an archivist contributing to several working groups; it does not reflect the perspectives of any institution or employer. As an archivist, before I could meaningfully engage with the existing international maDMP common standard (developed by the Research Data Alliance's [RDA's] DMP Common Standards Working Group), I had to learn about cardinalities, data models, and the international standard's definitions.
Only after that steep learning curve was it possible to think through how an extension might remain compatible with the international standard while also recording, in a machine-actionable way, the additional information required for archiving activities as mandated in a government context. Certain archival considerations were brought into the extension as additional modules, for instance for data retention and disposition requirements and actions, and for considerations relevant to supporting Indigenous data sovereignty. Other archival considerations were addressed through modelling. For instance, in our variant of a science data lifecycle model, we separated digital preservation activities from digital storage infrastructure. Working group discussions have also addressed common misconceptions, such as the belief that if data is open, it is necessarily archived and preserved.

Opioid Industry Documents Archive reveals industry role in AI-enabled prescription drug monitoring programs
Dan Kabella, Kelly Ray Knight, and Dorie Apollonio, University of California, San Francisco, United States
Halle Young, Joint University of California, Berkeley/University of California, San Francisco (UCB/UCSF) Medical Anthropology Program, United States
This poster focuses on the opioid industry's role in the innovation and proliferation of AI-enabled Prescription Drug Monitoring Programs (PDMPs). PDMPs are statewide electronic databases that collect, track, and analyze sensitive prescription data about federally controlled substances. PDMP technology has grown more sophisticated, and AI has been used to transform spreadsheets into interactive visualizations and predictive risk scores. As a result, AI analytical platforms like Bamboo Health's NarxCare have been widely adopted into clinical workflows. The proprietary software contains complex statistical models and risk functions, automatically generated from PDMP data, that translate those data into three-digit risk scores. This process facilitates their clinical use by augmenting decision making and purporting to identify patients at risk for overdose, diversion, and addiction. Technology providers like Bamboo Health have made PDMP data profitable by transforming the massive scale of prescription drug history data into a tool for "responsible" opioid prescribing at the point of care. They market their algorithm to private and public stakeholders even though its risk-calculating formulas remain non-transparent (Buonora 2023). Consequently, people with complex medical conditions as well as racially and economically marginalized patients are vulnerable to disproportionate opioid risk profiling (Olivia 2022). The systematic automation bias of PDMPs raises bioethical and health equity concerns. Because industry internal practices are difficult to access, few studies examine how these uses of PDMP data are perceived in the context of industry's commercial influence on the practice of medicine. Our study investigates this question through unique public access to recently released internal documents in the Opioid Industry Documents Archive (OIDA), a project of the University of California, San Francisco and Johns Hopkins University.
OIDA contains over 3 million documents released through the resolution of legal actions against opioid manufacturers and associated companies, including consultants and pharmacies. In ongoing litigation, pharmaceutical manufacturers have been held responsible for engaging in deceptive marketing, while distributors and pharmacies have been accused of inundating communities with opioids, driving the opioid crisis. Internal industry documents reveal that the opioid industry facilitated the widespread use of proprietary and non-transparent prescribing algorithms that play a significant role in clinical decision making. OIDA represents a novel archival site where scholars can study the implications of industry's visions and strategies for medicine, public health, and patient lives. It offers new insights into an understudied area, supply-side interventions into PDMP use that can increase bias in health care, in the context of data-driven interventions and AI innovation.

Unconference

13:15 – 15:45

The unconference is a participant-driven meeting. We will come together to identify an agenda and discussion topics focused on prominent themes derived from input collected during the first day and a half of the workshop.
We will share challenges, develop solutions, generate ideas, and build partnerships. All attendees are welcome and encouraged to join and contribute. You don’t need to submit a proposal to participate in the unconference.

15:45 – 16:00 Wrap up and closing remarks

16:00 – 17:30 Reception at UCSF Library, Lange Room