Ariel Deardorff
Ariel is the Director of Data Science and Open Scholarship. Contact Ariel for help with data sharing, data management, reproducibility, and open science.

Data Science and Open Scholarship 2025 Year in Review

2025 marked a year of growth and momentum for the Data Science and Open Scholarship team. Our foundational programming workshops, which cater to University of California, San Francisco (UCSF) students, staff, faculty, and postdocs, were as busy as ever. Interest in generative AI soared, driving a surge in requests for instruction and consultations. At the same time, we responded to increasing demand for presentations on the new National Institutes of Health (NIH) public access policy and the evolving landscape of federal research data.

As the year draws to a close, we are proud to look back on these milestones and highlight key accomplishments. 

Teaching and workshops

We advance reproducible science at UCSF by equipping researchers with essential computational skills. Through our foundational workshops, we provide students, faculty, staff, and postdocs with the tools and practices they need for rigorous and transparent research. Our team delivered 58 library workshops, reaching over 1,500 members of the UCSF community. 

  • Data and Document Analysis with SQL and Python, a 10-part series taught by Data Science Specialist Geoffrey Boushey in the spring and fall. This series continues to evolve as AI tools provide new methods for analysis, information gathering, and coding. In addition to preparing code samples in advance, Geoffrey guides workshop participants through creating, running, and evaluating their own code samples using a series of generative AI prompts.
  • A Slower Introduction to R, a seven-part series offered by Data Science Specialist Yea-Hung Chen in winter and fall. The series, which leans on real-world tasks such as preparing and analyzing data, earned consistent praise from learners.
A density plot of systolic blood pressure by age group.
  • Data Manipulation in R, a four-part intermediate-level follow-up to A Slower Introduction to R. Led by Yea-Hung Chen, this series focused on data management and cleaning using R.
Graphic of a toolkit and cogs advertising the AI in Your Toolkit Series
  • AI in Your Toolkit, a six-part workshop series designed by Yea-Hung Chen; Ariel Deardorff, Director of Data Science and Open Scholarship; and Eileen Chen, Clinical Research Librarian, in collaboration with colleagues from across the UCSF Library. The first session, on the environmental impact of AI, attracted over 40 UCSF community members and generated a lively discussion. We look forward to continuing this series with five additional sessions in 2026.

Presentations

We actively engage with the UCSF community and beyond by presenting at departmental meetings, grand rounds, and conferences on timely topics in data science and scholarly communication. In 2025, our team delivered 21 presentations covering a range of subjects from publisher policies on generative AI to ethical data sharing. Some of these presentations include: 

  • Generative AI: Librarian Practices & Perspectives, co-presented by Head of Scholarly Communication Anneliese Taylor and Eileen Chen to over 100 UCSF students, faculty, and staff at the Generative AI Office Hours (VPN required to access the link).
Five matches, each burned more than the last, and the text Tobacco. It's About a Billion Lives Worldwide
  • Leveraging AI for Document Analysis in Archival Research and Publishing presented by Geoffrey Boushey at the It’s About a Billion Lives tobacco symposium. Using documents archived within the UCSF Industry Documents Library such as vaping ads, public health service announcements, and handwritten letters, Geoffrey demonstrated how embedded text detection, video speech-to-text transcription, object identification and labeling, document classification and sentiment analysis, and generative AI tools can help science researchers and practitioners access information from large media collections.

Collaborations

To build excitement and inclusive learning opportunities in data science and open research, we regularly collaborate with groups across UCSF, the Bay Area, and the University of California (UC) system to provide training and share information. Key collaborations include:

UC Love Data Week

We worked with data librarians from across the UC system to co-host the fifth annual UC Love Data Week, a week-long event featuring presentations and workshops on data access, management, security, sharing, and preservation. Geoffrey Boushey presented on Unlocking image, audio, and video data in the Industry Documents Library.  

UC Carpentries

We co-hosted and contributed to the annual UC Carpentries workshop series, which offers introductory R, Python, SQL, and Unix programming to students, faculty, and staff from across the UC system. This year, over 500 people participated in one or more workshops. 

Bay Area Open Science Group

The Bay Area Open Science Group, a longstanding collaboration between UCSF, UC Berkeley, and Stanford University, hosted eight sessions this year for our community of students, faculty, and staff.

This year’s most popular session featured a presentation on Rescuing Federal Research Data co-presented by UCSF’s Ariel Deardorff and Yea-Hung Chen. 

Generative AI

To create more opportunities for the UCSF community to learn about and engage with AI, we participated in several campus-wide training committees and initiatives, including:

  • AI Guild
  • GenAI Office Hours planning committee
  • ai.ucsf.edu editorial board

Projects and reports

In addition to our teaching efforts, our team advanced several data science and scholarly communication initiatives at UCSF and explored new topics to build our team’s capacity. 

  • Yea-Hung Chen used generative AI to try to identify interesting data visualizations in several of the Library’s collections, including the Helen F. Gofman papers and the Dritz papers. This experiment used Granite Vision, a model designed to understand tables and data visualizations. He found the model fairly accurate in flagging data visualizations, but further work is needed to optimize the process.
Handwritten tallies of Wide Range Achievement Test (WRAT) scores on a sheet of three-hole punched paper.
  • Geoffrey Boushey used UCSF’s generative AI platform Versa to identify tobacco-related imagery in mainstream movies and television shows, including VoltMan from Smokeless Image, and built a Versa Chatbot to query and interact with depositions of tobacco and opioid executives. 
A still image of an open comic book from a cartoon advertisement for e-cigarettes featuring an orange superhero.
  • Anneliese Taylor contributed to a joint UCSF Academic Senate and UCSF Library feasibility report examining the establishment of a central UCSF writing center.

Looking ahead

In the coming year, we remain committed to advancing data science and open scholarship through teaching, consulting, and presentations. Our team will deliver the remaining sessions of the AI in Your Toolkit workshop series, continue to offer our foundational programming workshops, and develop new offerings to meet emerging needs. We welcome opportunities to collaborate through presentations and consultations, and encourage the UCSF community to explore upcoming events on the library calendar.