Edith Escobedo
Edith is a Project Archivist for Archives and Special Collections. Contact Edith with questions about UCSF’s physical archives, digital collections, or with research questions pertaining to archival materials.

How to Digitize 68,000 Pages of Documents

This is a guest post by Heather Wagner, Digitization Coordinator at the UC Merced Library.

The UC Merced Library’s Digital Curation and Scholarship unit was tasked with digitizing 68,000 pages of documents for the Pioneering Child Studies Project. So, how do we go about digitizing 68,000 pages of documents? With some help. That help comes from four UC Merced undergraduate student assistants who play an important part in the digitization process.

The first part of the process is the actual digitization. Our undergraduate student assistants digitize materials on a variety of equipment, including high-speed document scanners and flatbed scanners for documents, book scanners for bound material, and cameras on stands for oversize or fragile materials.

UC Merced student Nicolas Fleming, digitizing bound materials using a book scanner.

Once the digitization is complete, the next step is quality checking. Students review each image in Adobe Bridge and zoom in to check for issues such as unintended lines in scans or items that are out of focus. Some images may need minor editing such as straightening and cropping; these edits are completed in Photoshop during the quality checking step. Quality checking is time-consuming but necessary to ensure we are receiving the best possible results from digitization.

UC Merced student Dathan Hansell, quality checking digitized documents.

PDFs with optical character recognition (OCR) are created from the digitized image files so they are accessible to users. OCR makes the PDF document searchable. Students quality check the PDF documents and then optimize them. Optimizing the PDF files reduces their file size, which makes them better suited for web viewing. The files are then ready for uploading.
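
For readers curious how the effect of the optimization step might be checked, here is a minimal Python sketch. It is not the tool we use and it does not optimize anything itself. It only compares file sizes, assuming the original and optimized PDFs sit in two hypothetical folders with matching file names.

```python
from pathlib import Path


def size_reduction(original: Path, optimized: Path) -> float:
    """Return the percentage by which the optimized file is smaller."""
    before = original.stat().st_size
    after = optimized.stat().st_size
    if before == 0:
        raise ValueError(f"{original} is empty")
    return (before - after) / before * 100


def report(original_dir: Path, optimized_dir: Path) -> dict[str, float]:
    """Map each PDF found in both folders to its size reduction in percent."""
    results = {}
    for original in sorted(original_dir.glob("*.pdf")):
        optimized = optimized_dir / original.name
        # Skip files that have not been optimized yet.
        if optimized.exists():
            results[original.name] = size_reduction(original, optimized)
    return results


if __name__ == "__main__":
    # Folder names are placeholders for illustration only.
    for name, percent in report(Path("pdf_original"), Path("pdf_optimized")).items():
        print(f"{name}: {percent:.1f}% smaller")
```

A negative percentage would mean the "optimized" file came out larger than the original, which is worth a second look before uploading.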

We appreciate the hard work of our undergraduate student assistants. We would not be able to complete digitization projects of this size without them.

More about the project

The digitization project is part of the NHPRC grant, Pioneering Child Studies: Digitizing and Providing Access to Collection of Women Physicians who Spearheaded Behavioral and Developmental Pediatrics. The UCSF Archives is working in partnership with UC Merced Library’s Digital Assets Unit towards the goal of digitizing and publishing 68,000 pages from the collections of Drs. Hulda Evelyn Thelander, Helen Fahl Gofman, Selma Fraiberg, Leona Mayer Bayer, and Ms. Carol Hardgrove.

Feature image courtesy of Kindel Media via Pexels.