Search Full Text of Tobacco Documents: Demonstration Site Now Available
In September 2005, the UCSF Library announced the Legacy Tobacco Documents Library Full Text Demonstration (FTD) site. The FTD provides a basic search interface to the full text of virtually all of the 7 million documents (more than 40 million pages) in the Legacy Tobacco Documents Library (LTDL).
Since the LTDL launched in January 2002, users have been able to search the metadata associated with each document (e.g., title, author, date) but not the text within the document pages. In 2004, with funding from the National Cancer Institute, the UCSF Library's Center for Knowledge Management began a project to extract text from the document images using optical character recognition (OCR) software. Using idle workstations in a student computer lab, the team developed an automated method to OCR all 40+ million pages in the LTDL and then generate 7 million searchable PDF documents.

The Library has released this demonstration site to give the public access to the full text of the documents as soon as possible. The project's next phase will be to build a full-featured research site. For more information, contact the Tobacco Control Archives staff.


