Event
South Asia Studies Digital Humanities Workshop
Optical Character Recognition (OCR) & Handwritten Text Recognition (HTR)
Andrew Ollett, Associate Professor in the Department of South Asian Languages and Civilizations at the University of Chicago
Registration for the "South Asia Studies Digital Humanities Workshop: Optical Character Recognition (OCR) & Handwritten Text Recognition (HTR)" is now open. The workshop will be held from 9:15am - 5:00pm on October 10 and from 9:30am - 4:00pm on October 11.
This event includes a public program of talks and hands-on workshop sessions. We warmly invite scholars of South Asia to apply for the hands-on workshop sessions here.
The program of public talks brings South Asia studies scholars, digital humanities specialists, data librarians, and manuscript studies curators into conversation. Andrew Ollett, Associate Professor in the Department of South Asian Languages and Civilizations at the University of Chicago, will deliver the keynote address, "Texts as Data: Tools and Perspectives for South Asianists."
The hands-on workshop sessions will introduce participants to digital tools for Optical Character Recognition (OCR) and Handwritten Text Recognition (HTR). OCR and HTR make it possible to turn archival sources into searchable texts. By the end of this workshop, participants will be able to:
- Prepare a multilingual text for OCR and HTR
- Use Google Cloud Vision and Python to perform text recognition and extract the data as a searchable text file
- Use Python to transliterate South Asian scripts into Roman script
- Perform a simple text search with grep and Google Pinpoint, and visualize text data with Voyant Tools
- Conceptualize projects and research questions that utilize computational text mining and analysis
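For a rough sense of what a simple text search over recognized text can look like, here is a minimal sketch in Python. It is an illustration only, not workshop material: the function name `search_lines` and the sample text are hypothetical, and the sketch assumes the OCR output is already available as a plain string. It mimics `grep -n` by reporting matching lines with their line numbers, and it normalizes Unicode first so that visually identical strings compare equal.

```python
import re
import unicodedata


def search_lines(text, pattern):
    """Return (line_number, line) pairs for lines matching pattern, like `grep -n`."""
    # Normalize the pattern and each line to NFC so that equivalent
    # Unicode encodings of the same characters match each other.
    regex = re.compile(unicodedata.normalize("NFC", pattern))
    matches = []
    for number, line in enumerate(text.splitlines(), start=1):
        line = unicodedata.normalize("NFC", line)
        if regex.search(line):
            matches.append((number, line))
    return matches


# Hypothetical OCR output; in practice this would be read from a text file.
sample = "धर्मक्षेत्रे कुरुक्षेत्रे\nsamavetā yuyutsavaḥ\nमामकाः पाण्डवाश्चैव"

for number, line in search_lines(sample, "कुरु"):
    print(f"{number}: {line}")
```

The same function works for Roman-script transliterations, for example `search_lines(sample, "yuyutsava")`.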
Participants in the hands-on sessions must be prepared to attend both days of the workshop as well as a prior software installation session.
If you have any questions, feel free to reach out to Kashi Gomez (kashig@sas.upenn.edu).
This workshop is sponsored by the Department of South Asia Studies, Research Data and Digital Scholarship's AI Literacy Interest Group, the Schoenberg Institute for Manuscript Studies, the Penn South Asia Center, the Price Lab for Digital Humanities, and the Wolf Humanities Center.