Vast amounts of data are trapped in non-computable formats, such as document image scans and text. Deep learning has the potential to greatly expand the questions that economists can study by providing rigorous methods for converting non-computable information into structured, computable data. Combined with advances in GPU hardware and inexpensive cloud computing, these methods make it feasible to process data on a massive scale.
In this Quantitative History Webinar, Melissa Dell of Harvard University will provide an overview of her recent work developing deep learning methods and tools for creating computable social science data, with the aim of making structured digital data more representative of documentary history. This work emphasizes lower-resource contexts, for which there are few incentives to develop commercial technology, and encompasses novel approaches and tools for document layout analysis, OCR, and NLP pipelines.