Machine Learning Tools for Historical Documents

The goal is to devise algorithms that will be employed in the development of a tool suite for the analysis of historical texts.
Such a computational humanities toolbox for textual research will include components for word and letter spotting in images of manuscripts, alignment of transcription words with words in images, tools for paleographic analysis, a tool for reconstruction of the text in a manuscript, and methods to learn orthographic and lexical variations from parallel texts.
The tools will be made freely available and are expected to benefit textual studies across the board, including those in classical and European languages.


Nachum Dershowitz is an Israeli computer scientist, known among other things for the Dershowitz–Manna ordering used to prove termination of term rewrite systems.

He obtained his B.Sc. summa cum laude in 1974 in Computer Science–Applied Mathematics from Bar-Ilan University, and his Ph.D. in 1979 in Applied Mathematics from the Weizmann Institute of Science. From 1978, he worked at the Department of Computer Science of the University of Illinois at Urbana-Champaign, until he became a full professor at Tel Aviv University (School of Computer Science) in 1998. He has been a guest researcher at the Weizmann Institute, INRIA, ENS Cachan, Microsoft Research, and the universities of Stanford, Paris, Jerusalem, Chicago, and Beijing.[2]