Yale Digitizes Documents

Two recent federal grants will allow Yale to digitize rare primary sources of Middle Eastern history, making them accessible to researchers worldwide.

The Yale University Library has received a $650,000 four-year grant from the U.S. Department of Education to digitize Syrian and Palestinian government records, and a $240,000 joint grant from the National Endowment for the Humanities and the Joint Information Systems Committee to digitize Middle Eastern scholarly materials, according to a press release Thursday. The library will use advanced technology to convert the scanned pages into searchable text, which will be available online.

“Both grants are ‘building block’ projects,” Associate University Librarian Ann Okerson said in an e-mail. “By doing them we will provide standards and infrastructure, linking and collaborative opportunities for other libraries.”

The government records will provide researchers worldwide with the first accounts of what happened from 1919 to 1948, a tumultuous time of political changes in the Middle East, said Simon Samoeil, Curator of the Yale Library’s Near East Collection. The records from British Mandate Palestine, irregularly published due to political instability, are on 40 microfilm reels and five supplementary print volumes. The Syrian collection, in printed format, is the only complete original copy in the U.S. and one of five known in the world. The digital copies will be available within four to five years in the Arabic and Middle Eastern Electronic Library (AMEEL) repository, a collaborative project between Yale and other libraries.

The library will share the second grant with the University of London’s School of Oriental and African Studies (SOAS). The two will collaborate to digitize manuscripts, manuscript catalogs and dictionaries, which researchers will be able to access for free online. The manuscripts, selected from the Yale and SOAS holdings, will center on Arabic medical, scientific and philosophical works.

Yale will digitize the government records using optical character recognition (OCR) for Arabic text, a technology that converts scanned images of text into machine-editable text, making the material searchable via a web interface, said Beatrice Gruendler, a professor of Arabic. Gruendler added that using OCR with Arabic may result in machine errors because the script expresses short vowels through small marks added to its 28 consonants.

“Arabic writing is very homogenous, which makes it very sleek and beautiful, but it needs additional markers to remove ambiguity,” Gruendler said. She suggested that researchers compare the text with a published scholarly, or critical, edition of the work.

The rarer manuscripts will not leave Sterling Memorial Library and must be digitized on site, but more routine items will be outsourced to a contractor.

“This and all of the earlier projects we’ve done at Yale was to create an electronic resource for Arabic and Middle Eastern studies,” said Samoeil, who has spearheaded current and past initiatives. “Which will make all the information and the texts available for researchers all over the world for free.”

The Yale Library collaborated with several other libraries across the world in its AMEEL project to create an electronic library about the Middle East. The Department of Education sponsored AMEEL for four years beginning in 2005, and the funding period ended last month.

Kirtas scanning machines, left over from a book digitization deal with Microsoft, may be used if they are gentle enough, Okerson said.

In 2007, Yale contracted Kirtas to scan about 100,000 volumes unique to Yale as part of the multimillion-dollar Microsoft project, but when the software giant unexpectedly ceased funding last spring, the contract with Kirtas was terminated.
