
Language of origin tagger

The language of origin tagger annotates loanwords with their language of origin (Greek, Hebrew, Aramaic, Latin, etc.).

The language of origin tagger tags on the morph level. (Note: In Coptic SCRIPTORIUM we create a layer specifically for tagging, which we usually name the ignore:morph layer, since our morph layer of annotations contains only the words annotated on this level and omits all the words that are not further annotated on this level.)

While we recognize that some of these words (especially in biblical texts) are sometimes called Greco-Hebrew, Coptic SCRIPTORIUM annotates for the earliest language of origin, because deciding between Hebrew and Greco-Hebrew, Aramaic and Greco-Aramaic, etc., leads to more discrepancies between editors or annotators. Researchers searching for loanwords in ANNIS may wish to search for multiple languages in order to find all the “hits” they need.
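One way to cover several origin languages in a single ANNIS search is to give the annotation a regular-expression value. The sketch below only builds such a query string. The annotation name `lang` and the helper `build_origin_query` are assumptions made for illustration and do not come from this page, so check the annotation names in the corpus before relying on it.

```python
import re


def build_origin_query(languages, annotation="lang"):
    """Build an AQL-style query matching any of the given origin languages.

    `annotation` is an assumed layer name; adjust it to match the corpus.
    """
    if not languages:
        raise ValueError("at least one language is required")
    # Escape each name so characters such as '-' or '.' are matched literally.
    alternatives = "|".join(re.escape(name) for name in languages)
    return f"{annotation}=/({alternatives})/"


# Example: look for words tagged as either Hebrew or Aramaic.
query = build_origin_query(["Hebrew", "Aramaic"])
print(query)
```

The resulting string can be pasted into the ANNIS query box. Listing the languages explicitly keeps the search aligned with the earliest-origin convention described above.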

To tag text for language of origin, you will need two files: the lexicon file for language of origin (lexicon.txt) and the script (_enrich.pl). Both files must be in the same directory.
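Before running the tagger, it can help to confirm that both files really sit side by side. The following is a minimal sketch of such a check. The helper name `missing_tagger_files` is hypothetical, and the script makes no assumptions about the lexicon's format or about how _enrich.pl is invoked.

```python
from pathlib import Path


def missing_tagger_files(directory, names=("lexicon.txt", "_enrich.pl")):
    """Return the required file names that are not present in `directory`."""
    folder = Path(directory)
    return [name for name in names if not (folder / name).is_file()]


if __name__ == "__main__":
    # Check the current working directory.
    missing = missing_tagger_files(".")
    if missing:
        print("Missing from this directory:", ", ".join(missing))
    else:
        print("Both files found in the same directory.")
```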