Part of speech tagging

The norm layer will be tagged with the Tree-tagger POS tagger. (The morph layer will be tagged by the language-of-origin tagger.)

To tag for part of speech, you will need the Tree-tagger package and our training corpus.

Select the norm column, copy it, and paste it into a new text file. Make sure the text file uses Unix line endings and UTF-8 encoding. Save the text file in the directory that contains the tree-tagger script (on a Mac, the tree-tagger-MacOSX-3.2-intel/bin/ directory). Make sure the coptic_fine.par file is also in the “bin” folder; copy it there from the larger tree-tagger folder if necessary. Open a terminal window in that directory and run the tree-tagger, e.g. by typing something like ./tree-tagger coptic_fine.par -token inputFileName outputFileName. Open the output file, then copy and paste its data into empty columns in the Excel file.
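The file preparation and tagger call described above could be scripted roughly as follows. This is only a sketch: the function names are invented for illustration, the command is the one quoted in the text, and it assumes the script is pointed at the “bin” directory that holds both the tree-tagger script and coptic_fine.par.

```python
import subprocess
from pathlib import Path
from typing import List


def normalize_text(raw: str) -> str:
    """Convert Windows (CRLF) and old Mac (CR) line endings to Unix (LF)."""
    return raw.replace("\r\n", "\n").replace("\r", "\n")


def write_tagger_input(raw: str, path: Path) -> None:
    """Write the pasted norm column as UTF-8 with Unix returns."""
    # newline="" stops Python from translating "\n" when writing.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(normalize_text(raw))


def build_command(input_name: str, output_name: str) -> List[str]:
    """Assemble the command quoted in the instructions above."""
    return ["./tree-tagger", "coptic_fine.par", "-token", input_name, output_name]


def run_tagger(bin_dir: Path, input_name: str, output_name: str) -> None:
    """Run the tagger from inside bin_dir; raises if the tagger exits with an error."""
    subprocess.run(build_command(input_name, output_name), cwd=bin_dir, check=True)
```

A call might look like write_tagger_input(pasted_text, bin_dir / "norm.txt") followed by run_tagger(bin_dir, "norm.txt", "norm_tagged.txt"), where the file names are placeholders.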

If you have an additional morph annotation layer, the tagger output will not respect the resulting spans in the norm layer. You will need to search your ORIGINAL norm layer for spans and make sure they are aligned properly with the tagger output. You can check this manually in a variety of ways; one example of manual correction of the data is shown in the linked video. Watch the video before running the tagger.
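One way to check the alignment is to compare the tagger output against the original norm column row by row. The sketch below rests on two assumptions that may not match your data: that the tagger output has one tab-separated token and tag per line, and that a row covered by a norm span that starts higher up exports as an empty cell. The function names and the placeholder tokens and tags in the usage note are hypothetical.

```python
def parse_tagger_output(text):
    """Split tagger output into (token, tag) pairs, one per non-empty line."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue
        token, _, tag = line.partition("\t")
        pairs.append((token, tag))
    return pairs


def align_tags(norm_rows, pairs):
    """Return one tag per original row, leaving empty rows empty.

    Raises ValueError at the first row whose text differs from the
    token the tagger echoed back, so a misalignment is caught early.
    """
    tags = []
    remaining = iter(pairs)
    for row_number, cell in enumerate(norm_rows, start=1):
        if not cell.strip():
            tags.append("")  # assumed: row is covered by a span starting above
            continue
        token, tag = next(remaining, (None, None))
        if token != cell.strip():
            raise ValueError(
                f"row {row_number}: expected {cell!r}, tagger gave {token!r}"
            )
        tags.append(tag)
    leftover = list(remaining)
    if leftover:
        raise ValueError(f"{len(leftover)} tagger lines have no matching row")
    return tags
```

For example, align_tags(["w1", "", "w2"], parse_tagger_output("w1\tA\nw2\tB\n")) yields a tag column with a blank in the second row, which can then be pasted next to the original norm column.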

You can do this without manual proofreading by pre-processing your data before running the tagger in the following way:

You can also use Google Refine’s facet search to check your POS tags.
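If Google Refine is not at hand, a similar overview can be approximated with a short script. This sketch only mimics the idea of a text facet (distinct values with their counts); it is not Google Refine's own API, and the function name is invented for illustration.

```python
from collections import Counter


def tag_facet(tags):
    """List each distinct tag with its count, most frequent first."""
    counts = Counter(tag for tag in tags if tag)  # skip empty cells
    return counts.most_common()
```

Tags that occur only once or twice at the bottom of the list are good candidates for a closer look, since they may be typos or tagging errors.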