Differences

--- part_of_speech_tagging_using_tree-tagger [2015/09/09 06:16] – created ctschroeder
+++ part_of_speech_tagging_using_tree-tagger [2015/09/09 06:30] (current) – ctschroeder
@@ Line 1: / Line 1: @@
 === Part of speech tagging ===
-The norm layer will be tagged with the tree-tagger pos-tagger. To tag for part of speech, you will need the [[ http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|Tree-tagger package]] and our [[https://github.com/CopticScriptorium/tagger-part-of-speech/releases/tag/v1.6|training courpus]]
+The norm layer will be tagged with the tree-tagger pos-tagger. (The morph layer will be tagged by the language of origin tagger.)
-The ignore:morph layer will be tagged by the language of origin tagger.
+To tag for part of speech, you will need the [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|Tree-tagger package]] and our [[https://github.com/CopticScriptorium/tagger-part-of-speech/releases/latest|training corpus]]
-  *tree-tagger pos-tagger
 Select the norm column, copy it, and paste it into a text file.  Make sure the text file uses Unix line endings (LF) and is encoded in UTF-8.
 Save the new text file in whichever directory has the tree-tagger script (for Macs, the tree-tagger-MacOSX-3.2-intel/bin/ directory). Make sure the coptic_fine.par file in the larger tree-tagger folder is also in the "bin" folder.
 Open a terminal window at that directory.
-Run the tree-tagger.  (E.g., type something like ./tree-tagger coptic_fine5.par -token inputFileName outputFileName).
+Run the tree-tagger.  (E.g., type something like ./tree-tagger coptic_fine.par -token inputFileName outputFileName).
 Open the outputFileName.
 Copy and paste the data into empty columns in the Excel file.
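
The file preparation and tagger call in the steps above can be sketched in Python. This is only a minimal sketch, not part of the documented workflow: the helper names (normalize_newlines, write_input, build_command, run_tagger) are hypothetical, the file names are placeholders, and the argument order simply copies the example command given above.

```python
import subprocess


def normalize_newlines(text):
    """Convert Windows (CRLF) and old Mac (CR) line endings to Unix (LF)."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def write_input(tokens_text, path):
    """Save the copied norm column as UTF-8 text with Unix line endings."""
    # newline="" stops Python from rewriting "\n" on platforms such as Windows.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(normalize_newlines(tokens_text))


def build_command(par_file, input_file, output_file):
    """Build the argument list, mirroring the example command in the steps."""
    return ["./tree-tagger", par_file, "-token", input_file, output_file]


def run_tagger(bin_dir, par_file, input_file, output_file):
    """Run the tagger from the directory holding the script and the .par file."""
    # Assumption: bin_dir is the "bin" folder that contains both files.
    subprocess.run(
        build_command(par_file, input_file, output_file),
        cwd=bin_dir,
        check=True,  # raise an error if the tagger exits with a failure code
    )
```

For example, run_tagger("tree-tagger-MacOSX-3.2-intel/bin", "coptic_fine.par", "inputFileName", "outputFileName") would issue the same command as the one typed in the terminal, provided the input file was already saved in that directory.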
-If you have morphs, the tree-tagger will not have respected the spans in the norm layer.  You will need to search your ORIGINAL norm layer for spans and make sure they are aligned properly with the tagger.  You can look for this manually in a variety of ways.  One example of manual correction is here [[https://www.youtube.com/watch?v=KjkfZ76zJfk&feature=youtu.be|see video here]]
+If you have an additional morph annotation layer, the tree-tagger will not have respected the resulting spans in the norm layer.  You will need to search your ORIGINAL norm layer for spans and make sure they are aligned properly with the tagger output.  You can check this manually in a variety of ways.  One example of manual correction of the data is shown in [[https://www.youtube.com/watch?v=KjkfZ76zJfk&feature=youtu.be|this video]].  Watch the video before running the tagger.
   *Select the ORIGINAL norm column (not the one you just pasted in; to be safe, you might rename the new one ignore:norm or something similar).
-  *Click the “unmerge cells” button to unmerge the spans.”
+  *Click the “unmerge cells” button to unmerge the spans.
   *Using the Find function, find the next empty cell.  (If the norm layer is selected, it will only find empty cells in that column.)
   *In the norm column, select the empty cell and the cell above it; merge the two cells.
@@ Line 23: / Line 21: @@
   *Select ignore:norm and delete the column.
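
The empty-cell search in the manual steps above relies on one idea: once spans are unmerged, an empty cell belongs to the span that starts at the nearest filled cell above it. The Python sketch below illustrates only that idea. It assumes the column has already been exported as a list of strings; the function name spans_from_unmerged and the sample values are hypothetical, and rows are counted from 0 rather than from Excel's row 1.

```python
def spans_from_unmerged(cells):
    """Group rows of an unmerged column into (first_row, last_row, value) spans.

    An empty cell is treated as a continuation of the span that starts at
    the nearest non-empty cell above it.
    """
    spans = []
    for row, value in enumerate(cells):
        if value.strip() == "" and spans:
            # Extend the previous span down to this empty row.
            first, _, text = spans[-1]
            spans[-1] = (first, row, text)
        else:
            # A filled cell (or an empty first row) starts a new span.
            spans.append((row, row, value))
    return spans


# Placeholder values standing in for norm tokens.
column = ["w1", "", "w2", "w3", "", ""]
print(spans_from_unmerged(column))
# [(0, 1, 'w1'), (2, 2, 'w2'), (3, 5, 'w3')]
```

Each tuple marks a range of rows that would be merged back into a single cell, which is what the Find-and-merge steps do by hand.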
-You can do this without manual proofreading by:
+You can do this without manual proofreading by pre-processing your data before running the tagger in the following way:
-Make a copy of norm in a new sheet
+  *Make a copy of norm in a new sheet
   *Unmerge all spans
   *Add a new column with a serial ID (1,2,3…)  (select the content of the new column, go to edit>fill>series in the menu bar. (a) when selecting content, make sure just to use as many rows as have data (not the whole column); (b) Put a "1" in the first cell before going to Edit>fill>serie