annotating_sub-word_morphemes
Differences
This shows you the differences between two versions of the page.
annotating_sub-word_morphemes [2015/09/09 06:42] – created ctschroeder | annotating_sub-word_morphemes [2015/09/09 07:18] – ctschroeder | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | === Language of origin tagger | + | === Create a morph layer === |
- | Language | + | Many of these steps are demonstrated in this [[https:// |
- | tag the ignore:morph layer | + | You need to create a clean morph layer that contains only unique data. 80-90% of the existing morph data is identical to the data in norm, which makes it difficult for a human to spot compound words or morphs. |
- | The language tagger annotates loan words for language of origin (Greek, Hebrew, Aramaic, Latin, etc.) \\ | + | 1. Insert a new column |
- | Note: While we recognize that some of these words (especially in biblical texts) are sometimes called Greco-Hebrew, | + | |
- | To tag text for language of origin you will need two files: | + | |
- | Move the normalized text file into the folder with the language of origin tagger.\\ | + | |
- | Mac users: | + | |
- | Open a terminal window in that directory or change the directory of your terminal window.\\ \\ | + | |
- | Type the command perl _enrich.pl AP.006.nau196.norm.txt > AP.006.nau196.lang.txt.\\ \\ | + | |
- | Open the new file. Select all and copy. Paste into the Excel file. Delete the extra norm column. | + | |
- | Use [[http:// | + | 2. In the first cell of data, type in a conditional function that will look to see if the ignore: |
+ | =IF(E2=F2,"", | ||
+ | where E2 is the norm layer and F2 is the ignore: | ||
+ | |||
+ | 3. Select the cell with your formula in it and select the rest of the column down to the end of the layer data. Use the “Edit> | ||
+ | |
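The comparison in step 2 can also be sketched outside the spreadsheet. The Python below is a minimal illustration, not the project's own tooling: the function name `clean_morph_column` and the sample rows are hypothetical, and because the formula above is cut off after `=IF(E2=F2,"",`, the "otherwise keep the morph value" branch is an assumption rather than something the page confirms.

```python
def clean_morph_column(norm, morph):
    """Return a morph column that is blank wherever morph duplicates norm.

    `norm` and `morph` are parallel lists of cell values, one per row.
    """
    if len(norm) != len(morph):
        raise ValueError("norm and morph columns must have the same length")
    # Mirrors the visible part of the spreadsheet formula: IF(E2=F2, "", ...).
    # ASSUMPTION: when the two cells differ, the morph value is kept.
    return ["" if n == m else m for n, m in zip(norm, morph)]


# Hypothetical placeholder rows, not real corpus data.
norm_column = ["word_a", "word_b", "word_c"]
morph_column = ["word_a", "word_b_part", "word_c"]
clean = clean_morph_column(norm_column, morph_column)
```

Only the second row survives in `clean`, which is the effect the clean layer is meant to have: rows where the morph data merely repeats norm become empty. Note that Python's `==` is case-sensitive, while Excel's `=` compares text case-insensitively, so the two can disagree on rows that differ only in letter case.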
annotating_sub-word_morphemes.txt · Last modified: 2015/10/14 12:11 by admin