annotating_sub-word_morphemes
Revision history: created 2015/09/09 06:42 by ctschroeder; current revision 2015/10/14 12:11 by admin.
=== Create a morph layer ===
There are multiple ways of creating a morph layer.

== Follow these steps if you have no morph layer at all ==

Duplicate

Manually or [[google_refine|using Google Refine]], identify the normalized words that need to be annotated on the morph level.
+ | |||
+ | Split the words you have identified into the requisite number of tokens. | ||
+ | * Be sure to break up the words in the **tok** and the **ignore: | ||
+ | * Ensure | ||
+ | * Do not break up the words in the norm or orig layers; only the tok and ignore: | ||
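
As a rough illustration of the splitting rule above, here is a minimal Python sketch that models the spreadsheet as one row per token. Everything in it is hypothetical and not part of any Coptic Scriptorium tool: the helper `split_word`, the `Span` record, and the row layout. Each piece of a word becomes its own row on the tok and ignore:morph layers, while the norm and orig values stay whole as a single span over those rows. The tok and ignore:morph pieces are passed as separate lists because the two layers may not hold identical strings, and for simplicity the sketch assumes the orig value covers the same rows as the norm value.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Span:
    """One value on a spanning layer (norm or orig), covering rows start..end-1."""
    layer: str
    value: str
    start: int
    end: int


def split_word(
    norm: str,
    orig: str,
    tok_pieces: List[str],
    morph_pieces: List[str],
    first_row: int,
) -> Tuple[List[Dict[str, str]], List[Span]]:
    """Hypothetical helper: split one word across several token rows.

    Each piece becomes its own row on the tok and ignore:morph layers;
    the norm and orig values are never broken up and instead span all
    of the new rows.
    """
    if len(tok_pieces) != len(morph_pieces):
        # Every row needs exactly one tok cell and one ignore:morph cell.
        raise ValueError("tok and ignore:morph need the same number of pieces")
    rows = [
        {"tok": tok, "ignore:morph": morph}
        for tok, morph in zip(tok_pieces, morph_pieces)
    ]
    end = first_row + len(rows)
    spans = [Span("norm", norm, first_row, end), Span("orig", orig, first_row, end)]
    return rows, spans


if __name__ == "__main__":
    # Placeholder strings, not real Coptic data.
    rows, spans = split_word("abc", "abc", ["a", "bc"], ["a", "bc"], first_row=2)
    for row in rows:
        print(row)
    for span in spans:
        print(span)
```

The length check reflects the row-per-token layout: if the two split layers disagree on the number of pieces, the rows no longer line up.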
+ | |||
+ | Consider using [[google_refine|Google Refine]] | ||
+ | |||
+ | Complete the steps in the next section to create the morph layer. | ||
+ | |||
+ | == Follow these steps if/when your file has an ignore:morph layer == | ||
+ | |||
+ | Many of these steps are demonstrated in this [[https:// | ||
+ | |||
+ | You need to create a clean morph layer that has only unique data in it; 80-90% of the data in ignore: | ||
+ | |||
+ | 1. Insert a new column for the morph layer but it should be empty (as in the video) | ||
+ | |||
+ | 2. In the first cell of data, type in a conditional function that will look to see if the ignore: | ||
+ | =IF(E2=F2,"", | ||
+ | where E2 is the norm layer and F2 is the ignore: | ||
+ | |||
+ | 3. Select the cell with your formula in it and select the rest of the column down to the end of the layer data. Use the “Edit> | ||
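
The formula in step 2 is cut off after `=IF(E2=F2,"",`, so its final argument is not shown here. Assuming the intended logic is to leave the new morph cell empty when the norm cell (E) and the ignore:morph cell (F) in the same row match, and to copy the ignore:morph value otherwise, filling the formula down the column (step 3) amounts to the minimal Python sketch below. The function names and sample values are hypothetical. `redundant_share` reports the fraction of rows that come out empty, one way to check how much of the ignore:morph data merely repeats the norm layer in a given file.

```python
from typing import List, Tuple


def build_morph_column(rows: List[Tuple[str, str]]) -> List[str]:
    """Mimic filling the IF formula down the new morph column (assumed logic).

    rows is a list of (norm, ignore_morph) pairs, one per spreadsheet row,
    mirroring cells E and F. A row whose ignore:morph value equals its norm
    value gets an empty morph cell; any other row keeps the ignore:morph value.
    """
    return ["" if norm == morph else morph for norm, morph in rows]


def redundant_share(rows: List[Tuple[str, str]]) -> float:
    """Fraction of rows whose ignore:morph value simply repeats norm."""
    if not rows:
        return 0.0
    same = sum(1 for norm, morph in rows if norm == morph)
    return same / len(rows)


if __name__ == "__main__":
    # Placeholder values; each pair is (norm, ignore:morph).
    sample = [("w1", "w1"), ("w2", "w2a"), ("w3", "w3")]
    print(build_morph_column(sample))  # → ['', 'w2a', '']
    print(round(redundant_share(sample), 2))  # → 0.67
```

An empty string stands in for the empty cell that `""` produces in the spreadsheet formula.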
annotating_sub-word_morphemes.1441802570.txt.gz · Last modified: 2015/09/09 06:42 by ctschroeder