Basic Annotation Workflow
At this point, you may follow one of two paths:
- Use the Natural Language Processing (NLP) Service online
- Process your text with our individual NLP tools on your local machine
NLP Service Online Workflow
Copy your transcribed text into the NLP Service. The service returns SGML output in which your text is tokenized, normalized, and tagged for part of speech, lemma, and language of origin.
Import the SGML into a spreadsheet.
Rename the existing layers according to the Annotation layer names guidelines. (Not all layers in the guidelines will exist in your file at this point.)
Proofread the tokenization of the bound groups.
- Add or delete rows if necessary to create additional tokens.
- You may wish to use Google Refine.
- WARNING: Since the spreadsheet now contains many annotation layers, be careful when adding or deleting rows:
- Make sure that the spans of the tok, norm, and at least one of the group layers are all accurately aligned.
- Make sure that any annotation layers (e.g., hi@rend, gap@) still annotate the correct token(s).
- Make sure that the pos, lemma, and lang layers are accurately aligned.
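The alignment checks in the warning above can be partly automated once the spreadsheet layers are available as data. The following is a minimal sketch, not part of our tools: it assumes each layer has been exported as a list of `(first_row, last_row, value)` spans, and the names `Span`, `extents`, and `check_alignment` are hypothetical. It only checks that pos, lemma, and lang cover the same rows as norm, and that no bound group cuts through a norm unit; annotation layers such as hi@rend still need to be checked by eye.

```python
from typing import Dict, List, Tuple

# Assumed representation: (first_row, last_row, value), rows inclusive.
Span = Tuple[int, int, str]


def extents(spans: List[Span]) -> List[Tuple[int, int]]:
    """Return the sorted (first_row, last_row) pairs of a layer, ignoring values."""
    return sorted((start, end) for start, end, _ in spans)


def check_alignment(layers: Dict[str, List[Span]]) -> List[str]:
    """Return a list of human-readable alignment problems (empty if none found)."""
    problems: List[str] = []
    norm_extents = extents(layers["norm"])

    # pos, lemma and lang should cover exactly the same rows as norm.
    for name in ("pos", "lemma", "lang"):
        if name in layers and extents(layers[name]) != norm_extents:
            problems.append(f"{name} spans do not match norm spans")

    # Each bound group should begin and end on a norm boundary.
    starts = {start for start, _ in norm_extents}
    ends = {end for _, end in norm_extents}
    for name in ("orig_group", "norm_group"):
        for start, end, value in layers.get(name, []):
            if start not in starts or end not in ends:
                problems.append(
                    f"{name} span {value!r} (rows {start}-{end}) cuts through a norm unit"
                )
    return problems
```

Running a check like this after every batch of row insertions or deletions makes it easier to catch a shifted layer before it propagates through the rest of the file.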
Create an original text ("orig") layer.
Create an original text in bound groups ("orig_group") layer.
Proofread the normalized (norm) layer.
- You may wish to use Google Refine.
- You do not need to proofread the norm_group layer at the same time; norm_group can be reconstructed from the data in norm.
Reconstruct the norm_group layer.
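As an illustration of what reconstructing norm_group involves, the sketch below joins the proofread norm values that fall inside each bound group. It is an assumption-laden example rather than our actual procedure: it assumes the layers are available as `(first_row, last_row, value)` spans, that a bound group is the plain concatenation of its norm units, and the function name `rebuild_norm_group` and the sample words are hypothetical.

```python
from typing import List, Tuple

# Assumed representation: (first_row, last_row, value), rows inclusive.
Span = Tuple[int, int, str]


def rebuild_norm_group(norm: List[Span], groups: List[Span]) -> List[Span]:
    """Rebuild each group's value by concatenating the norm units it contains."""
    rebuilt: List[Span] = []
    ordered_norm = sorted(norm)  # sort by first_row so parts join in text order
    for g_start, g_end, _ in groups:
        parts = [
            value
            for start, end, value in ordered_norm
            if g_start <= start and end <= g_end
        ]
        rebuilt.append((g_start, g_end, "".join(parts)))
    return rebuilt


# Illustrative data only: two bound groups spanning rows 1-3 and 4-5.
example_norm = [(1, 1, "ⲁ"), (2, 2, "ϥ"), (3, 3, "ⲥⲱⲧⲙ"), (4, 4, "ⲡ"), (5, 5, "ⲣⲱⲙⲉ")]
example_groups = [(1, 3, ""), (4, 5, "")]
print(rebuild_norm_group(example_norm, example_groups))
```

Only the row extents of the existing group layer are used; its old values are discarded, which is why the norm layer should be fully proofread first.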
Proofread the part of speech (pos) and lemma (lemma) layers.
Annotate for morphs.
You now have an Excel file in which tokenized morphemes are aligned with bound groups and normalized morphemes. (If you are working with a Sahidica document, you may have translations and verses as well; with a diplomatic transcription, line breaks, column breaks, and other manuscript annotations are aligned too.)
Proofread the tokenization of the bound groups. Add or delete rows if necessary. You may wish to use Google Refine.
Create a normalized bound group layer