Coptic SCRIPTORIUM Wiki

Add or delete rows if necessary to create additional tokens.
You may wish to use Google Refine (now known as OpenRefine).
WARNING: Since the spreadsheet now contains many annotation layers, be careful when adding or deleting rows:
- Make sure that the spans of the tok, norm, and at least one of the group layers are all accurately aligned.
- Make sure that any annotation layers (e.g., hi@rend, gap@, etc.) still annotate the correct token(s).
- Make sure that the pos, lemma, and lang layers are accurately aligned.
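
The alignment checks in the warning above can be sketched in code. This is only an illustration under stated assumptions, not part of the project's tooling: each layer is modeled as a list of (first_row, last_row, value) spans with inclusive row numbers, and "aligned" is taken to mean that every span in a coarser layer starts and ends on a boundary of the tok layer. The function names and the placeholder values (t1, g1, and so on) are hypothetical.

```python
from typing import List, Set, Tuple

# Assumption: a span is (first_row, last_row, value), rows 1-based and inclusive.
Span = Tuple[int, int, str]


def boundaries(spans: List[Span]) -> Tuple[Set[int], Set[int]]:
    """Collect the set of start rows and the set of end rows of a layer."""
    starts = {start for start, _, _ in spans}
    ends = {end for _, end, _ in spans}
    return starts, ends


def misaligned(fine: List[Span], coarse: List[Span]) -> List[Span]:
    """Return coarse spans that do not start and end on a fine-span boundary."""
    starts, ends = boundaries(fine)
    return [sp for sp in coarse if sp[0] not in starts or sp[1] not in ends]


def uncovered_rows(spans: List[Span], n_rows: int) -> List[int]:
    """Return rows in 1..n_rows that no span of the layer covers."""
    covered: Set[int] = set()
    for start, end, _ in spans:
        covered.update(range(start, end + 1))
    return [row for row in range(1, n_rows + 1) if row not in covered]


# Hypothetical layers for a four-row sheet (placeholder values, not real data).
tok = [(1, 1, "t1"), (2, 2, "t2"), (3, 4, "t3")]
group = [(1, 2, "g1"), (3, 3, "g2")]  # g2 stops one row short of t3

problems = misaligned(tok, group)
gaps = uncovered_rows(group, 4)
```

Here `problems` contains the g2 span, because it ends in the middle of the token t3, and `gaps` contains row 4, which the group layer no longer covers. The same two checks could be run for the norm layer and for any annotation layer after rows are added or deleted.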

Create an "orig" layer

Create an "orig_group" layer

Proofread the norm layer. (You may wish to use Google Refine.)

Reconstruct

Proofread

Annotating sub-word morphemes

Tokenizer

Import into spreadsheet

You now have an Excel file with tokenized morphemes aligned with bound groups and normalized morphemes. (If you are working with a Sahidica document, you may have translations and verses as well; with a diplomatic transcription, line breaks, column breaks, and other manuscript annotations are also aligned.)

Proofread the tokenization of the bound groups. Add or delete rows if necessary. You may wish to use Google Refine.
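
To see why adding a row affects every other layer, the following minimal sketch models what happens to a layer's spans when a blank row is inserted. It assumes the same hypothetical (first_row, last_row, value) span model with inclusive, 1-based rows, and it assumes that a span straddling the insertion point stretches by one row while spans at or below it shift down. The function name `insert_row` is illustrative only.

```python
from typing import List, Tuple

# Assumption: a span is (first_row, last_row, value), rows 1-based and inclusive.
Span = Tuple[int, int, str]


def insert_row(spans: List[Span], new_row: int) -> List[Span]:
    """Model inserting a blank row at position new_row in one layer."""
    result: List[Span] = []
    for start, end, value in spans:
        if start >= new_row:
            # Span lies at or below the new row: shift it down by one.
            result.append((start + 1, end + 1, value))
        elif end >= new_row:
            # Span straddles the new row: it stretches to cover it.
            result.append((start, end + 1, value))
        else:
            # Span lies entirely above the new row: unchanged.
            result.append((start, end, value))
    return result


# Hypothetical group layer (placeholder values): g1 covers rows 1-2, g2 covers row 3.
group = [(1, 2, "g1"), (3, 3, "g2")]
after = insert_row(group, 2)
```

In this example `after` is `[(1, 3, "g1"), (4, 4, "g2")]`: the bound group g1 now covers three rows and g2 has moved to row 4. Any layer that is not adjusted in the same way falls out of alignment, which is why each layer should be rechecked after rows are added or deleted.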

Normalization

Create a normalized bound group layer

Create a morph layer

Ensuring orig and norm layers are the same span

Part of speech tagging

Language of origin tagger

Metadata
