basic_annotation_workflow
Differences
This shows you the differences between two versions of the page.
basic_annotation_workflow [2015/09/09 08:43] – ctschroeder | basic_annotation_workflow [2020/08/03 16:56] – admin | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | === Basic Annotation Workflow === | + | ==== Basic Annotation Workflow |
- | [[transcribe_a_text|Text file]] | + | === Transcribe your text === |
+ | |||
+ | Transcribe your text in [[gitdox_workflow|GitDox]]. Alternatively, | ||
+ | |||
+ | At this point, you may follow one of two paths: | ||
+ | - Use the Natural Language Processing (NLP) Service online | ||
+ | - Process using our NLP tools individually on your local machine | ||
+ | |||
+ | These two paths are outlined in detail in the following sections. | ||
+ | |||
+ | === NLP Service Online Workflow === | ||
+ | |||
+ | [[natural_language_processing_service_online|Run the NLP Service]] on your transcribed text in GitDox. | ||
+ | * If your text is in a text file, copy and paste it into the GitDox text editor. (See the [[gitdox_workflow|GitDox]] page for more information on using the GitDox text editor.) | ||
+ | * If your text is already transcribed into the GitDox text editor and validated (see [[gitdox_workflow|GitDox]]), | ||
+ | |||
+ | You will see an NLP button below the text window. Click it. | ||
+ | |||
+ | //Note for veteran GitDox users: you do not need to proofread tokenization as part of the NLP Service process. The NLP Service now works better without tokenizing first.// | ||
+ | |||
+ | [[import_macro|Import the SGML into a spreadsheet.]] | ||
+ | |||
+ | Rename the existing layers according to the [[annotation_layer_names|Annotation layer names guidelines]]. (Not all layers in the guidelines will exist in your file at this point.) | ||
+ | |||
+ | Remove any redundant columns. These may be hi (keep hi@rend); supplied (keep supplied@reason etc.); gap (keep gap@reason etc.). | ||
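The column cleanup described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the Coptic SCRIPTORIUM tooling: it assumes the spreadsheet has been exported as a list of rows keyed by column name, and that a bare element column (such as hi) is redundant whenever an attribute column for the same element (such as hi@rend) exists. The function names are hypothetical. Check the result by hand, since the guideline only says these columns "may be" redundant.

```python
def redundant_columns(columns):
    """Return bare element columns that also have an element@attribute column.

    Assumption: attribute layers are named "element@attribute" (e.g. hi@rend).
    """
    elements_with_attributes = {
        name.split("@", 1)[0] for name in columns if "@" in name
    }
    return [
        name
        for name in columns
        if "@" not in name and name in elements_with_attributes
    ]


def drop_redundant_columns(rows):
    """Return copies of the rows without the redundant bare element columns.

    Assumption: every row is a dict with the same column names as the first row.
    """
    if not rows:
        return []
    redundant = set(redundant_columns(list(rows[0])))
    return [
        {key: value for key, value in row.items() if key not in redundant}
        for row in rows
    ]


if __name__ == "__main__":
    columns = ["tok", "hi", "hi@rend", "supplied", "supplied@reason", "gap", "gap@reason", "lb@n"]
    print(redundant_columns(columns))
```

Columns such as lb@n are left alone because no bare lb column accompanies them in this example.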
+ | |||
+ | Add missing information to existing layers. For instance, replace the lb and cb placeholders in the lb@n and cb@n columns with the line and column numbers from the original manuscript. | ||
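When the manuscript lines are numbered consecutively, filling in the lb@n column can be partly automated. The sketch below is an illustration under stated assumptions, not a project tool: it assumes each placeholder cell contains the literal string "lb" (or "cb" for columns), that other cells should be left untouched, and that numbering runs sequentially from a known starting number. The function name and its parameters are hypothetical. Where the manuscript numbering is irregular, enter the numbers by hand instead.

```python
def number_placeholders(values, placeholder="lb", start=1):
    """Replace each placeholder cell with a running number, as a string.

    Assumptions: placeholders are the literal string given in `placeholder`,
    and numbering is consecutive beginning at `start`.
    """
    numbered = []
    counter = start
    for value in values:
        if value == placeholder:
            numbered.append(str(counter))
            counter += 1
        else:
            # Leave empty cells and already-numbered cells as they are.
            numbered.append(value)
    return numbered


if __name__ == "__main__":
    lb_column = ["lb", "", "", "lb", "", "lb"]
    print(number_placeholders(lb_column, placeholder="lb", start=1))
```

The same function can be applied to the cb@n column by passing `placeholder="cb"`.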
+ | |||
+ | Note: the following steps are a guide to the kinds of work you will be doing. | ||
+ | |||
+ | [[create_a_normalized_bound_group_layer|Create an original text (" | ||
+ | |||
+ | Create a new layer, or clean up an existing one, for [[create_a_normalized_bound_group_layer|original text in bound groups (" | ||
+ | |||
+ | Proofread the normalized (norm) layer. | ||
+ | * You may wish to use [[Google Refine]]. | ||
+ | * You do not need to simultaneously proofread the norm_group layer; we can reconstruct norm_group using the data in norm. | ||
+ | |||
+ | [[create_a_normalized_bound_group_layer|Reconstruct the norm_group layer]]. | ||
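The idea behind reconstructing norm_group from norm can be sketched as follows. This is a minimal illustration and not the procedure documented on the linked page: it assumes each norm unit is already paired with an identifier for the bound group it belongs to, and that a bound group's normalized form is simply its norm units concatenated in text order. The function name, the input shape, and the sample units are illustrative assumptions.

```python
from itertools import groupby


def rebuild_norm_groups(norm_units):
    """Concatenate consecutive norm units that share a bound-group identifier.

    norm_units: list of (group_id, norm) pairs in text order.
    Assumption: units of the same bound group are adjacent in the list.
    """
    return [
        "".join(norm for _, norm in units)
        for _, units in groupby(norm_units, key=lambda unit: unit[0])
    ]


if __name__ == "__main__":
    # Illustrative units only; group ids 1 and 2 are hypothetical.
    units = [
        (1, "ⲁ"), (1, "ϥ"), (1, "ⲥⲱⲧⲙ"),
        (2, "ⲉ"), (2, "ⲡ"), (2, "ϣⲁϫⲉ"),
    ]
    print(rebuild_norm_groups(units))
```

Because the groups are derived from norm, corrections made while proofreading norm carry over to norm_group when it is rebuilt.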
+ | |||
+ | Proofread the part of speech (pos), lemma (lemma), and morpheme (morph) layers. Part of speech and lemma are annotated on the norm level. | ||
+ | |||
+ | Proofread the language of origin (lang) layer. | ||
+ | * You may wish to use [[Google Refine]]. | ||
+ | * Coptic SCRIPTORIUM annotates for language of origin on the **morph** level, not the **word (norm)** level. | ||
+ | |||
+ | Add translation, | ||
+ | |||
+ | Add [[Metadata]]. | ||
+ | |||
+ | Validate the file using the [[https:// | ||
+ | |||
+ | === Process using our NLP tools individually on your local machine === | ||
[[tokenizer|Tokenizer]] | [[tokenizer|Tokenizer]] | ||
Line 19: | Line 71: | ||
[[Ensuring orig and norm layers are the same span]] | [[Ensuring orig and norm layers are the same span]] | ||
- | [[part_of_speech_tagging_using_tree-tagger|Part of speech tagging]] | + | [[part_of_speech_tagging_using_tree-tagger|Part of speech tagging |
[[language_of_origin_tagging|Language of origin tagger]] | [[language_of_origin_tagging|Language of origin tagger]] | ||
Line 25: | Line 77: | ||
[[Metadata]] | [[Metadata]] | ||
+ | Validate the file using the [[https:// |
basic_annotation_workflow.txt · Last modified: 2020/08/03 18:08 by admin