==== Basic Annotation Workflow ====
=== Transcribe your text ===

Transcribe your text in [[gitdox_workflow|GitDox]]. Alternatively, transcribe it in a [[transcribe_a_text|text file]].
At this point, you may follow one of two paths:
=== NLP Service Online Workflow ===
[[natural_language_processing_service_online|Run the NLP Service]].
  * If your text is in a text file, copy and paste it into the GitDox text editor. (See the [[gitdox_workflow|GitDox]] page for more information on using the GitDox text editor.)
  * If your text is already transcribed into the GitDox text editor and validated (see [[gitdox_workflow|GitDox]]), you can proceed directly to the NLP step.

You will see an NLP button below the text window. Click it.

//Note for veteran GitDox users: you do not need to proofread tokenization as part of the NLP Service process. The NLP service now works better without tokenizing first.//
[[import_macro|Import the SGML into a spreadsheet.]]
Rename the existing layers according to the [[annotation_layer_names|Annotation layer names guidelines]]. (Not all layers in the guidelines will exist in your file at this point.)
Remove any redundant columns. These may include hi (keep hi@rend) and supplied (keep supplied@reason).
Add missing information to existing layers. For instance, replace lb and cb placeholders in lb@n and cb@n columns with line and column numbers from the original.
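To make the placeholder-replacement step concrete, here is a hedged sketch. The function name, the bare "lb" placeholder string, and the flat-list representation of a spreadsheet column are all assumptions for illustration, not the actual GitDox or spreadsheet format.

```python
# Illustrative sketch (column layout and placeholder string are assumptions):
# replace each bare "lb" placeholder in an lb@n column with a running
# line number, leaving all other cells untouched.
def number_placeholders(column, placeholder="lb", start=1):
    n = start
    out = []
    for cell in column:
        if cell == placeholder:
            out.append(str(n))  # substitute the next line number
            n += 1
        else:
            out.append(cell)
    return out

print(number_placeholders(["lb", "word", "lb", "word", "lb"]))
# → ['1', 'word', '2', 'word', '3']
```

The same sketch works for a cb@n column by swapping in a "cb" placeholder and the appropriate starting column number.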
Note: the following steps are a guide to the kinds of work you will be doing.

[[create_a_normalized_bound_group_layer|Create an original text ("orig") layer.]]

Create a new layer, or clean up an existing one, for the [[create_a_normalized_bound_group_layer|original text in bound groups ("orig_group")]].
Proofread the normalized (norm) layer.
  * You do not need to simultaneously proofread the norm_group layer; we can reconstruct norm_group using the data in norm.
[[create_a_normalized_bound_group_layer|Reconstruct the norm_group layer]].
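As a loose illustration of what reconstructing norm_group from norm involves, here is a hedged Python sketch. It assumes each norm token is paired with a bound-group id; the real data lives in spreadsheet columns, and the (group id, token) pair representation here is hypothetical.

```python
# Illustrative sketch: rebuild norm_group values by concatenating, in
# order, the norm tokens that share a bound-group id. Coptic bound
# groups are written without internal spaces, so plain concatenation
# is used. The (group_id, token) input format is an assumption.
def reconstruct_norm_groups(rows):
    groups = {}
    order = []
    for group_id, norm in rows:
        if group_id not in groups:
            groups[group_id] = []
            order.append(group_id)
        groups[group_id].append(norm)
    return [(gid, "".join(groups[gid])) for gid in order]

rows = [(1, "ϩⲙ"), (1, "ⲡ"), (1, "ϩⲟⲟⲩ"), (2, "ⲉⲧ"), (2, "ⲙⲙⲁⲩ")]
print(reconstruct_norm_groups(rows))
# → [(1, 'ϩⲙⲡϩⲟⲟⲩ'), (2, 'ⲉⲧⲙⲙⲁⲩ')]
```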
Proofread the part of speech (pos), lemma (lemma), and morpheme (morph) layers.
Proofread the language of origin (lang) layer.
  * You may wish to use [[Google Refine]].
  * Coptic SCRIPTORIUM annotates for language of origin on the **morph** level, not the **word (norm)** level.
Add a translation layer, if applicable.

Add [[Metadata]].
Validate the file using the [[https://...]] validator.
=== Process using our NLP tools individually on your local machine ===
[[tokenizer|Tokenizer]]
[[Ensuring orig and norm layers are the same span]]
[[part_of_speech_tagging_using_tree-tagger|Part of speech tagging]]
[[language_of_origin_tagging|Language of origin tagger]]
Validate the file using the [[https://...]] validator.
basic_annotation_workflow.txt · Last modified: 2020/08/03 18:08 by admin