basic_annotation_workflow

Differences

This shows you the differences between two versions of the page.

basic_annotation_workflow [2020/08/03 16:56] admin
basic_annotation_workflow [2020/08/03 18:08] (current) admin
Line 1: Line 1:
-==== Basic Annotation Workflow ====
+===== Basic Annotation Workflow =====
  
-=== Transcribe your text ===
+==== Transcribe your text ====
  
 Transcribe your text in [[gitdox_workflow|GitDox]]. Alternatively, transcribe your text into a [[transcribe_a_text|text file]]. Be sure the transcription divides the text into bound groups.
  
-At this point, you may follow one of two paths:
+At this point, you will want to use the Natural Language Processing (NLP) Service online. Veteran users will remember processing texts with individual NLP tools on a local machine and then editing the annotations in an Excel spreadsheet. The local process is outlined at the end of this page, but it is no longer recommended.
-  - Use the Natural Language Processing (NLP) Service online +
-  - Process using our NLP tools individually on your local machine+
  
-These two paths are outlined in detail in the following sections.
+==== Running the NLP Service in GitDox ====
- +
-=== NLP Service Online Workflow ===+
  
 [[natural_language_processing_service_online|Run the NLP Service]] on your transcribed text in GitDox.
Line 20: Line 16:
  
 //Note for veteran GitDox users: you do not need to proofread tokenization as part of the NLP Service process. The NLP Service now works better without tokenizing first.//
 +
 +Clicking this button will annotate your data and change the format from running text to a spreadsheet, in which each item of textual data (known as a "token") is a row in the spreadsheet, with each annotation as a column. 
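To picture the change from running text to a spreadsheet, here is a minimal conceptual sketch in Python. It is not how GitDox stores data internally; the function name, the placeholder tokens, and every column name other than "lang" are assumptions made for illustration.

```python
# Conceptual sketch only: GitDox's real storage format is not shown on this page.
# The "tok" and "pos" column names and the placeholder tokens are hypothetical.

def text_to_rows(tokens, annotations):
    """Return one row (a dict) per token; each annotation layer becomes a column."""
    rows = []
    for i, tok in enumerate(tokens):
        row = {"tok": tok}
        for layer, values in annotations.items():
            row[layer] = values[i]
        rows.append(row)
    return rows


tokens = ["a", "b", "c"]  # placeholder tokens, not real Coptic data
annotations = {
    "pos": ["X", "Y", "Z"],      # hypothetical part-of-speech values
    "lang": ["", "Greek", ""],   # language of origin, empty when not a loan word
}
rows = text_to_rows(tokens, annotations)
```

Each element of `rows` corresponds to one spreadsheet row, and each key corresponds to one column.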
 +
 + ==== Editing the annotations in the GitDox spreadsheet ====
 +
 +You can then edit the annotations manually. For example:
 +  * You can add "Greek" to the "lang" (=language of origin) column if the tools failed to recognize and annotate a Greek loan word.  
 +  * You can correct the lemma, part-of-speech tag, or normalization.
 +  * You can add an editorial note in a note_note column.
 +
 +Tips:
 +  * You may need to add or subtract rows if the tokenization needs correcting. When doing so, **be careful with all the annotation layers, especially the spans for pages, columns, and lines. Ensure the spans all begin and end where they are supposed to.**
 +  * In the upper right of the spreadsheet, above the scroll bar, you can click on a little bar and pull it down one row to freeze the column labels in place as you scroll.
 +  * You do not need to manually edit the columns for parsing. You do not need to worry about preserving the spans in these columns. Parsing will be done at a later step (whether manually or automatically).
 +  * Use the [[https://coptic-dictionary.org|online dictionary]] to look up word forms and their lemmas.
 +  * Use ANNIS to search for word forms and see how we usually tag and lemmatize them.
 +  * Prefacing a layer label (column label) with "ignore:" will ensure that that column is not imported into our public published data (ANNIS, GitHub).
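The advice about spans after adding or removing rows can be illustrated with a small check. This is a sketch, not a GitDox feature: it assumes a layer can be written as a list of `(start_row, end_row, value)` tuples with inclusive row indices, and that the layer is expected to cover every row with no gaps or overlaps, which may not hold for every layer.

```python
# Hypothetical helper: the span representation and the "covers every row"
# expectation are assumptions, not something GitDox is documented to enforce.

def check_spans(spans, n_rows):
    """Return a list of problems; an empty list means gap-free, non-overlapping coverage."""
    problems = []
    expected_start = 0
    for start, end, value in sorted(spans):
        if end < start:
            problems.append(f"{value!r}: ends before it begins ({start}-{end})")
            continue
        if start < expected_start:
            problems.append(f"{value!r}: overlaps previous span at row {start}")
        elif start > expected_start:
            problems.append(f"rows {expected_start}-{start - 1}: not covered by any span")
        expected_start = max(expected_start, end + 1)
    if expected_start < n_rows:
        problems.append(f"rows {expected_start}-{n_rows - 1}: not covered by any span")
    return problems


# A page layer over 10 rows where deleting a row left a one-row gap.
page_spans = [(0, 4, "page 1"), (6, 9, "page 2")]
issues = check_spans(page_spans, 10)
```

Running the same check on the page, column, and line layers after each tokenization fix is one way to catch a span that no longer begins or ends where it should.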
 +
 +Add verse and chapter layers following our [[versification|guidelines on chapter divisions and versification]].
 +
 +Add a translation layer (even if you are not providing a translation at this time). Translation layers are usually the same length as the verse layer. 
 +  * To add a translation layer, position your cursor in the first row in the column to the right of the verse layer. Click on the button above the spreadsheet to add a column (it should show a green column). The spreadsheet will then add a column with spans matching the column on the left.
 +  * If you are not providing an English translation right now, fill all translation spans with "..." (Three periods, not the ellipsis character.)
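The placeholder rule is easy to get wrong because many editors silently convert three periods into the single ellipsis character (U+2026). The sketch below shows one way to normalize a list of translation values. The function name is hypothetical, and treating empty cells as untranslated is an assumption that only applies when no translation is being provided.

```python
# Hypothetical helper, not part of GitDox.
ELLIPSIS_CHAR = "\u2026"   # the single ellipsis character, which should NOT be used
PLACEHOLDER = "..."        # three separate periods

def fill_untranslated(values):
    """Replace empty cells and lone ellipsis characters with three periods."""
    return [PLACEHOLDER if v.strip() in ("", ELLIPSIS_CHAR) else v for v in values]


translations = fill_untranslated(["", "\u2026", "He said."])
```

Existing translations pass through unchanged; only empty or ellipsis-only values are replaced.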
 +
 +Check all the column names to be sure they conform to [[annotation_layer_names|our guidelines for annotation layers]].
 +  * In most cases, verse and translation spans should be identical in length, because typically verses are a sentence long. See [[versification|our guidelines on versification]] for more information.
 +  * You can delete unnecessary layers. If you are unsure about whether a layer/column should be deleted, relabel it with an "ignore:" prefix and ask a senior editor.
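The effect of the "ignore:" prefix can be sketched as a simple filter over column labels. The actual export logic is not shown on this page, so the function below is an assumption about the behavior described here: it matches the prefix exactly, and the column names in the example are hypothetical.

```python
# Sketch of the described behavior; the real export code may differ
# (for example in how it treats case or surrounding whitespace).
IGNORE_PREFIX = "ignore:"

def publishable_columns(column_names):
    """Return the column labels that would be kept for publication."""
    return [name for name in column_names if not name.startswith(IGNORE_PREFIX)]


kept = publishable_columns(["tok", "ignore:draft_notes", "lang"])
```

Relabeling a doubtful column this way keeps its contents in the spreadsheet while leaving it out of the published data.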
 +
 +Click the "validation" button to validate the data (= ensure it has valid structure to be published).  If you are having trouble fixing validation errors, please contact a senior editor.
 +
 +While data is saved automatically and frequently in the spreadsheet mode, we strongly recommend annotators commit their changes **often** using the commit log under the spreadsheet.  Please use a commit message you will understand when you return to your work days or weeks later. (For example, "checked rows 101-200" is more detailed and more understandable than "continued checking annotations".)
 +
 +
 +==== Using our NLP tools individually on your local machine ====
 +
 +You will need to download the tools from our GitHub site. We no longer recommend this process, because the most up-to-date tools are in our online NLP tool suite, which is available in three ways:
 +  - [[https://corpling.uis.georgetown.edu/coptic-nlp/|on our website]] (cut and paste or type in Coptic text)
 +  - an API (see [[https://corpling.uis.georgetown.edu/coptic-nlp/]] for contact information)
 +  - using our annotation environment GitDox (see above)
 +
 +If you are using our standalone tools, here are the steps:
  
 [[import_macro|Import the SGML into a spreadsheet.]]
basic_annotation_workflow.1596495381.txt.gz · Last modified: 2020/08/03 16:56 by admin