Differences

--- natural_language_processing_service_online [2020/08/03 17:52] – admin
+++ natural_language_processing_service_online [2020/08/06 08:33] (current) – amirzeldes
@@ Line 5: / Line 5: @@
   * Access the [[https://corpling.uis.georgetown.edu/coptic-nlp/|Natural Language Processing Service Online]].
   * Copy the digitized text into the NLP Service text box.
-  * Be sure "My data contains meaningful linebreaks" is selected, assuming that your text has been transcribed according to Coptic SCRIPTORIUM [[http://copticscriptorium.org/download/tools/SCRIPTORIUMDiplTranscriptionGuidelines.pdf|transcription guidelines]], and that line breaks are indicated using the "enter" or "return" key. If your text already includes </lb> tags, select "ignore linebreaks in my data."
+  * Be sure "My data contains meaningful linebreaks" is selected, assuming that your text has been transcribed according to Coptic SCRIPTORIUM [[https://github.com/CopticScriptorium/tagger-part-of-speech/raw/master/scriptorium-transcription-guidelines.pdf|transcription guidelines]], and that line breaks are indicated using the "enter" or "return" key. If your text already includes </lb> tags, select "ignore linebreaks in my data."
-The NLP can either tokenize Coptic as part of the entire NLP SGML pipeline (select "SGML pipeline" in the Service) or produce tokenization as a separate step. **As of 2020, Coptic SCRIPTORIUM annotators no longer need to tokenize and proof the output before running the rest of the pipeline; you can now skip to step 4 if you are a project annotator using the web interface for NLP.  Note: Most annotators choose to use the GitDox annotation tool instead of the public website. Visit the [[gitdox_workflow|GitDox and GitHub]] page for more information about GitDox.**
+The NLP tools can either tokenize Coptic as part of the entire NLP SGML pipeline (select "SGML pipeline" in the Service) or produce tokenization as a separate step. **As of 2020, Coptic SCRIPTORIUM annotators no longer need to tokenize and proof the output before running the rest of the pipeline; you can now skip to step 4 if you are a project annotator using the web interface for NLP.  Note: Most annotators choose to use the GitDox annotation tool instead of the public website. Visit the [[gitdox_workflow|GitDox and GitHub]] page for more information about GitDox.**
   - To proofread tokens: Select “Just piped and dashed morphemes” and run the Service. (Pipes indicate segmentation into words; dashes indicate smaller morphs.)
   - Cut and paste the SGML output into a text file and proofread the automatic tokenization, editing as necessary.
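
When proofreading, it can help to see the piped and dashed segmentation as nested lists. The following is a minimal sketch and not part of the NLP Service: the function name `parse_piped_dashed` is hypothetical, the sample string is an invented transliterated placeholder rather than real Service output, and it assumes that larger units are separated by whitespace in the saved text file.

```python
def parse_piped_dashed(text):
    """Split piped-and-dashed text into groups -> words -> morphs.

    Assumptions (not confirmed by the Service documentation):
    whitespace separates larger units; within a unit, "|" separates
    words and "-" separates the smaller morphs inside a word.
    """
    groups = []
    for group in text.split():
        # Each word becomes a list of its morphs.
        words = [word.split("-") for word in group.split("|")]
        groups.append(words)
    return groups


# Invented placeholder input, for illustration only.
sample = "a|f|sotm hm|p|rm-n-keme"

for words in parse_piped_dashed(sample):
    # Show each word with its morphs so segmentation errors stand out.
    print(words)
```

Printing one unit per line makes it easier to spot a missing or misplaced pipe or dash before the corrected text is passed on to the rest of the pipeline.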