natural_language_processing_service_online

Differences

This shows you the differences between two versions of the page.

natural_language_processing_service_online [2020/08/03 17:52] admin
natural_language_processing_service_online [2020/08/06 08:33] (current) amirzeldes
Line 5:
   * Access the [[https://corpling.uis.georgetown.edu/coptic-nlp/|Natural Language Processing Service Online]].
   * Copy the digitized text into NLP Service text box
-  * Be sure "My data contains meaningful linebreaks" is selected, assuming that your text has been transcribed according to Coptic SCRIPTORIUM [[http://copticscriptorium.org/download/tools/SCRIPTORIUMDiplTranscriptionGuidelines.pdf|transcription guidelines]], and that line breaks are indicated using the "enter" or "return" key. If your text already includes </lb> tags, select "ignore linebreaks in my data."
+  * Be sure "My data contains meaningful linebreaks" is selected, assuming that your text has been transcribed according to Coptic SCRIPTORIUM [[https://github.com/CopticScriptorium/tagger-part-of-speech/raw/master/scriptorium-transcription-guidelines.pdf|transcription guidelines]], and that line breaks are indicated using the "enter" or "return" key. If your text already includes </lb> tags, select "ignore linebreaks in my data."
-The NLP can either tokenize Coptic as part of the entire NLP SGML pipeline (select "SGML pipeline" in the Service) or produce tokenization as a separate step. **As of 2020, Coptic SCRIPTORIUM annotators no longer need to tokenize and proof the output before running the rest of the pipeline; you can now skip to step 4 if you are a project annotator using the web interface for NLP.  Note: Most annotators choose to use the GitDox annotation tool instead of the public website. Visit the [[gitdox_workflow|GitDox and GitHub]] page for more information about GitDox.**
+
+The NLP tools can either tokenize Coptic as part of the entire NLP SGML pipeline (select "SGML pipeline" in the Service) or produce tokenization as a separate step. **As of 2020, Coptic SCRIPTORIUM annotators no longer need to tokenize and proof the output before running the rest of the pipeline; you can now skip to step 4 if you are a project annotator using the web interface for NLP.  Note: Most annotators choose to use the GitDox annotation tool instead of the public website. Visit the [[gitdox_workflow|GitDox and GitHub]] page for more information about GitDox.**
   - To proofread tokens: Select “Just piped and dashed morphemes” and run the Service.  (Pipes indicate segmentation into words; dashes indicate smaller morphs.)
   - Cut and paste the SGML output into a text file and proofread the automatic tokenization, editing as necessary.
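
The piped-and-dashed convention described in the proofreading step can be illustrated with a minimal Python sketch. It is not part of the NLP Service, and the function name `parse_segmentation` is hypothetical. It assumes that whitespace separates groups in the output, which the page does not state; only the meaning of pipes (words) and dashes (smaller morphs) comes from the text above. The sample string uses placeholder Latin letters rather than real Service output.

```python
def parse_segmentation(text):
    """Split piped-and-dashed text into groups -> words -> morphs.

    Assumption (not confirmed by the page): groups are separated by
    whitespace. Pipes separate words and dashes separate smaller
    morphs, as the proofreading step describes.
    """
    groups = []
    for group in text.split():
        # Each word becomes a list of its morphs.
        words = [word.split("-") for word in group.split("|")]
        groups.append(words)
    return groups


def count_words(groups):
    """Count the words across all groups."""
    return sum(len(words) for words in groups)


# Placeholder input, not real Coptic output from the Service.
sample = "a|b-c d"
parsed = parse_segmentation(sample)
print(parsed)
print(count_words(parsed))
```

A structure like this could make it easier to compare word counts before and after manual proofreading, though the real output may need additional handling (for example, SGML tags or line-break markers).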
natural_language_processing_service_online.txt · Last modified: 2020/08/06 08:33 by amirzeldes