natural_language_processing_service_online: current revision 2020/08/06 08:33 (amirzeldes); previous revision 2020/08/03 17:52 (admin).
  * Access the [[https://corpling.uis.georgetown.edu/coptic-nlp/|Natural Language Processing Service Online]].
  * Copy the digitized text into the NLP Service text box.
  * Make sure "My data contains meaningful linebreaks" is selected if your text has been transcribed according to the Coptic SCRIPTORIUM [[https://github.com/CopticScriptorium/tagger-part-of-speech/raw/master/scriptorium-transcription-guidelines.pdf|transcription guidelines]] and line breaks are marked with the "enter" or "return" key. If your text already includes </lb> tags, select "ignore linebreaks in my data" instead.
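The linebreak rule in the last step can be written as a small check. This is only a sketch for a local pre-flight script: the function name ''choose_linebreak_option'' is hypothetical, it is not part of the NLP Service, and it assumes that any ''<lb'' or ''</lb>'' tag in the text means the line breaks are already encoded.

```python
import re

# Option labels as worded on the NLP Service form (copied from the step above).
MEANINGFUL = "My data contains meaningful linebreaks"
IGNORE = "ignore linebreaks in my data"


def choose_linebreak_option(text: str) -> str:
    """Hypothetical helper: pick the linebreak option for a transcription.

    Assumption: any <lb ...>, <lb/> or </lb> tag means line breaks are
    already encoded as tags, so plain newlines should be ignored.
    """
    if re.search(r"</?lb\b", text):
        return IGNORE
    # No lb tags: rely on the newlines typed with "enter"/"return".
    return MEANINGFUL


if __name__ == "__main__":
    print(choose_linebreak_option("line one\nline two"))
    print(choose_linebreak_option('<lb n="1"/>line one'))
```

The check cannot tell whether newlines follow the transcription guidelines; it only detects tags, so a human still has to confirm the choice.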
The NLP tools can tokenize Coptic either as part of the full NLP SGML pipeline (select "SGML pipeline" in the Service) or as a separate step. **As of 2020, Coptic SCRIPTORIUM annotators no longer need to tokenize and proofread the output before running the rest of the pipeline; if you are a project annotator using the web interface for NLP, you can skip to step 4. Note: most annotators use the GitDox annotation tool instead of the public website. See the [[gitdox_workflow|GitDox and GitHub]] page for more information about GitDox.**
  - To proofread tokens, select "Just piped and dashed morphemes" and run the Service. (Pipes mark segmentation into words; dashes mark smaller morphs.)
  - Copy and paste the SGML output into a text file and proofread the automatic tokenization, editing as necessary.
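When proofreading, it can help to list each word with its morphs so that segmentation errors stand out. The sketch below is hypothetical and not part of the Service. It assumes that whitespace separates bound groups, "|" separates words, "-" separates morphs, and that any SGML tags can simply be dropped. The sample string is invented for illustration and is not real Service output.

```python
import re


def split_piped_morphemes(text: str) -> list[list[str]]:
    """Hypothetical helper: split piped-and-dashed text into words of morphs.

    Assumptions (not confirmed by the Service documentation): whitespace
    separates bound groups, "|" separates words within a group, "-"
    separates morphs within a word, and SGML tags carry no segmentation.
    """
    # Drop SGML tags such as <lb/> so only the segmented text remains.
    plain = re.sub(r"<[^>]*>", "", text)
    words: list[list[str]] = []
    for group in plain.split():          # bound groups
        for word in group.split("|"):    # words within a group
            morphs = [m for m in word.split("-") if m]
            if morphs:
                words.append(morphs)
    return words


if __name__ == "__main__":
    # Invented sample for illustration only.
    sample = "ⲁ|ϥ|ⲥⲱⲧⲙ ⲉ|ⲡ|ⲣⲉϥ-ⲛⲟⲃⲉ"
    for morphs in split_piped_morphemes(sample):
        # One word per line, with morph boundaries shown as " + ".
        print(" + ".join(morphs))
```

Printing one word per line makes a missed pipe or dash easier to spot than scanning the raw SGML, but any corrections still go into the text file itself.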