Onboarding

In general, onboarding new staff or volunteers who will edit or annotate in GitDox involves the following steps:

GitHub and GitDox accounts

  1. If they do not already have a GitHub account, ask them to set one up. Whether they are a new or existing GitHub user, they will need to provide us with their username, password, and email address, so instruct them to choose a unique password.
  2. Invite the person to the Coptic Scriptorium GitHub organization and specifically to the correct group (usually the editors/annotators group)
  3. In their GitHub account, the user needs to create an access token by going to Settings > Developer Settings > Personal access tokens (a “classic” token usually suffices). They can name the token anything, but it should have no expiration date and should have all the boxes under “repo” selected.
  4. Add the user to GitDox in the Admin interface. Using the same username and password as for GitHub usually works well. They will not be able to commit to GitHub through GitDox until the next step is complete, but they can at least save their work to the CS server.
  5. Send the GitHub username, password, and token to Amir Zeldes to ensure they can commit to GitHub; he will update the GitDox account with the token
  6. Direct the new editor/annotator to the Quick Start guide on our Documentation Page
  7. Point them to the other guidelines on the Documentation Page, which give more detail on transcribing and tagging
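
Before passing a token along in step 5, it can be useful to confirm that it really carries the “repo” scope. The following is a minimal sketch, not part of GitDox: the function names are hypothetical, and it assumes a classic token, for which the GitHub API reports the granted scopes in the `X-OAuth-Scopes` response header.

```python
import urllib.request


def parse_scopes(header_value):
    """Split a comma-separated scope header such as 'repo, read:org' into a set."""
    return {s.strip() for s in (header_value or "").split(",") if s.strip()}


def has_repo_scope(header_value):
    """True if the full 'repo' scope is among the scopes listed in the header."""
    return "repo" in parse_scopes(header_value)


def fetch_scopes(token):
    """Ask the GitHub API which scopes a classic token carries (needs network access)."""
    request = urllib.request.Request(
        "https://api.github.com/user",
        headers={"Authorization": f"token {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return response.headers.get("X-OAuth-Scopes", "")
```

Calling `has_repo_scope(fetch_scopes(token))` with the new user's token should return `True` if the “repo” boxes were ticked. A token with only `public_repo` will return `False`.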

Materials

Materials the editor/annotator may need in addition to GitDox and GitHub credentials and documentation listed above (depending on the type of work):

  1. OCR or transcribed text (if available)
  2. Manuscript image (if available)
  3. PDF of the edition (either a public domain edition or a copyrighted edition that will be used only for reference, not copied or used directly)

Suggested First Steps

First steps in the text editor (XML mode) of GitDox:

  1. Transcribe or copy transcription into GitDox
  2. Make sure the letters are correct. Edit as necessary, referencing the original version (often the manuscript image, but sometimes a public domain edition or an edition donated by a scholar)
  3. Add or change supralinear strokes as necessary
  4. Ensure line breaks match line breaks in the manuscript or edition
  5. Make sure bound groups follow Layton’s method of binding
  6. Add underscores between bound groups
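
As an illustration of the underscore convention in step 6, the sketch below splits one transcription line into bound groups. It is not a GitDox function: the names are hypothetical, the sample line is invented for the example, and the idea that leftover whitespace points to a missing underscore is an assumption of this sketch, not a project rule.

```python
def split_bound_groups(line):
    """Return the bound groups in one transcription line, split on underscores."""
    return [group for group in line.strip().split("_") if group]


def suspicious_groups(line):
    """Return bound groups that still contain whitespace.

    Assumption for this sketch: whitespace inside a group suggests
    a missing underscore. This is not a rule taken from GitDox.
    """
    return [g for g in split_bound_groups(line) if any(ch.isspace() for ch in g)]


# Invented sample line with three bound groups separated by underscores.
sample = "ⲁϥⲥⲱⲧⲙ_ⲛϭⲓ_ⲡⲣⲱⲙⲉ"
groups = split_bound_groups(sample)
```

Here `groups` holds the three bound groups of the sample line. Running `suspicious_groups` on a line where a space was typed instead of an underscore returns the group that needs another look.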

At any point, you can hit the “validate” button to see if your work is valid under our annotation practices/rules. (“Validation” does not check for all errors – it checks only for common formatting mistakes.)

The editor can either a) run the tokenizer and check the tokenization in XML mode before running the full NLP tool set, or b) run the full NLP tool set right away. For anything other than very clean, classical Sahidic, checking tokenization first is recommended.

First steps in the spreadsheet (ether mode) of GitDox:

  1. Review the tok (token), bound group, and word layers to be sure the tokenization is correct
  2. Make any changes such as adding or deleting a row with care: check the content and spans in the other columns of the spreadsheet to make sure you are not deleting anything important or breaking the spans
  3. Change the line break numbers in the lb_n column to match the line numbers in the manuscript. (If you don't have an lb_n column, don't worry: see Annotation layer names for Coptic SCRIPTORIUM for all the layer names.)
  4. Check and correct lemma, pos, normalization, language of origin, morph layers (again, watch the content and spans in the other columns)
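
To make the warnings about spans concrete, here is a minimal sketch of the kind of check an editor does by eye after adding or deleting a row. It does not use GitDox or its spreadsheet format: the function name and the `(start_row, end_row, value)` representation of a layer are assumptions made for this example only.

```python
def find_span_problems(spans, n_rows):
    """Report spans in one layer that overlap or fall outside the sheet.

    spans  -- list of (start_row, end_row, value) tuples, rows counted from 1
    n_rows -- number of token rows in the spreadsheet
    """
    problems = []
    prev_end = 0
    for start, end, value in sorted(spans):
        if start > end:
            problems.append(f"{value!r}: start row {start} is after end row {end}")
        if start < 1 or end > n_rows:
            problems.append(f"{value!r}: rows {start}-{end} fall outside 1-{n_rows}")
        if start <= prev_end:
            problems.append(f"{value!r}: overlaps the previous span at row {start}")
        prev_end = max(prev_end, end)
    return problems
```

An empty list means the layer's spans sit cleanly on the token rows. A span that still points at a deleted row, or two spans that now share a row, will each produce one message.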

There are many other annotations that can be done in GitDox (including English and Arabic translations). See the page for Annotation layer names for Coptic SCRIPTORIUM for all the annotations you might work on.

onboarding_new_annotators_editors.txt · Last modified: 2023/05/31 16:02 by admin