====Checklist for Publishing and Releasing Corpora====

===1. New and revised docs should be reviewed by a senior editor.===

  * Check questions from annotators in the document and/or pull request.
  * Read through the document.
  * Use Google Refine to see if errors in tokenization, POS tagging, language tagging, morph annotation, and normalization pop out.
  * Ensure layer names conform to standards (see [[annotation_layer_names|layer annotation documentation]]).
  * Use the validation add-in (when available) to confirm that normalized annotation spans cover the same spans as the original column spans, that group layer spans are the same size, etc.

===2. Add/correct metadata on new documents===

  * Confirm that all metadata conforms to the standards in the [[annotation_layer_names|layer annotation documentation]].
  * Pay close attention to names of annotators, version number, and version date for documents. We now use the SAME version # and date on new documents as on corpus metadata; this version # corresponds with our corpus release # on GitHub. All documents and corpora in a release will have the same release # (whether new docs/corpora or edited and re-released docs/corpora). Do not change the version #/date on any documents that have not been edited or revised.

===3. Check the Issues list for each corpus to be released (whether new or revised versions of documents) on GitHub.===

Each corpus may have a list of errors noticed by users or team members (e.g., https://github.com/CopticScriptorium/ap-dev/issues/35). Make corrections, and note on the issues list that the corrections have been made.

===4. Add/correct metadata on edited, previously published documents.===

  * Confirm that all metadata conforms to the standards in the [[annotation_layer_names|layer annotation documentation]].
  * Pay close attention to names of annotators, version number, and version date for documents.

Versioning: We now use the SAME version # and date on revised documents as on corpora. Give the revised documents the same version # and date as the updated version # and date going in the corpus metadata (see step 5 below).

Note: an annotator may have made a minor change a while back and changed the version # and version date, even though the revised document has not yet been published. We do not republish a corpus every time we make a minor revision to one document. If you have any questions, check the document's version # in our development files against the number in ANNIS. A discrepancy means someone has edited the document; please make sure the version # and date are correct.

===5. Add/correct the corpus metadata.===

**Note: the information in this section describes the workflow before we migrated to Gitdox. This section needs to be updated.**

Corpus metadata appears on the first document in a corpus.

  * Confirm that all metadata conforms to the standards in the [[annotation_layer_names|layer annotation documentation]].
  * Pay close attention to names of annotators: //the names of all annotators of all documents in a corpus should be in the corpus metadata//; if someone has edited one document, be sure that person's name appears in the corpus metadata.
  * Version date should be the date of re-release.
  * Version # corresponds to the version of the GitHub release:
    * +1.0.0 for a major change to data and/or structure (entirely new layer annotation, entirely new tokenization method applied, etc.);
    * +0.1.0 for significant edits that are still structurally compatible with previous versions;
    * +0.0.1 for minor edits (e.g., fixing reported errors in transcription or POS tags).
  * All corpora released should have the same version # and date.

===6. Validate the file.===

  * Gitdox: Use the "Validate" button to validate your spreadsheet files.
  * Excel: Use the [[https://github.com/CopticScriptorium/XLAddIns/releases/latest|validation Excel add-in]] with the correct schema for your document to validate your Excel file.

===7. Convert to TEI and PAULA and relANNIS and publish on SCRIPTORIUM ANNIS server.===

Typically performed by AZ.

===8. Check ANNIS visualizations to be certain there are no obvious bugs in the corpora or stylesheet.===

  * Edit files as necessary.
  * If files have been published on the public server, be sure to update the version number and version date for corpora and files.
  * Repeat steps 7 and 8 if there are significant problems and/or edits.

===9. Convert to TEI XML.===

  * Each document in the corpus will be converted to TEI XML using the converter program developed by Amir Zeldes. (AZ typically does this step.)
  * Confirm that the document validates against the EpiDoc TEI schema: http://www.stoa.org/epidoc/schema/latest/tei-epidoc.rng
  * Edit as necessary if there are problems with validation.
  * If files have been published on the public server, be sure to update the version number and version date for corpora and files edited post-conversion.
  * Re-convert to TEI XML after editing, check validation, and update versioning; repeat as necessary.

===10. Convert to PAULA & relANNIS and publish on ANNIS server===

Typically performed by AZ.

===11. Post TEI, relANNIS and PAULA files to GitHub public repository in their respective directories===

E.g., https://github.com/CopticScriptorium/corpora/tree/master/AP/apophthegmata.patrum_PAULA for the PAULA XML files of the Apophthegmata Patrum (AP) corpus.

===12. Create new meta.json file for linked data applications (with PATHS) and post in corpora repository===

===13. Create a new release of the GitHub corpora repository, posting information about the latest changes in the release.===

At https://github.com/CopticScriptorium/corpora/releases, click "Draft New Release." Give it a new version number (the same number as the new corpus and document version #s). Describe the corpus and the changes/additions in the description.

===14. Update the urn mapping file.===

https://github.com/CopticScriptorium/cts/blob/master/coptic/gh_ingest/name_mapping.tab

===15. New ingest at data.copticscriptorium.org to account for new data.===

Create new corpora, visualizations, etc., if necessary; see the [[administrative_interface|documentation in wiki for this application]].
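
The version numbering rule in step 5 and the version consistency checks in steps 2 and 4 can be illustrated with a short Python sketch. This is only an illustration, not part of the project's tooling: the function names, the change labels (''major'', ''minor'', ''patch''), and the document names are hypothetical. It also assumes that a larger increment resets the lower parts of the number to zero, which the checklist does not state explicitly.

```python
def bump_version(version: str, change: str) -> str:
    """Return the next release number for a given kind of change.

    'major' -> +1.0.0 (new layer annotation, new tokenization method, etc.)
    'minor' -> +0.1.0 (significant edits, structurally compatible)
    'patch' -> +0.0.1 (minor edits such as reported transcription errors)

    Assumption (not stated in the checklist): lower parts reset to zero.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":
        return f"{major + 1}.0.0"
    if change == "minor":
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


def mismatched_documents(corpus_version: str, edited_docs: dict) -> list:
    """Return names of edited documents whose version # differs from the corpus.

    Pass only new or revised documents: unedited documents keep their
    old version # and date, so they are expected to differ.
    """
    return sorted(
        name for name, version in edited_docs.items() if version != corpus_version
    )


# Hypothetical example: a release that only fixes reported errors.
new_version = bump_version("2.5.3", "patch")  # "2.5.4"
edited = {"doc_a": "2.5.4", "doc_b": "2.5.3"}  # hypothetical document names
needs_update = mismatched_documents(new_version, edited)  # ["doc_b"]
```

Any document reported by the second function still needs its version # and date brought in line with the corpus metadata before release.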