Current revision: 2024/01/12 09:04 by admin (previous revision: 2015/09/09 08:33 by ctschroeder).
====Checklist for Publishing and Releasing Corpora====
===1. New and revised docs should be reviewed by a Senior editor.===
* Check for questions from annotators in the document and/or the pull request.
* Read through the document.
* Use Google Refine to see if errors in tokenization,
* Ensure layer names conform to standards (see [[annotation_layer_names|layer annotation documentation]]).
* Use the validation add-in (when available) to confirm that normalized annotation spans cover the same spans as the original column spans, that group layer spans are the same size, etc.
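The span comparison in the last item can be sketched in Python. This is a minimal illustration, not the validation add-in itself: it assumes each layer has already been read into a list of `(start, end)` token spans (end exclusive), and the allowed set passed to the hypothetical helper `unknown_layer_names` is a stand-in for the names listed in the layer annotation documentation.

```python
from typing import Dict, List, Set, Tuple

# A span is (first token index, last token index + 1).
Span = Tuple[int, int]


def compare_layers(first: List[Span], second: List[Span]) -> Dict[str, List[Span]]:
    """Report spans present in one layer without an identical span in the other."""
    first_set, second_set = set(first), set(second)
    return {
        "only_in_first": sorted(first_set - second_set),
        "only_in_second": sorted(second_set - first_set),
    }


def unknown_layer_names(names: List[str], allowed: Set[str]) -> List[str]:
    """Return column names that are not in the (hypothetical) allowed set."""
    return sorted(set(names) - allowed)


# Illustrative data: the normalized group layer merges tokens 2-4 into one span,
# while the original group layer splits them in two.
norm_group = [(0, 2), (2, 5)]
orig_group = [(0, 2), (2, 4), (4, 5)]
print(compare_layers(norm_group, orig_group))
# {'only_in_first': [(2, 5)], 'only_in_second': [(2, 4), (4, 5)]}
print(unknown_layer_names(["norm", "pos", "lemmma"], {"norm", "pos", "lemma"}))
# ['lemmma']
```

Any non-empty list in the result marks a place where the two layers disagree and the spreadsheet should be checked by hand.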
===2. Add/correct metadata on new documents===
* Confirm that all metadata conforms to the standards in the [[annotation_layer_names|layer annotation documentation]].
* Pay close attention to the names of annotators, the version number, and the version date of documents.
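A quick automated pass can flag missing or malformed version fields before the manual check. The sketch below assumes hypothetical field names (`annotation`, `version_n`, `version_date`), a dotted numeric version number, and ISO `YYYY-MM-DD` dates; substitute whatever the layer annotation documentation actually prescribes.

```python
import re
from datetime import datetime
from typing import Dict, List

# Hypothetical field names; use the ones the layer annotation documentation specifies.
REQUIRED_FIELDS = ["annotation", "version_n", "version_date"]


def metadata_problems(meta: Dict[str, str]) -> List[str]:
    """List missing required fields and malformed version values."""
    problems = [f"missing {field}" for field in REQUIRED_FIELDS
                if not meta.get(field, "").strip()]
    version = meta.get("version_n", "").strip()
    if version and not re.fullmatch(r"\d+(\.\d+)*", version):
        problems.append(f"malformed version_n: {version!r}")
    date = meta.get("version_date", "").strip()
    if date:
        try:
            datetime.strptime(date, "%Y-%m-%d")  # assumed ISO date format
        except ValueError:
            problems.append(f"malformed version_date: {date!r}")
    return problems


print(metadata_problems({"version_n": "v2", "version_date": "09/09/2015"}))
# ['missing annotation', "malformed version_n: 'v2'", "malformed version_date: '09/09/2015'"]
```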
===3. Check the Issues list for each corpus to be released (whether new or revised versions of documents) on GitHub.===
Each corpus may have a list of errors noticed by users or team members.
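Open issues can also be pulled from the GitHub REST API (`GET /repos/{owner}/{repo}/issues`) instead of being read in the browser. In this sketch the owner and repository names are placeholders, only the first page of results is fetched, and matching an issue to a corpus by its title or a label is an assumed convention, not a documented project rule. This endpoint also returns pull requests, which the filter skips.

```python
import json
import urllib.request
from typing import Dict, List


def fetch_open_issues(owner: str, repo: str) -> List[Dict]:
    """Fetch the first page (up to 100) of open issues from the GitHub REST API."""
    url = f"https://api.github.com/repos/{owner}/{repo}/issues?state=open&per_page=100"
    request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def issues_for_corpus(issues: List[Dict], corpus: str) -> List[Dict]:
    """Keep issues (not pull requests) whose title or a label mentions the corpus."""
    selected = []
    for issue in issues:
        if "pull_request" in issue:
            continue  # the issues endpoint also lists pull requests
        labels = {label["name"] for label in issue.get("labels", [])}
        if corpus in issue.get("title", "") or corpus in labels:
            selected.append(issue)
    return selected


# Usage (placeholder names, needs network access):
# for issue in issues_for_corpus(fetch_open_issues("OWNER", "REPO"), "example.corpus"):
#     print(issue["number"], issue["title"])
```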
===4. Add/correct metadata on edited, previously published documents.===
* Confirm that all metadata conforms to the standards in the [[annotation_layer_names|layer annotation documentation]].
* Pay close attention to the names of annotators, the version number, and the version date of documents.
Versioning: we now use the SAME version # and date on revised documents as on corpora. Give each revised document the same version # and date as the updated version # and date going into the corpus metadata.
Note: an annotator may have made a minor change some time ago and updated the version # and version date, even though the revised document has not yet been published. Those values should still be replaced with the new corpus version # and date.
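The versioning rule above amounts to overwriting each revised document's version fields with the corpus values. A minimal sketch, again assuming hypothetical `version_n` and `version_date` field names and illustrative values:

```python
from typing import Dict, List


def sync_versions(corpus_meta: Dict[str, str],
                  documents: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Give every revised document the corpus version # and date,
    overwriting older values an annotator may have set."""
    updated = []
    for doc in documents:
        new_doc = dict(doc)  # copy so the input metadata is left untouched
        new_doc["version_n"] = corpus_meta["version_n"]
        new_doc["version_date"] = corpus_meta["version_date"]
        updated.append(new_doc)
    return updated


corpus = {"version_n": "2.1.0", "version_date": "2016-03-01"}  # illustrative values
docs = [
    {"title": "doc_a", "version_n": "2.0.1", "version_date": "2015-10-01"},  # stale values
    {"title": "doc_b"},  # no version fields yet
]
for doc in sync_versions(corpus, docs):
    print(doc["title"], doc["version_n"], doc["version_date"])
```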
===5. Add/correct the corpus metadata.===
**Note: the information in this section describes the workflow before we migrated to Gitdox. This section needs to be updated.**
Corpus metadata appears on the first document in a corpus.
* Confirm that all metadata conforms to the standards in the [[annotation_layer_names|layer annotation documentation]].
* Pay close attention to names of annotators:
* Version date should be the date of re-release.
* Version # corresponds to the version of the GitHub release:
===6. Validate the file.===
* Gitdox: Use the "
* Excel:
===7. Convert to TEI and PAULA and relANNIS and publish on the SCRIPTORIUM ANNIS server.===
Typically performed by AZ.

===8. Check ANNIS visualizations to be certain there are no obvious bugs in the corpora or stylesheet.===
* Edit files as necessary.
* If files have been published on the public server, be sure to update the version number and version date for corpora and files.
* Repeat steps 7 and 8 if there are significant problems and/or edits.

===9. Convert to TEI XML.===
* Each document in the corpus will be converted to TEI XML using the converter program developed by Amir Zeldes.
* Confirm that the document validates against the EpiDoc TEI schema.
* If validation fails, edit as necessary.
* If files have been published on the public server, be sure to update the version number and version date for corpora and for files edited after conversion.
* Re-convert to TEI XML after editing, check validation, and update versioning; repeat as necessary.
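Full validation against the EpiDoc schema needs a RELAX NG validator (for example Jing, or `lxml.etree.RelaxNG` in Python). Before that step, a standard-library pre-check like the hypothetical `tei_precheck` below can catch converter output that is not well-formed or lacks the basic TEI structure; it is an illustrative helper, not part of the converter.

```python
import xml.etree.ElementTree as ET
from typing import List

TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def tei_precheck(xml_text: str) -> List[str]:
    """Cheap checks to run before full EpiDoc schema validation."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    problems = []
    if root.tag != TEI_NS + "TEI":
        problems.append(f"unexpected root element {root.tag}")
    # A TEI document needs a header and a text body as direct children.
    for child in ("teiHeader", "text"):
        if root.find(TEI_NS + child) is None:
            problems.append(f"missing <{child}>")
    return problems


print(tei_precheck('<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader/></TEI>'))
# ['missing <text>']
```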
===10. Convert to PAULA & relANNIS and publish on ANNIS server===
Typically performed by AZ.
===11. Post TEI, relANNIS and PAULA files to GitHub public repository in their respective directories===
E.g., https://
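If each corpus directory holds one subdirectory per export format, destination paths can be computed rather than typed by hand. The layout and the suffixes below (`_TEI`, `_ANNIS`, `_PAULA`) are assumptions for illustration only; check them against the existing directories in the repository.

```python
from pathlib import PurePosixPath

# Hypothetical layout: <repo>/<corpus>/<corpus><suffix>/ for each export format.
FORMAT_SUFFIXES = {"TEI": "_TEI", "relANNIS": "_ANNIS", "PAULA": "_PAULA"}


def target_dir(repo_root: str, corpus: str, fmt: str) -> PurePosixPath:
    """Return the assumed repository directory for one export format of a corpus."""
    if fmt not in FORMAT_SUFFIXES:
        raise ValueError(f"unknown format: {fmt}")
    return PurePosixPath(repo_root) / corpus / f"{corpus}{FORMAT_SUFFIXES[fmt]}"


for fmt in FORMAT_SUFFIXES:
    print(target_dir("corpora", "example.corpus", fmt))
```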
===12. Create a new meta.json file for linked data applications (with PATHS) and post it in the corpora repository===

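The structure of meta.json is not specified here, so the sketch below only illustrates the idea under assumed, hypothetical keys: the corpus name, its version fields, and the repository-relative PATHS of each exported format.

```python
import json
from typing import Dict, List


def build_meta(corpus: str, version_n: str, version_date: str,
               paths: Dict[str, List[str]]) -> str:
    """Serialize a hypothetical meta.json with version info and per-format PATHS."""
    meta = {
        "corpus": corpus,
        "version_n": version_n,
        "version_date": version_date,
        # Sorted so that regenerating the file yields a stable diff in Git.
        "paths": {fmt: sorted(files) for fmt, files in paths.items()},
    }
    # ensure_ascii=False keeps any Coptic characters readable in the file.
    return json.dumps(meta, indent=2, ensure_ascii=False)


print(build_meta("example.corpus", "2.1.0", "2016-03-01",
                 {"TEI": ["example.corpus_TEI/doc_b.xml", "example.corpus_TEI/doc_a.xml"]}))
```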
===13. Create a new release of the GitHub corpora repository, posting information about the latest changes in the release.===
At https://

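Releases can also be created through the GitHub REST API (`POST /repos/{owner}/{repo}/releases` with `tag_name`, `name`, `body`, and `draft`). This sketch only builds the request without sending it; the owner and repository names are placeholders, and the `v` tag prefix, the draft flag, and the token handling are assumptions. The tag should match the corpus version # set in step 5.

```python
import json
import urllib.request


def release_request(owner: str, repo: str, version: str,
                    notes: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GitHub REST API request that creates a release."""
    payload = {
        "tag_name": f"v{version}",  # assumed tag naming scheme
        "name": f"v{version}",
        "body": notes,               # the latest changes, shown on the release page
        "draft": True,               # publish manually after reviewing it on GitHub
    }
    return urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/releases",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Accept": "application/vnd.github+json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )


# Sending it would be: urllib.request.urlopen(release_request("OWNER", "REPO", "2.1.0", "Notes", TOKEN))
```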
===14. Update the URN mapping file.===
https://

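The format of the URN mapping file is not described here. Assuming, hypothetically, a two-column tab-separated file of URN and target, an update that replaces stale rows and keeps the file sorted could look like this:

```python
import csv
import io
from typing import Dict


def update_urn_map(tsv_text: str, new_entries: Dict[str, str]) -> str:
    """Merge URN -> target pairs into a two-column TSV, replacing rows
    for URNs that already exist and keeping the output sorted by URN."""
    mapping = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) >= 2:  # skip blank or malformed lines
            mapping[row[0]] = row[1]
    mapping.update(new_entries)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for urn in sorted(mapping):
        writer.writerow([urn, mapping[urn]])
    return out.getvalue()


old = "urn:a\told_target\nurn:b\tb_target\n"  # illustrative contents
print(update_urn_map(old, {"urn:a": "new_target", "urn:c": "c_target"}), end="")
```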
===15. New ingest at data.copticscriptorium.org to account for new data.===
Create new corpora, visualizations,
checklist_for_publishing_corpora.1441809237.txt.gz · Last modified: 2015/09/09 08:33 by ctschroeder