checklist_for_publishing_corpora

Differences

This shows you the differences between two versions of the page.

checklist_for_publishing_corpora [2015/09/09 08:33] – created ctschroeder
checklist_for_publishing_corpora [2024/01/12 09:04] (current) admin
Line 1: Line 1:
 ====Checklist for Publishing and Releasing Corpora====
  
-==1. New and revised docs should be reviewed by a Senior editor.==
+===1. New and revised docs should be reviewed by a Senior editor.===
   * Check questions from annotators in the document and/or pull request
   * read through the document
   * use Google Refine to see if errors in tokenization, pos-tagging, lang-tagging, morph annotation, and normalization pop out
-  * ensure layer names conform to standards (see [[annotation-layer-names|layer annotation documentation]]).
+  * ensure layer names conform to standards (see [[annotation_layer_names|layer annotation documentation]]).
   * use the validation add-in (when available) to confirm normalized annotation spans cover the same spans as the original column spans, group layer spans are the same size, etc.
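The span check in the last bullet can be illustrated with a minimal sketch. It assumes each annotation span is represented as an inclusive `(first_row, last_row)` pair of spreadsheet rows; that representation and the function names are hypothetical and are not taken from the validation add-in itself.

```python
def covered_rows(spans):
    """Collect every row number covered by a list of inclusive (first, last) spans."""
    rows = set()
    for first, last in spans:
        rows.update(range(first, last + 1))
    return rows


def coverage_mismatch(orig_spans, norm_spans):
    """Return the sorted rows covered by one layer but not by the other.

    An empty list means the normalized layer covers exactly the same rows
    as the original layer.
    """
    return sorted(covered_rows(orig_spans) ^ covered_rows(norm_spans))
```

A non-empty result points at the rows to inspect by hand; the sketch only compares coverage and does not check that span boundaries line up.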
  
-==2. Add/correct metadata on new documents==
+===2. Add/correct metadata on new documents===
-  * Confirm metadata all conforms to standards on [[annotation-layer-names|layer annotation documentation]].
+  * Confirm metadata all conforms to standards on [[annotation_layer_names|layer annotation documentation]].
-  * Pay close attention to names of annotators, version number, and version date for documents. Typically a newly published document will have v. 1.0.0.  If you wish to publish a document and use the visualizations to help correct and proofread, you may release as 0.1.0 (or a number lower than 1.0.0).
+  * Pay close attention to names of annotators, version number, and version date for documents. We now use the SAME version # and date on new documents as on corpus metadata; this version # corresponds with our corpus release # on GitHub. All documents and corpora will have the same release # (whether new docs/corpora or edited and released docs/corpora).  Do not change the version #/date on any documents that have not been edited or revised.
  
-==3. Check the Issues list for each corpus to be released (whether new or revised versions of documents) on GitHub.==
+===3. Check the Issues list for each corpus to be released (whether new or revised versions of documents) on GitHub.===
 Each corpus may have a list of errors noticed by users or team members.  (E.g., https://github.com/CopticScriptorium/ap-dev/issues/35).  Make corrections, and note on the issues list that the corrections have been made.
  
-==4. Add/correct metadata on edited, previously published documents.==
+===4. Add/correct metadata on edited, previously published documents.===
-  * Confirm metadata all conforms to standards on [[annotation-layer-names|layer annotation documentation]].
+  * Confirm metadata all conforms to standards on [[annotation_layer_names|layer annotation documentation]].
-  * Pay close attention to names of annotators, version number, and version date for documents. Versioning: +1.0.0 for major change to data and/or structure (entirely new layer annotation, entirely new tokenization method applied, etc.); +0.1.0 for significant edits but still structurally compatible with previous versions; +0.0.1 for minor edits (e.g. fixing reported errors in transcription or pos-tags).  Note:  an annotator may have made a minor change a while back and changed the version # and version date accordingly, even though the revised document has not yet been published.  We do not republish a corpus every time we make a minor revision to one document.  You may wish to check the document's version # against the number in ANNIS.
+  * Pay close attention to names of annotators, version number, and version date for documents. Versioning:
+We now use the SAME version # and date on revised documents as on corpora. Give the revised documents the same version # and date as the updated version # and date going in the corpus metadata (see step 5 below).
+Note:  an annotator may have made a minor change a while back and changed the version # and version date, even though the revised document has not yet been published.  We do not republish a corpus every time we make a minor revision to one document.  You may wish to check the document's version # in our development files against the number in ANNIS if you have any questions. A discrepancy means someone has edited the document; please make sure the version # & date are correct.
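The comparison between development files and ANNIS described in the note above could be sketched as follows. This is an illustration only: it assumes the version numbers have already been collected into two plain dictionaries keyed by document name, and the function name and document names are hypothetical.

```python
def version_discrepancies(dev_versions, annis_versions):
    """List documents whose version # in the development files differs from ANNIS.

    Both arguments map a document name to its version string. A document
    that is missing from ANNIS (for example, a new document) is also listed.
    """
    return sorted(
        name
        for name, version in dev_versions.items()
        if annis_versions.get(name) != version
    )
```

Each name returned is a document that has been edited since publication, or never published, so its version # and date should be reviewed before release.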
  
-==5. Add/correct the corpus metadata.==
+===5. Add/correct the corpus metadata.===
+**Note: the information in this section describes the workflow before we migrated to Gitdox. This section needs to be updated.**
 Corpus metadata appears on the first document in a corpus.
-  * Confirm metadata all conforms to standards on [[annotation-layer-names|layer annotation documentation]].
+  * Confirm metadata all conforms to standards on [[annotation_layer_names|layer annotation documentation]].
   * Pay close attention to names of annotators:  //the names of all annotators of all documents in a corpus should be in the corpus metadata//; if someone has edited one document, be sure that person's name appears in the corpus metadata.
   * Version date should be the date of re-release.
-  * Version #:   +1.0.0 for major change to data and/or structure (entirely new layer annotation, entirely new tokenization method applied, etc.); +0.1.0 for significant edits but still structurally compatible with previous versions; +0.0.1 for minor edits (e.g. fixing reported errors in transcription or pos-tags).
+  * Version # corresponds to the version of the GitHub release:   +1.0.0 for major change to data and/or structure (entirely new layer annotation, entirely new tokenization method applied, etc.); +0.1.0 for significant edits but still structurally compatible with previous versions; +0.0.1 for minor edits (e.g. fixing reported errors in transcription or pos-tags). All corpora released should have the same version # & date.
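The three increments in the versioning bullet can be written out as a small helper. This is a sketch under stated assumptions: the function name and the `"major"`/`"minor"`/`"patch"` labels are hypothetical, and resetting the lower components to zero on a larger bump is a common semantic-versioning convention that the checklist itself does not state.

```python
def bump_version(version, change):
    """Return the next version string for a 'major', 'minor', or 'patch' change.

    Assumption: lower components reset to 0 on a larger bump
    (e.g. 2.3.1 -> 3.0.0); the checklist only gives the increments.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":   # +1.0.0: major change to data and/or structure
        return f"{major + 1}.0.0"
    if change == "minor":   # +0.1.0: significant but structurally compatible edits
        return f"{major}.{minor + 1}.0"
    if change == "patch":   # +0.0.1: minor edits such as fixing reported errors
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")
```

Because all corpora and documents in a release share one version # and date, the result would be computed once per release rather than per document.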
  
-==6. Use the [[https://github.com/CopticScriptorium/XLAddIns/releases/latest|validation Excel add-in]] with the correct schema for your document to validate your Excel file.==
+===6. Validate the file.===
+  * Gitdox: Use the "Validate" button to validate your spreadsheet files.
+  * Excel:  Use the [[https://github.com/CopticScriptorium/XLAddIns/releases/latest|validation Excel add-in]] with the correct schema for your document to validate your Excel file.
  
-==7. Convert to TEI XML.==
+===7. Convert to TEI and PAULA and relANNIS and publish on SCRIPTORIUM ANNIS server.===
+Typically performed by AZ.
+
+===8. Check ANNIS visualizations to be certain there are no obvious bugs in the corpora or stylesheet.===
+  * Edit files as necessary.
+  * If files have been published on the public server, be sure to update the version number and date number for corpora and files.
+  * Repeat steps 7 & 8 if significant problems and/or edits.
+
+===9. Convert to TEI XML.===
   * Each document in the corpus will be converted to TEI XML using the converter program developed by Amir Zeldes.  (AZ typically does this step.)
-  * Confirm that the document validates against the EpiDoc TEI schema.  http://www.stoa.org/epidoc/schema/latest/tei-epidoc.rng  (Edit if necessary if problems with validation.)
+  * Confirm that the document validates against the EpiDoc TEI schema.  http://www.stoa.org/epidoc/schema/latest/tei-epidoc.rng
+  * Edit if necessary if problems with validation.
+  * If files have been published on the public server, be sure to update the version number and date number for corpora and files edited post-conversion.
+  * Re-convert to TEI XML after editing, check validation, update versioning; repeat as necessary.
  
-==7. Convert to relANNIS and PAULA XML formats.==
+===10. Convert to PAULA & relANNIS and publish on ANNIS server===
 Typically performed by AZ.
 
-==8. Post TEI, relANNIS and PAULA files to GitHub public repository in their respective directories==
+===11. Post TEI, relANNIS and PAULA files to GitHub public repository in their respective directories===
 E.g., https://github.com/CopticScriptorium/corpora/tree/master/AP/apophthegmata.patrum_PAULA for the PAULA XML files of the Apophthegmata Patrum (AP) corpus
  
-==9. Check ANNIS visualizations to be certain there are no obvious bugs in the corpus or stylesheet.==
+===12. Create new meta.json file for linked data applications (with PATHS) and post in corpora repository===
+
+===13. Create a new release of the GitHub corpora repository, posting information about the latest changes in the release.===
+At https://github.com/CopticScriptorium/corpora/releases, click "Draft New Release."  Give it a new version number. (This should be the same number as the new corpus and document version #s.)  Describe the corpus and changes/additions in the description.
+
+===14. Update the URN mapping file.===
+https://github.com/CopticScriptorium/cts/blob/master/coptic/gh_ingest/name_mapping.tab
+
+===15. New ingest at data.copticscriptorium.org to account for new data.===
+Create new corpora, visualizations, etc., if necessary; see [[administrative_interface|documentation in wiki for this application]].
  
-==10. Create a new release of the GitHub corpora repository, posting information about the latest changes in the release.== 
-At https://github.com/CopticScriptorium/corpora/releases, click "Draft New Release."  Give it a new version number (Typically + 0.0.1 for minor edits/bug fixes; +0.1.0 for new documents/corpora added or minor changes to metadata model, etc.; +1.0.0 for entirely new features, new data models, etc.).  Describe the corpus and changes/ additions in the description. 
  
-==11. New ingest at data.copticscriptorium.org to account for new data.==  
-Create new corpora, visualizations, etc., if necessary; see [[administrative-interface|documentation in wiki for this application]].
checklist_for_publishing_corpora.1441809187.txt.gz · Last modified: 2015/09/09 08:33 by ctschroeder