checklist_for_publishing_corpora

Differences

This shows you the differences between two versions of the page.

checklist_for_publishing_corpora [2016/12/07 14:17] admin
checklist_for_publishing_corpora [2024/01/12 09:04] (current) admin
Line 10: Line 10:
 ===2. Add/correct metadata on new documents===
   * Confirm that all metadata conforms to the standards on [[annotation_layer_names|layer annotation documentation]].
-  * Pay close attention to names of annotators, version number, and version date for documents. We now use the SAME version # and date on new documents as on corpus metadata.  Give the new document the same version # and date as the updated version # and date going in the corpus metadata (see step 5 below).  (This means a NEW document will only have a 1.0.0 version number if the corpus is also brand new.)
+  * Pay close attention to names of annotators, version number, and version date for documents. We now use the SAME version # and date on new documents as on corpus metadata; this version # corresponds to our corpus release # on GitHub.  All documents and corpora in a release will have the same version # (whether new docs/corpora or edited and re-released docs/corpora).  Do not change the version #/date on any documents that have not been edited or revised.
  
 ===3. Check the Issues list for each corpus to be released (whether new or revised versions of documents) on GitHub.===
Line 19: Line 19:
   * Pay close attention to names of annotators, version number, and version date for documents. Versioning:
 We now use the SAME version # and date on revised documents as on corpora.  Give the revised documents the same version # and date as the updated version # and date going in the corpus metadata (see step 5 below).
-Note:  an annotator may have made a minor change a while back and changed the version # and version date, even though the revised document has not yet been published.  We do not republish a corpus every time we make a minor revision to one document.  You may wish to check the document's version # against the number in ANNIS.
+Note:  an annotator may have made a minor change a while back and changed the version # and version date, even though the revised document has not yet been published.  We do not republish a corpus every time we make a minor revision to one document.  You may wish to check the document's version # in our development files against the number in ANNIS if you have any questions. A discrepancy means someone has edited the document; please make sure the version # & date are correct.
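
The comparison described in the note (development files versus ANNIS) can be sketched as a small script. This is only an illustration: the function name, the dictionary layout, and the document names and versions below are hypothetical, and it assumes the version # and date for each document have already been collected by hand or exported from both places.

```python
from typing import Dict, List, Tuple

# (version number, version date) for one document; this layout is an assumption
VersionInfo = Tuple[str, str]


def find_version_discrepancies(
    development: Dict[str, VersionInfo],
    published: Dict[str, VersionInfo],
) -> List[str]:
    """Return the names of documents whose version # or date differs
    between the development files and the published (ANNIS) copy."""
    return sorted(
        name
        for name, info in development.items()
        if name in published and published[name] != info
    )


# Hypothetical example data, not real corpus metadata
development = {
    "doc_a": ("2.1.0", "2016-12-07"),
    "doc_b": ("2.0.0", "2016-05-01"),
}
published = {
    "doc_a": ("2.0.0", "2016-05-01"),
    "doc_b": ("2.0.0", "2016-05-01"),
}

edited = find_version_discrepancies(development, published)
print(edited)
```

Any document the script lists has been edited since the last release, so its version # and date should be checked by hand before publishing.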
  
 ===5. Add/correct the corpus metadata.===
+**Note: the information in this section describes the workflow before we migrated to Gitdox. This section needs to be updated.**
 Corpus metadata appears on the first document in a corpus.
   * Confirm that all metadata conforms to the standards on [[annotation_layer_names|layer annotation documentation]].
   * Pay close attention to names of annotators:  //the names of all annotators of all documents in a corpus should be in the corpus metadata//; if someone has edited one document, be sure that person's name appears in the corpus metadata.
   * Version date should be the date of re-release.
-  * Version #:   +1.0.0 for a major change to data and/or structure (entirely new layer annotation, entirely new tokenization method applied, etc.); +0.1.0 for significant edits that are still structurally compatible with previous versions; +0.0.1 for minor edits (e.g. fixing reported errors in transcription or POS tags).
+  * Version # corresponds to the version of the GitHub release:   +1.0.0 for a major change to data and/or structure (entirely new layer annotation, entirely new tokenization method applied, etc.); +0.1.0 for significant edits that are still structurally compatible with previous versions; +0.0.1 for minor edits (e.g. fixing reported errors in transcription or POS tags). All corpora released should have the same version # & date.
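
The version # rule above can be written out as a short helper. This is a minimal sketch, not part of our tooling: the function name and the labels "major", "significant", and "minor" are hypothetical. It also assumes that the lower-order numbers reset to zero when a higher one is incremented, as in common semantic-versioning practice; the checklist itself does not say this.

```python
def bump_version(version: str, change: str) -> str:
    """Return the next version # for a release.

    change is one of:
      "major"       -> +1.0.0 (new layer annotation, new tokenization, etc.)
      "significant" -> +0.1.0 (significant edits, structurally compatible)
      "minor"       -> +0.0.1 (e.g. fixing reported transcription errors)

    Assumption: lower-order numbers reset to 0 when a higher one changes.
    """
    major, significant, minor = (int(part) for part in version.split("."))
    if change == "major":
        return f"{major + 1}.0.0"
    if change == "significant":
        return f"{major}.{significant + 1}.0"
    if change == "minor":
        return f"{major}.{significant}.{minor + 1}"
    raise ValueError(f"unknown change type: {change!r}")


print(bump_version("2.3.4", "minor"))
```

The resulting number would then go into the corpus metadata, the metadata of every new or revised document, and the GitHub release.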
  
 ===6. Validate the file.===
-  * Use the [[https://github.com/CopticScriptorium/XLAddIns/releases/latest|validation Excel add-in]] with the correct schema for your document to validate your Excel file.
+  * Gitdox: Use the "Validate" button to validate your spreadsheet files.
+  * Excel:  Use the [[https://github.com/CopticScriptorium/XLAddIns/releases/latest|validation Excel add-in]] with the correct schema for your document to validate your Excel file.
  
-===7. Convert to relANNIS and publish on SCRIPTORIUM ANNIS server.===
+===7. Convert to TEI and PAULA and relANNIS and publish on SCRIPTORIUM ANNIS server.===
 Typically performed by AZ.
  
Line 46: Line 48:
   * Re-convert to TEI XML after editing, check validation, update versioning; repeat as necessary.
  
-===10. Convert to relANNIS and publish on ANNIS server ===
+===10. Convert to PAULA & relANNIS and publish on ANNIS server ===
 Typically performed by AZ.
  
-===10. Convert to PAULA XML format.===
-Typically performed by AZ.
-
 ===11. Post TEI, relANNIS and PAULA files to GitHub public repository in their respective directories===
 E.g., https://github.com/CopticScriptorium/corpora/tree/master/AP/apophthegmata.patrum_PAULA for the PAULA XML files of the Apophthegmata Patrum (AP) corpus
  
-===12. Create a new release of the GitHub corpora repository, posting information about the latest changes in the release.===
+===12. Create a new meta.json file for linked data applications (with PATHS) and post it in the corpora repository.===
+
+===13. Create a new release of the GitHub corpora repository, posting information about the latest changes in the release.===
 At https://github.com/CopticScriptorium/corpora/releases, click "Draft New Release."  Give it a new version number. (This should be the same number as the new corpus and document version #s.)  Describe the corpus and the changes/additions in the description.
  
-===13. New ingest at data.copticscriptorium.org to account for new data.===
+===14. Update the URN mapping file.===
+https://github.com/CopticScriptorium/cts/blob/master/coptic/gh_ingest/name_mapping.tab
+
+===15. New ingest at data.copticscriptorium.org to account for new data.===
 Create new corpora, visualizations, etc., if necessary (see [[administrative_interface|documentation in wiki for this application]]).
 +
 +
checklist_for_publishing_corpora.1481145446.txt.gz · Last modified: 2016/12/07 14:17 by admin