corpus_metadata

This is an old revision of the document!


Metadata at the corpus level includes the following fields.

annotation         names of everyone involved in annotation of any kind (transcription, annotation, editing, etc.) for all texts in the corpus, separated by commas
corpus             the name of the corpus as it will appear on data.copticscriptorium.org or in ANNIS
Greek_source       optional; information about the source of the aligned Greek text
copyright          optional; copyright information, currently used only in the machine-annotated Sahidica corpus
languages          language(s) of the texts in the corpus, usually Sahidic Coptic
license            optional; license under which the corpus is published (included when it is not Coptic SCRIPTORIUM's usual CC-BY, as for the Sahidica)
project            name of the project supporting the transcription/annotation/publication (e.g., Coptic SCRIPTORIUM, KoMET, etc.)
source             optional; source of the material for publication, if applicable, for instance http://papyri.info for the doc.papyri corpus
tagger_version     optional; version of the tagger used on the corpus, currently used only in machine-annotated corpora
tokenizer_version  optional; version of the tokenizer used on the corpus, currently used only in machine-annotated corpora
translation        names of the translators for all texts in the corpus, separated by commas
version_date       the most recent date of publication for any text in the corpus, in yyyy-mm-dd format
version_n          version number of the corpus, updated whenever new data is published for the corpus
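
The field list above can be checked mechanically before a corpus is published. The following Python sketch is only an illustration, not part of any Coptic SCRIPTORIUM tool: the function name validate_corpus_metadata, the representation of metadata as a plain dictionary, and all sample values are assumptions. It treats every field not marked "optional" above as required and checks that version_date follows the yyyy-mm-dd format.

```python
import re
from datetime import datetime

# Fields not marked "optional" in the list above are treated as required (an assumption).
REQUIRED_FIELDS = {
    "annotation", "corpus", "languages", "project",
    "translation", "version_date", "version_n",
}
OPTIONAL_FIELDS = {
    "Greek_source", "copyright", "license", "source",
    "tagger_version", "tokenizer_version",
}


def validate_corpus_metadata(meta):
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    present = set(meta)

    for name in sorted(REQUIRED_FIELDS - present):
        problems.append(f"missing required field: {name}")
    for name in sorted(present - REQUIRED_FIELDS - OPTIONAL_FIELDS):
        problems.append(f"unknown field: {name}")

    if "version_date" in meta:
        date = str(meta["version_date"])
        # Require exactly yyyy-mm-dd, then confirm it is a real calendar date.
        if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", date):
            problems.append("version_date is not in yyyy-mm-dd format")
        else:
            try:
                datetime.strptime(date, "%Y-%m-%d")
            except ValueError:
                problems.append("version_date is not a valid calendar date")

    return problems


# Hypothetical sample record; all names and values are placeholders.
example = {
    "annotation": "A. Annotator, B. Editor",
    "corpus": "example.corpus",
    "languages": "Sahidic Coptic",
    "project": "Coptic SCRIPTORIUM",
    "translation": "C. Translator",
    "version_date": "2017-06-08",
    "version_n": "1.0.0",
}

print(validate_corpus_metadata(example))
```

Comma-separated fields such as annotation and translation are left as single strings here; splitting them into individual names would be a separate step.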
corpus_metadata.1496969497.txt.gz · Last modified: 2017/06/08 18:51 by admin