corpus_metadata
Differences
This shows you the differences between two versions of the page.
corpus_metadata [2017/06/08 15:44] – created eplatte | corpus_metadata [2017/06/08 21:55] (current) – Updates based on CTS and AZ feedback eplatte | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | Metadata at the corpus level includes the following fields. | + | Metadata at the corpus level includes the following fields. See our [[annotation_layer_names| annotation layer name documentation]] for document-level metadata. |
- | |annotation| names of anyone involved in annotation of any kind (transcription, | + | |annotation| names of anyone involved in annotation of any kind (transcription, |
|corpus| the name of the corpus as it will appear on data.copticscriptorium.org or in ANNIS| | |corpus| the name of the corpus as it will appear on data.copticscriptorium.org or in ANNIS| | ||
- | |Greek_source| information about source for Greek aligned | + | |Greek_source| |
- | |copyright| copyright information, | + | |copyright| |
|languages| language(s) of texts in the corpus, usually Sahidic Coptic| | |languages| language(s) of texts in the corpus, usually Sahidic Coptic| | ||
- | |license| optional: license under which the corpus is published (included when not CS's usual CC-BY, such as for the Sahidica)| | + | |license| optional; license under which the corpus is published (included when not Coptic SCRIPTORIUM's usual CC-BY, such as for the Sahidica)| |
|project| name of project supporting the transcription/ | |project| name of project supporting the transcription/ | ||
- | |source| optional: source of material for publication, | + | |source| optional; source of material for publication, |
- | |tagger_version| version of tagger used on corpus, currently used only in machine-annotated corpora| | + | |tagger_version| |
- | |tokenizer_version| version of tokenizer used on corpus, currently used only in machine-annotated corpora| | + | |tokenizer_version| |
|translation| names of translators for all texts in the corpus, separated by commas| | |translation| names of translators for all texts in the corpus, separated by commas| | ||
|version_date| the most recent date of publication for any text in the corpus in yyyy-mm-dd format| | |version_date| the most recent date of publication for any text in the corpus in yyyy-mm-dd format| | ||
|version_n| updated for any publication of new data for the corpus| | |version_n| updated for any publication of new data for the corpus| |
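
The field list above can be checked mechanically before a corpus is published. The following is a minimal sketch in Python, not part of the project's tooling: the function names `check_corpus_metadata` and `split_names` are hypothetical, and it assumes the metadata is available as a plain dictionary of strings. It only checks what the table states explicitly: that field names are among those listed, that `version_date` uses the yyyy-mm-dd format, and that comma-separated name fields such as `translation` can be split into individual names.

```python
import re
from datetime import datetime

# Field names copied from the table above.
KNOWN_FIELDS = {
    "annotation", "corpus", "Greek_source", "copyright", "languages",
    "license", "project", "source", "tagger_version", "tokenizer_version",
    "translation", "version_date", "version_n",
}

# Four-digit year, two-digit month, two-digit day, separated by hyphens.
DATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def check_corpus_metadata(meta):
    """Return a list of problems found; an empty list means none were found.

    Hypothetical helper: 'meta' is assumed to be a dict mapping field
    names to string values.
    """
    problems = []
    for key in meta:
        if key not in KNOWN_FIELDS:
            problems.append(f"unknown field: {key}")

    version_date = meta.get("version_date")
    if version_date is not None:
        if not DATE_PATTERN.fullmatch(version_date):
            problems.append(f"version_date is not in yyyy-mm-dd format: {version_date}")
        else:
            try:
                # Rejects well-formed but impossible dates such as 2017-02-30.
                datetime.strptime(version_date, "%Y-%m-%d")
            except ValueError:
                problems.append(f"version_date is not a real calendar date: {version_date}")
    return problems


def split_names(value):
    """Split a comma-separated field such as 'translation' into a list of names."""
    return [name.strip() for name in value.split(",") if name.strip()]
```

For example, `check_corpus_metadata({"corpus": "example.corpus", "version_date": "2017/06/08"})` reports the date as badly formatted, while the same call with `"2017-06-08"` returns an empty list. The corpus name here is a placeholder, not a real corpus.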
corpus_metadata.1496958297.txt.gz · Last modified: 2017/06/08 15:44 by eplatte