annotation_layer_names
Differences
This shows you the differences between two versions of the page.
annotation_layer_names [2019/05/02 11:35] – admin | annotation_layer_names [2022/01/12 09:47] (current) – admin | ||
---|---|---|---|
Line 2: | Line 2: | ||
=== Data === | === Data === | ||
+ | |||
//This document supersedes the Google doc used previously.// | //This document supersedes the Google doc used previously.// | ||
(Note: in the layer names, @ and _ are interchangeable; | (Note: in the layer names, @ and _ are interchangeable; | ||
- | ^ tok | tokens, smallest possible unit to be annotated; MAY BE SMALLER THAN THE MORPHEMES IN ORIG | | + | |tok | tokens, smallest possible unit to be annotated; MAY BE SMALLER THAN THE MORPHEMES IN ORIG | |
- | ^ orig | smallest unit of LANGUAGE (morpheme or word level; smaller than the bound group level); orthography is from the original text (diplomatic, | + | |orig | smallest unit of LANGUAGE (morpheme or word level; smaller than the bound group level); orthography is from the original text (diplomatic, |
- | ^ orig_group | bound groups using the original orthography, | + | |orig_group | bound groups using the original orthography, |
- | ^ norm_group | bound groups (same structure as orig_word but with normalized spelling, etc., so content is based on norm). Spans in this layer must match those in orig_group exactly in their length. | | + | |norm_group | bound groups (same structure as orig_word but with normalized spelling, etc., so content is based on norm). Spans in this layer must match those in orig_group exactly in their length. | |
- | ^ norm | normalized version of orig. Spans in this layer must match those in orig exactly in their length. | | + | |norm | normalized version of orig. Spans in this layer must match those in orig exactly in their length. | |
- | ^ pos | part of speech tags. Spans in this layer must match those in norm exactly in their length (i.e., norm units are the units that carry parts of speech). | | + | |pos | part of speech tags. Spans in this layer must match those in norm exactly in their length (i.e., norm units are the units that carry parts of speech). | |
- | ^ lang | language of origin tags (Hebrew, Greek, Latin, Aramaic, etc.) | | + | |lang | language of origin tags (Hebrew, Greek, Latin, Aramaic, etc.) | |
- | ^ morph | morphs that are below the word level -- this is where words containing mnt, at, ref are annotated a second time (see http:// | + | |morph | morphs that are below the word level -- this is where words containing mnt, at, ref are annotated a second time (see http:// |
- | ^ lemma | lemma (dictionary head word); annotates on the normalized words (" | + | |lemma | lemma (dictionary head word); annotates on the normalized words (" |
- | ^ note | notes that normally would go in a TEI XML <note note=" | + | |note | notes that normally would go in a TEI XML <note note=" |
- | ^ hi@rend | text renderings (see http:// | + | |hi@rend | usually appears as hi_rend in the column name in spreadsheet mode; for text renderings (see http:// |
- | ^ gap | Annotates for lacunae. Corresponds to the EpiDoc TEI-XML element gap. Uses attributes such as @reason, @unit, @quantity, and @extent. With attributes, each element+attribute annotation generates a new layer in the multi-layer data model | | + | |gap | Annotates for lacunae. Corresponds to the EpiDoc TEI-XML element gap. Uses attributes such as @reason, @unit, @quantity, and @extent. With attributes, each element+attribute annotation generates a new layer in the multi-layer data model | |
- | ^ supplied | Annotates for supplied text where text is missing from the original for a variety of reasons. Corresponds to the EpiDoc TEI-XML element supplied. Uses attributes such as @evidence and @reason. With attributes, each element+attribute annotation generates a new layer in the multi-layer data model. | | + | |supplied | Annotates for supplied text where text is missing from the original for a variety of reasons. Corresponds to the EpiDoc TEI-XML element supplied. Uses attributes such as @evidence and @reason. With attributes, each element+attribute annotation generates a new layer in the multi-layer data model. | |
- | ^ lb@n | line breaks -- numbered according to the original manuscript | | + | |lb@n | usually appears as lb_n in column header in spreadsheet mode; line breaks -- numbered according to the original manuscript | |
- | ^ cb@n | column breaks -- numbered according to the original manuscript | | + | |cb@n | usually appears as cb_n in column header in spreadsheet mode; column breaks -- numbered according to the original manuscript | |
- | ^ pb_xml_id | Page numbers of original manuscript (not the current repository numbering); be sure column label does not include a colon (e.g. pb_xml_id not pb_xml:id); be sure page numbers do not include spaces (e.g. EG202 not EG 202) (TEI XML <pb xml: | + | |pb_xml_id | Page numbers of original manuscript (not the current repository numbering); be sure column label does not include a colon (e.g. pb_xml_id not pb_xml:id); be sure page numbers do not include spaces (e.g. EG202 not EG 202) (TEI XML <pb xml: |
- | ^ ignore: | + | |ignore:note | notes that will NOT be imported into ANNIS or exported as TEI or PAULA XML; private notations from annotators/ |
- | ^ translation | English translation; | + | |translation | English translation; |
- | ^ p | paragraph breaks for translation and normalization; | + | |p | paragraph breaks for translation and normalization; |
- | ^ verse | verse of text written as number (always use in Bible of any kind, including Sahidica) | | + | |verse_n |
- | ^ chapter | chapter of text recorded as number; currently used only in corpora in which there are canonical or disciplinary-standard chapter divisions (not a required annotation; for Bible this information is typically in the metadata, as well) | | + | |chapter_n |
- | ^ sbl_greek | The Greek New Testament text. Annotation for the New Testament corpora only. Aligned by verse with the Coptic. Source is the [[http:// | + | |sbl_greek | The Greek New Testament text. Annotation for the New Testament corpora only. Aligned by verse with the Coptic. Source is the [[http:// |
- | ^ sbl_apparatus | The apparatus for the Greek New Testament text. Annotation for the New Testament corpora only. Aligned by verse with the Coptic and Greek New Testament text. Source is the [[http:// | + | |sbl_apparatus | The apparatus for the Greek New Testament text. Annotation for the New Testament corpora only. Aligned by verse with the Coptic and Greek New Testament text. Source is the [[http:// |
- | ^ div@type | For EpiDoc TEI compatibility, | + | |div@type | For EpiDoc TEI compatibility, |
- | ^ vid | formerly verse@id (Sahidica) | | + | |vid | formerly verse@id (Sahidica) | |
- | ^ chapter@cname | chapter of text written as text and number (not necessary -- in other data) | | + | |chapter@cname | chapter of text written as text and number (not necessary -- in other data) | |
- | ^ chapter@cid | chapter id (Sahidica-- not necessary) | | + | |chapter@cid | chapter id (Sahidica-- not necessary) | |
- | ^ verse@vname | verse of text written as text and number (e.g. 1 Corinthians 1:10) (not necessary -- in other data) | | + | |verse@vname | verse of text written as text and number (e.g. 1 Corinthians 1:10) (not necessary -- in other data) | |
- | ^ add_place | | + | |add_place | |
- | ^ vid_n | CTS URN for the verse (e.g., urn: | + | |vid_n | CTS URN for the verse (e.g., urn: |
+ | |ed_page_n | page number of a text as it appears in an edition | | ||
+ | |ed_line_n | line number of a text as it appears in an edition | | ||
+ | |entity | one of the ten entity types (e.g. person, place) see [[https:// | ||
+ | |identity | this annotation stores linked entry identifiers for named entities; it is populated automatically during export by GitDox if named entities have been added using the entity annotation interface. Annotators do not need to manually add this column| | ||
+ | |arabic | Arabic translation. Spans should follow translation and verse layers | ||
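The layer rules above (interchangeable @ and _ in layer names, spans that must match another layer exactly, and the pb_xml_id restrictions) lend themselves to a quick pre-import check. The following is only a minimal sketch, not part of GitDox or any Coptic SCRIPTORIUM tool: the function names are hypothetical, and it assumes spans are represented as inclusive (first token, last token) index pairs and that "match exactly in their length" means identical boundaries.

```python
from typing import List, Tuple

# Assumption: a span is an inclusive (first_token, last_token) index pair.
Span = Tuple[int, int]


def normalize_layer_name(name: str) -> str:
    """Map '@' to '_' so that e.g. 'hi@rend' and 'hi_rend' compare equal."""
    return name.strip().replace("@", "_")


def span_mismatches(reference: List[Span], dependent: List[Span]) -> List[Span]:
    """Return, sorted, every span that occurs in only one of the two layers.

    An empty result means the dependent layer (e.g. norm) lines up
    exactly with the reference layer (e.g. orig).
    """
    return sorted(set(reference) ^ set(dependent))


def check_pb_xml_id(column_label: str, page_values: List[str]) -> List[str]:
    """Collect violations of the pb_xml_id rules (no colon, no spaces)."""
    problems = []
    if ":" in column_label:
        problems.append(f"column label contains a colon: {column_label!r}")
    for value in page_values:
        if " " in value:
            problems.append(f"page number contains a space: {value!r}")
    return problems
```

For instance, `check_pb_xml_id("pb_xml:id", ["EG 202"])` reports two problems, while `check_pb_xml_id("pb_xml_id", ["EG202"])` reports none.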
+ | === NOT columns in the spreadsheet === | ||
+ | |||
+ | The following information should **NOT** be annotated manually in the spreadsheet, | ||
+ | |||
+ | * identity - this is the ANNIS annotation corresponding to named entity linking (Wikification). This information comes from the entity identification annotations in entities mode (after clicking "List named entities" | ||
+ | * func / head - this is syntactic information from automatic or gold parsing. It is never done in spreadsheet mode, but added during publication by an automatic parser, or annotated manually in the Arborator interface (but NOT in GitDox) | ||
+ | * multiword - multiword expression annotation is also added automatically during publication based on the current state of multiword entries in the Coptic Dictionary Online. It is not edited manually and should not be included in the spreadsheet. | ||
=== Metadata === | === Metadata === | ||
+ | |||
(Note: see also the [[corpus_metadata|corpus-level metadata]] documentation for adding metadata for the entire corpus.) | (Note: see also the [[corpus_metadata|corpus-level metadata]] documentation for adding metadata for the entire corpus.) | ||
- | [[https:// | + | [[https:// |
+ | |||
+ | |annotation | names of annotators (transcribers, | ||
+ | |arabic_translation | names of people who translated the text into Arabic in comma-delimited sequence | | ||
+ | |attributed_author | optional. attributed author of a conceptual work who may or may not be the historical author | | ||
+ | |author | author of the conceptual work | | ||
+ | |collection |collection or department in the current repository| | ||
+ | |Coptic_edition |if the text has been published before, include publication information here | | ||
+ | |copyist | optional. copyist or scribe of the text on the manuscript or text-bearing object | | ||
+ | |corpus | ||
+ | |country | ||
+ | |document_cts_urn | urn that applies to the document following data model created by Bridget Almas | | ||
+ | |endnote | ||
+ | |entities| describes whether entity annotation has been reviewed. Available values are automatic, checked, or gold; required| | ||
+ | |Greek_source | optional, information about the Greek version of the text if it exists (e.g., Greek Alphabetical or Systematic Apophthegmata Patrum) | | ||
+ | |identities| describes whether named entity linking has been reviewed. Available values are automatic, checked, or gold; required| | ||
+ | |idno |catalogue # of the manuscript in the current repository| | ||
+ | |language | language in which the text is written | | ||
+ | |license |use for copyright in Sahidica, CC-BY for everything else. If using CC-BY, enter <a href=' | ||
+ | |msContents_title@n |volume number of the thing in msContents_title@type; | ||
+ | |msContents_title@type |used for things like Shenoute' | ||
+ | |msItem_title | ||
+ | |msName | use CMCL code (e.g., MONB.YA); optional but must use msName, pages_from, pages_to all three or none at all| | ||
+ | |next | contains the CTS urn for the next document in the corpus; optional| | ||
+ | |note | optional | | ||
+ | |objectType | ||
+ | |order | ||
+ | |origDate | ||
+ | |origDate_notAfter| date of the terminus ante quem (in four digits with leading zeros, e.g., 1200); be sure the format of the cell in Excel is text not number or date; optional| | ||
+ | |origDate_notBefore| date of the terminus post quem (in four digits with leading zeros, e.g., 0900); be sure the format of the cell in Excel is text not number or date; optional| | ||
+ | |origDate_precision|likelihood that the dating is accurate -- usually " | ||
+ | |origPlace | ||
+ | |pages_from | beginning of page sequence of document (original page number of scribe but written in Arabic numerals); optional but must use msName, pages_from, pages_to all three or none at all| | ||
+ | |pages_to |optional but must use msName, pages_from, pages_to all three or none at all| | ||
+ | |parsing| describes whether parsing has been reviewed. Available values are automatic, checked, or gold; required| | ||
+ | |paths_authors | ||
+ | |paths_manuscripts | PATHs project Coptic Literary Manuscript stable id entered as a link, e.g. <a href=' | ||
+ | |paths_works | ||
+ | |placeName | ||
+ | |previous | ||
+ | |project | name of project supporting the transcription/ | ||
+ | |redundant | ||
+ | |repository |current museum/ | ||
+ | |segmentation| describes whether segmentation and tokenization has been reviewed. Available values are automatic, checked, or gold; required| | ||
+ | |source | ||
+ | |source_info | | | ||
+ | |tagging| describes whether tagging has been reviewed. Available values are automatic, checked, or gold; required| | ||
+ | |title | ||
+ | |translation | use " | ||
+ | |Trismegistos |[enter the trismegistos # if it exists/is known for the manuscript]; | ||
+ | |version@date |version date of this Coptic SCRIPTORIUM data in YYYY-MM-DD format; format the cell as text, not as a date, in Excel| | ||
+ | |version@n |version of this Coptic SCRIPTORIUM data| | ||
+ | |witness | ||
+ | |||
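Several of the metadata rules in the table above are mechanical enough to check automatically: the five review-status fields are required and take one of three values, msName/pages_from/pages_to are all-or-none, the two date bounds use four digits, and version@date is YYYY-MM-DD. The sketch below is an illustration under stated assumptions rather than an existing GitDox feature: `validate_metadata` is a hypothetical helper, metadata is assumed to arrive as a plain dictionary of strings, and only fields whose rules are fully visible above are checked.

```python
import re
from datetime import datetime
from typing import Dict, List

REVIEW_FIELDS = ("entities", "identities", "parsing", "segmentation", "tagging")
REVIEW_VALUES = {"automatic", "checked", "gold"}
PAGE_GROUP = ("msName", "pages_from", "pages_to")
YEAR_FIELDS = ("origDate_notBefore", "origDate_notAfter")


def validate_metadata(meta: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; empty means no rule was broken."""
    problems = []

    # Required review-status fields must hold one of the three allowed values.
    for field in REVIEW_FIELDS:
        if meta.get(field) not in REVIEW_VALUES:
            problems.append(f"{field}: expected automatic, checked, or gold")

    # msName, pages_from, pages_to: all three or none at all.
    present = [f for f in PAGE_GROUP if meta.get(f)]
    if 0 < len(present) < len(PAGE_GROUP):
        problems.append("msName, pages_from, pages_to: use all three or none")

    # Optional date bounds: exactly four digits, leading zeros kept (e.g. 0900).
    for field in YEAR_FIELDS:
        value = meta.get(field)
        if value is not None and not re.fullmatch(r"[0-9]{4}", value):
            problems.append(f"{field}: expected four digits, got {value!r}")

    # version@date, when present: strict YYYY-MM-DD and a real calendar date.
    date = meta.get("version@date")
    if date is not None:
        if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", date):
            problems.append(f"version@date: expected YYYY-MM-DD, got {date!r}")
        else:
            try:
                datetime.strptime(date, "%Y-%m-%d")
            except ValueError:
                problems.append(f"version@date: not a valid date: {date!r}")

    return problems
```

Values are compared as strings on purpose, which mirrors the advice above to format the Excel cells as text so that leading zeros are not lost.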
+ | === Automatic metadata === | ||
+ | |||
+ | GitDox will automatically generate semicolon-separated lists of named entities in the following metadata fields during export. They will not show up in the GitDox table, and you should not add or edit these manually: | ||
+ | |||
+ | |people | named people identifiers for people mentioned in the document (separated by "; ") | | ||
+ | |places | named place identifiers for places mentioned in the document (separated by "; ") | | ||
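When reading exported data, the people and places fields described above can be split back into individual identifiers. This is a minimal sketch, assuming only the "; " separator stated above; `split_identifiers` is a hypothetical helper and the sample values are placeholders, not real identifiers.

```python
from typing import List


def split_identifiers(field_value: str) -> List[str]:
    """Split a semicolon-separated metadata value into trimmed identifiers."""
    return [part.strip() for part in field_value.split(";") if part.strip()]


# Placeholder values for illustration only.
people = split_identifiers("person_a; person_b")
```

Stripping each part makes the helper tolerant of a missing or extra space after the semicolon.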
- | ^ annotation | names of annotators (transcribers, | ||
- | ^ author | author of the conceptual work | | ||
- | ^ collection |collection or department in the current repository| | ||
- | ^ Coptic_edition |if the text has been published before, include publication information here | | ||
- | ^ corpus | ||
- | ^ country | ||
- | ^ document_cts_urn | urn that applies to the document following data model created by Bridget Almas | | ||
- | ^ endnote | ||
- | ^ Greek_source | optional, information about the Greek version of the text if it exists (e.g., Greek Alphabetical or Systematic Apophthegmata Patrum) | | ||
- | ^ idno |catalogue # of the manuscript in the current repository| | ||
- | ^ language | language in which the text is written | | ||
- | ^ license |use for copyright in Sahidica, CC-BY for everything else. If using CC-BY, enter <a href=' | ||
- | ^ msContents_title@n |volume number of the thing in msContents_title@type; | ||
- | ^ msContents_title@type |used for things like Shenoute' | ||
- | ^ msItem_title | ||
- | ^ msName | use CMCL code (e.g., MONB.YA); optional but must use msName, pages_from, pages_to all three or none at all| | ||
- | ^ next | contains the CTS urn for the next document in the corpus; optional| | ||
- | ^ note | optional | | ||
- | ^ objectType | ||
- | ^ order | contains a number that orders the documents in a list, with the preferred first document numbered 01. For this project, the list should begin by corresponding to the same order dictated by the next and previous metadata fields. | ||
- | ^ origDate | ||
- | ^ origDate_notAfter| date of the terminus ante quem (in four digits with leading zeros, e.g., 1200); be sure the format of the cell in Excel is text not number or date; optional| | ||
- | ^ origDate_notBefore| date of the terminus post quem (in four digits with leading zeros, e.g., 0900); be sure the format of the cell in Excel is text not number or date; optional| | ||
- | ^ origDate_precision|likelihood that the dating is accurate -- usually " | ||
- | ^ origPlace | ||
- | ^ pages_from | beginning of page sequence of document (original page number of scribe but written in arabic numerals) optional but must use msName, pages_from, pages_to all three or none at all| | ||
- | ^ pages_to |optional but must use msName, pages_from, pages_to all three or none at all| | ||
- | ^ parsing| describes whether parsing has been reviewed. Available values are automatic, checked, or gold; required| | ||
- | ^ placeName | ||
- | ^ previous | ||
- | ^ project | name of project supporting the transcription/ | ||
- | ^ redundant | ||
- | ^ repository |current museum/ | ||
- | ^ segmentation| describes whether segmentation and tokenization has been reviewed. Available values are automatic, checked, or gold; required| | ||
- | ^ source | ||
- | ^ source_info | | ||
- | ^ tagging| describes whether tagging has been reviewed. Available values are automatic, checked, or gold; required| | ||
- | ^ title | ||
- | ^ translation | use " | ||
- | ^ Trismegistos |[enter the trismegistos # if it exists/is known for the manuscript]; | ||
- | ^ version@date |version date of this Coptic SCRIPTORIUM data in YYYY-MM-DD format; format the cell as text, not as a date, in Excel| | ||
- | ^ version@n |version of this Coptic SCRIPTORIUM data| | ||
- | ^ witness |
annotation_layer_names.1556818519.txt.gz · Last modified: 2019/05/02 11:35 by admin