annotation_layer_names
Differences
This shows you the differences between two versions of the page.
annotation_layer_names [2018/10/18 17:55] – admin | annotation_layer_names [2022/01/12 09:47] (current) – admin | ||
---|---|---|---|
Line 2: | Line 2: | ||
=== Data === | === Data === | ||
+ | |||
//This document supersedes the Google doc used previously.// | //This document supersedes the Google doc used previously.// | ||
(Note: in the layer names, @ and _ are interchangeable; | (Note: in the layer names, @ and _ are interchangeable; | ||
- | ^ tok | tokens, smallest possible unit to be annotated; MAY BE SMALLER THAN THE MORPHEMES IN ORIG | | + | |tok | tokens, smallest possible unit to be annotated; MAY BE SMALLER THAN THE MORPHEMES IN ORIG | |
- | ^ orig | smallest unit of LANGUAGE (morpheme or word level; smaller than the bound group level); orthography is from the original text (diplomatic, | + | |orig | smallest unit of LANGUAGE (morpheme or word level; smaller than the bound group level); orthography is from the original text (diplomatic, |
- | ^ orig_group | bound groups using the original orthography, | + | |orig_group | bound groups using the original orthography, |
- | ^ norm_group | bound groups (same structure as orig_word but with normalized spelling, etc., so content is based on norm). Spans in this layer must match those in orig_group exactly in their length. | | + | |norm_group | bound groups (same structure as orig_word but with normalized spelling, etc., so content is based on norm). Spans in this layer must match those in orig_group exactly in their length. | |
- | ^ norm | normalized version of orig. Spans in this layer must match those in orig exactly in their length. | | + | |norm | normalized version of orig. Spans in this layer must match those in orig exactly in their length. | |
- | ^ pos | part of speech tags. Spans in this layer must match those in norm exactly in their length (i.e. norm units are the units that carry parts of speech). | | + | |pos | part of speech tags. Spans in this layer must match those in norm exactly in their length (i.e. norm units are the units that carry parts of speech). | |
- | ^ lang | language of origin tags (Hebrew, Greek, Latin, Aramaic, etc.) | | + | |lang | language of origin tags (Hebrew, Greek, Latin, Aramaic, etc.) | |
- | ^ morph | morphs that are below the word level -- this is where words containing mnt, at, ref are annotated a second time (see http:// | + | |morph | morphs that are below the word level -- this is where words containing mnt, at, ref are annotated a second time (see http:// |
- | ^ lemma | lemma (dictionary head word); annotates on the normalized words (" | + | |lemma | lemma (dictionary head word); annotates on the normalized words (" |
- | ^ note | notes that normally would go in a TEI XML <note note=" | + | |note | notes that normally would go in a TEI XML <note note=" |
- | ^ hi@rend | text renderings (see http:// | + | |hi@rend | usually appears as hi_rend in the column name in spreadsheet mode; for text renderings (see http:// |
- | ^ gap | Annotates for lacunae. Corresponds to the EpiDoc TEI-XML element gap. Uses attributes such as @reason, @unit, @quantity, and @extent. With attributes, each element+attribute annotation generates a new layer in the multi-layer data model | | + | |gap | Annotates for lacunae. Corresponds to the EpiDoc TEI-XML element gap. Uses attributes such as @reason, @unit, @quantity, and @extent. With attributes, each element+attribute annotation generates a new layer in the multi-layer data model | |
- | ^ supplied | Annotates for supplied text where text is missing from the original for a variety of reasons. Corresponds to the EpiDoc TEI-XML element supplied. Uses attributes such as @evidence and @reason. With attributes, each element+attribute annotation generates a new layer in the multi-layer data model. | | + | |supplied | Annotates for supplied text where text is missing from the original for a variety of reasons. Corresponds to the EpiDoc TEI-XML element supplied. Uses attributes such as @evidence and @reason. With attributes, each element+attribute annotation generates a new layer in the multi-layer data model. | |
- | ^ lb@n | line breaks -- numbered according to the original manuscript | | + | |lb@n | usually appears as lb_n in column header in spreadsheet mode; line breaks -- numbered according to the original manuscript | |
- | ^ cb@n | column breaks -- numbered according to the original manuscript | | + | |cb@n | usually appears as cb_n in column header in spreadsheet mode; column breaks -- numbered according to the original manuscript | |
- | ^ pb_xml_id | Page numbers of original manuscript (not the current repository numbering); be sure column label does not include a colon (e.g. pb_xml_id not pb_xml:id); be sure page numbers do not include spaces (e.g. EG202 not EG 202) (TEI XML <pb xml: | + | |pb_xml_id | Page numbers of original manuscript (not the current repository numbering); be sure column label does not include a colon (e.g. pb_xml_id not pb_xml:id); be sure page numbers do not include spaces (e.g. EG202 not EG 202) (TEI XML <pb xml: |
- | ^ ignore: | + | |ignore:note | notes that will NOT be imported into ANNIS or exported as TEI or PAULA XML; private notations from annotators/ |
- | ^ translation | English translation; | + | |translation | English translation; |
- | ^ p | paragraph breaks for translation and normalization; | + | |p | paragraph breaks for translation and normalization; |
- | ^ verse | verse of text written as number (always use in Bible of any kind, including Sahidica) | | + | |verse_n |
- | ^ chapter | chapter of text recorded as number; currently used only in corpora in which there are canonical or disciplinary-standard chapter divisions (not a required annotation; for Bible this information is typically in the metadata, as well) | | + | |chapter_n |
- | ^ sbl_greek | The Greek New Testament text. Annotation for the New Testament corpora only. Aligned by verse with the Coptic. Source is the [[http:// | + | |sbl_greek | The Greek New Testament text. Annotation for the New Testament corpora only. Aligned by verse with the Coptic. Source is the [[http:// |
- | ^ sbl_apparatus | The apparatus for the Greek New Testament text. Annotation for the New Testament corpora only. Aligned by verse with the Coptic and Greek New Testament text. Source is the [[http:// | + | |sbl_apparatus | The apparatus for the Greek New Testament text. Annotation for the New Testament corpora only. Aligned by verse with the Coptic and Greek New Testament text. Source is the [[http:// |
- | ^ div@type | For EpiDoc TEI compatibility, | + | |div@type | For EpiDoc TEI compatibility, |
- | ^ vid | formerly verse@id (Sahidica) | | + | |vid | formerly verse@id (Sahidica) | |
- | ^ chapter@cname | chapter of text written as text and number (not necessary -- in other data) | | + | |chapter@cname | chapter of text written as text and number (not necessary -- in other data) | |
- | ^ chapter@cid | chapter id (Sahidica-- not necessary) | | + | |chapter@cid | chapter id (Sahidica-- not necessary) | |
- | ^ verse@vname | verse of text written as text and number (e.g. 1 Corinthians 1:10) (not necessary -- in other data) | | + | |verse@vname | verse of text written as text and number (e.g. 1 Corinthians 1:10) (not necessary -- in other data) | |
- | ^ add_place | | + | |add_place | |
- | ^ vid_n | CTS URN for the verse (e.g., urn: | + | |vid_n | CTS URN for the verse (e.g., urn: |
+ | |ed_page_n | page number of a text as it appears in an edition | | ||
+ | |ed_line_n | line number of a text as it appears in an edition | | ||
+ | |entity | one of the ten entity types (e.g. person, place) see [[https:// | ||
+ | |identity | this annotation stores linked entry identifiers for named entities; it is populated automatically during export by GitDox if named entities have been added using the entity annotation interface. Annotators do not need to manually add this column| | ||
+ | |arabic | Arabic translation. Spans should follow translation and verse layers | | ||
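The naming and span rules in the table above can be checked mechanically. The following is a minimal sketch, not part of GitDox or any Coptic SCRIPTORIUM tool: the function names and the `(first_row, last_row)` span representation are assumptions made for illustration. It treats `@` and `_` as interchangeable in layer names (any exceptions in the truncated note above are not handled), reports spans in a dependent layer (e.g. pos) that have no identical span in its reference layer (e.g. norm), and applies the two pb_xml_id rules (no colon in the column label, no spaces in page numbers).

```python
from typing import List, Tuple

# Hypothetical representation: a span is (first_row, last_row), inclusive.
Span = Tuple[int, int]


def canonical_layer_name(name: str) -> str:
    """Write a layer name in its spreadsheet form, e.g. hi@rend -> hi_rend."""
    return name.strip().replace("@", "_")


def mismatched_spans(reference: List[Span], dependent: List[Span]) -> List[Span]:
    """Return spans of `dependent` that have no identical span in `reference`.

    Example pairs from the table: (orig, norm), (norm, pos),
    (orig_group, norm_group).
    """
    reference_set = set(reference)
    return [span for span in dependent if span not in reference_set]


def page_id_problems(column_label: str, page_ids: List[str]) -> List[str]:
    """Check the pb_xml_id rules: no colon in the label, no spaces in values."""
    problems = []
    if ":" in column_label:
        problems.append(f"column label {column_label!r} contains a colon")
    for page_id in page_ids:
        if any(ch.isspace() for ch in page_id):
            problems.append(f"page number {page_id!r} contains a space")
    return problems


if __name__ == "__main__":
    print(canonical_layer_name("lb@n"))
    print(mismatched_spans([(0, 1), (2, 2)], [(0, 0), (1, 1), (2, 2)]))
    print(page_id_problems("pb_xml:id", ["EG202", "EG 202"]))
```

An empty result from `mismatched_spans` or `page_id_problems` means the sketch found nothing to report; it does not check that every reference span is covered by the dependent layer.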
- | === Preferred order of layers | + | === NOT columns in the spreadsheet |
- | ^ tok |orig |orig_group |norm_group |norm |pos |morph |lang |translation |lb_n |cb_n |pb_xml_id |p |hi_rend |supplied_reason |ignore:note | | + | |
+ | The following information should **NOT** be annotated manually in the spreadsheet, | ||
+ | |||
+ | * identity - this is the ANNIS annotation corresponding to named entity linking (Wikification). This information comes from the entity identification annotations in entities mode (after clicking "List named entities" | ||
+ | * func / head - this is syntactic information from automatic or gold parsing. It is never done in spreadsheet mode, but added during publication by an automatic parser, or annotated manually in the Arborator interface (but NOT in GitDox) | ||
+ | * multiword - multiword expression annotation is also added automatically during publication based on the current state of multiword entries in the Coptic Dictionary Online. It is not edited manually and should not be included in the spreadsheet.
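A quick way to catch these columns before publication is to scan the spreadsheet header for them. This is a minimal sketch under stated assumptions: the header is available as a plain list of column names, and the function name is hypothetical rather than anything GitDox provides.

```python
from typing import List

# Layers listed above as NOT to be annotated manually in the spreadsheet.
AUTOMATIC_LAYERS = {"identity", "func", "head", "multiword"}


def columns_to_remove(header: List[str]) -> List[str]:
    """Return header columns that belong to automatically generated layers."""
    return [col for col in header if col.strip().lower() in AUTOMATIC_LAYERS]


if __name__ == "__main__":
    header = ["tok", "orig", "norm", "pos", "entity", "identity", "head"]
    print(columns_to_remove(header))
```

Note that `entity` is not flagged: only the linked `identity` layer is populated automatically.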
=== Metadata === | === Metadata === | ||
+ | |||
(Note: see also the [[corpus_metadata|corpus-level metadata]] documentation for adding metadata for the entire corpus.) | (Note: see also the [[corpus_metadata|corpus-level metadata]] documentation for adding metadata for the entire corpus.) | ||
- | ^ corpus | + | [[https:// |
- | ^ Coptic_edition |if the text has been published before, include publication information here | | + | |
- | ^ Greek_source | optional, information about the Greek version | + | |annotation |
- | ^ title | + | |arabic_translation | names of people who translated the text into Arabic in comma-delimited sequence |
- | ^ msItem_title | + | |attributed_author | optional. attributed author |
- | ^ author | author of the conceptual work | | + | |author | author of the conceptual work | |
- | ^ language | language | + | |collection |collection or department |
- | ^ annotation | names of annotators (transcribers, editors, annotators) in comma delimited sequence | + | |Coptic_edition |if the text has been published before, include publication information here | |
- | ^ project | name of project supporting | + | |copyist | optional. copyist or scribe |
- | ^ translation | use " | + | |corpus |
- | ^ msName | use CMCL code (e.g., MONB.YA); optional | + | |country |
- | ^ pages_from | beginning of page sequence of document | + | |document_cts_urn | urn that applies to the document |
- | ^ pages_to |optional but must use msName, pages_from, pages_to all three or none at all| | + | |endnote |
- | ^ msContents_title@type |used for things like Shenoute' | + | |entities| describes whether entity annotation has been reviewed. Available values are automatic, checked, or gold; required| |
- | ^ msContents_title@n |volume number | + | |Greek_source | optional, information about the Greek version |
- | ^ repository |current museum/ | + | |identities| describes whether named entity linking has been reviewed. Available values are automatic, checked, |
- | ^ collection |collection | + | |idno |catalogue # of the manuscript in the current repository| |
- | ^ idno |catalogue # of the manuscript in the current repository| | + | |language |
- | ^ version@n |version of this Coptic SCRIPTORIUM data| | + | |license |use for copyright in Sahidica, CC-BY for everything else. If using CC-BY, enter <a href=' |
- | ^ version@date |version date of this Coptic SCRIPTORIUM data in YYYY-MM-DD format; format cell as text not as date format in excel| | + | |msContents_title@n |volume number of the thing in msContents_title@type; |
- | ^ source_info | | + | |msContents_title@type |used for things like Shenoute' |
- | ^ license |use for copyright in Sahidica, CC-BY for everything else. If using CC-BY, enter <a href=' | + | |msItem_title |
- | ^document_cts_urn | + | |msName | use CMCL code (e.g., MONB.YA); optional but must use msName, pages_from, pages_to all three or none at all| |
- | ^Trismegistos | + | |next | contains the CTS urn for the next document in the corpus; optional| |
- | ^objectType | + | |note | optional | |
- | ^country | + | |objectType |
- | ^placeName | + | |order | contains a number that orders |
- | ^origPlace | + | |origDate |
- | ^origDate | + | |origDate_notAfter| date of the terminus
- | ^origDate_precision|likelihood that the dating is accurate -- usually " | + | |origDate_notBefore| date of the terminus
- | ^origDate_notBefore| date of the terminus | + | |origDate_precision|likelihood that the dating is accurate -- usually " |
- | ^origDate_notAfter| date of the terminus | + | |origPlace
- | ^source | + | |pages_from | beginning of page sequence of document (original page number of scribe but written in arabic numerals) optional but must use msName, pages_from, pages_to all three or none at all| |
- | ^note | optional | | + | |pages_to |optional but must use msName, pages_from, pages_to all three or none at all| |
- | ^witness | + | |parsing| describes whether parsing has been reviewed. Available values are automatic, checked, or gold; required| |
- | ^redundant | + | |paths_authors |
- | ^previous | + | |paths_manuscripts | PATHs project Coptic Literary Manuscript stable id entered as a link, e.g. <a href=' |
- | ^next | contains | + | |paths_works |
- | ^endnote | + | |placeName |
- | ^order | + | |previous |
- | ^parsing| describes whether | + | |project | name of project supporting the transcription/ |
- | ^segmentation| describes whether segmentation and tokenization has been reviewed. Available values are automatic, checked, or gold; required| | + | |redundant |
- | ^tagging| describes whether tagging has been reviewed. Available values are automatic, checked, or gold; required| | + | |repository |
+ | |segmentation| describes whether | ||
+ | |source | ||
+ | |source_info | | | ||
+ | |tagging| describes whether tagging has been reviewed. Available values are automatic, checked, or gold; required| | ||
+ | |title | ||
+ | |translation | use " | ||
+ | |Trismegistos |[enter the trismegistos # if it exists/is known for the manuscript]; | ||
+ | |version@date |version date of this Coptic SCRIPTORIUM data in YYYY-MM-DD format; format cell as text not as date format in excel| | ||
+ | |version@n |version of this Coptic SCRIPTORIUM data| | ||
+ | |witness | ||
+ | |||
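Several of the metadata rules above are simple enough to check automatically. The sketch below is illustrative only and assumes the document metadata is available as a plain dictionary of strings; the function names are hypothetical. It covers three rules whose wording is fully visible in the table: the review-status fields entities, parsing, segmentation and tagging are required and take the values automatic, checked, or gold; msName, pages_from and pages_to are used all three together or not at all; and version@date is written as YYYY-MM-DD.

```python
import re
from datetime import date
from typing import Dict, List

REVIEW_FIELDS = ("entities", "parsing", "segmentation", "tagging")
REVIEW_VALUES = {"automatic", "checked", "gold"}
PAGE_GROUP = ("msName", "pages_from", "pages_to")


def is_iso_date(value: str) -> bool:
    """True if `value` is a real calendar date written as YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def metadata_problems(meta: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    for field in REVIEW_FIELDS:
        value = meta.get(field)
        if value is None:
            problems.append(f"{field}: required field is missing")
        elif value not in REVIEW_VALUES:
            problems.append(f"{field}: {value!r} is not automatic, checked, or gold")
    present = [f for f in PAGE_GROUP if meta.get(f)]
    if present and len(present) != len(PAGE_GROUP):
        missing = [f for f in PAGE_GROUP if f not in present]
        problems.append("msName/pages_from/pages_to: missing " + ", ".join(missing))
    version_date = meta.get("version@date")
    if version_date is not None and not is_iso_date(version_date):
        problems.append(f"version@date: {version_date!r} is not YYYY-MM-DD")
    return problems


if __name__ == "__main__":
    example = {"entities": "gold", "parsing": "automatic",
               "segmentation": "checked", "tagging": "gold",
               "msName": "MONB.YA", "pages_from": "1",
               "version@date": "12/01/2022"}
    for problem in metadata_problems(example):
        print(problem)
```

The sketch checks version@date only when the field is present, since the table does not say whether it is required.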
+ | === Automatic metadata === | ||
+ | |||
+ | GitDox will automatically generate semicolon-separated lists of named entities in the following metadata fields during export. They will not show up in the GitDox table, and you should not add or edit these manually:
+ | |people | named people identifiers for people mentioned in the document (separated by "; ") | | ||
+ | |places | named place identifiers for places mentioned in the document (separated by "; ") | | ||
annotation_layer_names.1539906953.txt.gz · Last modified: 2018/10/18 17:55 by admin