TEI tagging

Each diplomatic transcription is encoded in TEI XML. Each TEI file consists of a <teiHeader> tag and a <text> tag. In the <teiHeader>, the most important tag to fill out is <idno type="siglum">, which specifies the siglum that identifies the witness in any generated apparatus. See the template for more details.

The <text> tag is where the transcription goes. Each <text> should consist of a <body> containing a number of paragraphs (<p>) and/or verses (<lg> and <l>). Each paragraph or verse should have a unique xml:id. These identifiers are used to match corresponding sections of text across witnesses during collation.
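As a rough illustration of the structure described above, the following Python sketch parses a minimal TEI file and reads out the siglum and the xml:id of each section. The sample file, the siglum "A", the identifiers "p1" and "v1", and the helper name read_witness are all invented for this example, as is the exact placement of <idno> inside the header; the project template remains the authority on what the header must contain.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical minimal file: siglum, ids, and header layout are assumptions.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample witness</title></titleStmt>
      <publicationStmt><idno type="siglum">A</idno></publicationStmt>
      <sourceDesc><p>Sample manuscript</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p xml:id="p1">jātir vā dravyaṃ vā padārthāv iti</p>
      <lg xml:id="v1">
        <l>suvarṇarucakādi ya<caesura/>thā yuktaṃ svakair ākāraiḥ</l>
      </lg>
    </body>
  </text>
</TEI>"""


def read_witness(xml_string):
    """Return (siglum, list of xml:id values of the top-level body sections)."""
    ns = {"tei": TEI_NS}
    root = ET.fromstring(xml_string)
    siglum_el = root.find(".//tei:idno[@type='siglum']", ns)
    siglum = siglum_el.text.strip() if siglum_el is not None and siglum_el.text else None
    body = root.find("tei:text/tei:body", ns)
    ids = [el.get(XML_ID) for el in body if el.get(XML_ID) is not None]
    return siglum, ids


siglum, ids = read_witness(SAMPLE)
# Every section id must be unique, since sections are matched by id.
ids_are_unique = len(ids) == len(set(ids))
print(siglum, ids, ids_are_unique)
```

A check like ids_are_unique can be run before collation to catch a copied-and-pasted xml:id early.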

For transcribing, a subset of TEI is used to mark features such as additions, deletions, and marginal annotations.

<subst>, <add>, and <del>

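Indicates text that has been substituted. <del> marks the deleted text, <add> marks the added text, and <subst> groups the two as a single substitution.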

  • the rend attribute indicates how the addition or deletion is marked.
  • the place attribute indicates where the addition is noted.
sarva<subst><del rend="crossed out">vidyā</del><add place="right-margin">śabdā</add></subst>nām
sarvavidyāśabdānām
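One way to see what the substitution markup encodes is to extract the two readings it contains: the text as first written and the text after the scribe's change. The Python sketch below does this for the example above. The helper name reading is hypothetical, and the fragment is parsed without a namespace; in a full TEI file the tag names would need to be namespace-qualified.

```python
import xml.etree.ElementTree as ET


def reading(el, skip):
    """Concatenate the text of el, leaving out any element whose tag is in skip."""
    parts = [el.text or ""]
    for child in el:
        if child.tag not in skip:
            parts.append(reading(child, skip))
        # The tail is the text following the child, so it is always kept.
        parts.append(child.tail or "")
    return "".join(parts)


SAMPLE = ('<p>sarva<subst><del rend="crossed out">vidyā</del>'
          '<add place="right-margin">śabdā</add></subst>nām</p>')

p = ET.fromstring(SAMPLE)
before = reading(p, skip={"add"})  # the text as first written
after = reading(p, skip={"del"})   # the text after the substitution
print(before)
print(after)
```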

<choice>, <corr>, and <orig>

Indicates text that the transcriber has corrected. <orig> holds the reading found in the document and <corr> holds the transcriber's correction.

va<choice><orig>ṃta</orig><corr>rta</corr></choice>tamānaḥ
vaṃtartamānaḥ

<caesura>

Indicates a caesura in a line of verse.

suvarṇarucakādi ya<caesura/>thā yuktaṃ svakair ākāraiḥ
suvarṇarucakādi ya-
thā yuktaṃ svakair ākāraiḥ

<gap>

Indicates a section of the text that is unreadable by the transcriber.

  • the reason attribute gives a reason for the text being illegible.
  • the unit attribute is the unit of measure.
  • the quantity attribute is the length of the gap in units.
<gap reason="damaged" unit="akṣara" quantity="5"/>rvārthyam
.......rvārthyam
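The three attributes together give a machine-readable description of each gap. As a small sketch of how they might be read back out, the Python below lists every <gap> in a fragment as a short description. The function name describe_gaps and the wording of the description are assumptions made for this example, not part of the project's tooling.

```python
import xml.etree.ElementTree as ET


def describe_gaps(xml_fragment):
    """Return one human-readable description per <gap> element."""
    root = ET.fromstring(xml_fragment)
    notes = []
    for gap in root.iter("gap"):
        quantity = int(gap.get("quantity", "0"))    # length of the gap
        unit = gap.get("unit", "unknown unit")      # unit the length is counted in
        reason = gap.get("reason", "unspecified")   # why the text is illegible
        notes.append(f"{quantity} {unit}(s) illegible ({reason})")
    return notes


SAMPLE = '<p><gap reason="damaged" unit="akṣara" quantity="5"/>rvārthyam</p>'
notes = describe_gaps(SAMPLE)
print(notes)
```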

<hi>

Indicates text that has been marked in some way for emphasis.

  • the rend attribute indicates how the text has been marked.
<hi rend="rubricated">ātma vastu sva</hi>bhāvaś ca
ātma vastu svabhāvaś ca

<lb>

Indicates a line break.

  • the n attribute indicates the line number.
  • if a line break corresponds to a word break, then the space appears before the <lb> tag.
vya-<lb n="5"/>vahāre, ātmā <lb n="6"/>tattvaṃ
vya-vahāre, ātmā tattvaṃ
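Because each <lb> carries its line number, the physical lines of the manuscript can be recovered from the transcription. The Python sketch below splits a paragraph at its line breaks. The helper name split_lines is hypothetical, and it assumes that every <lb> is a direct child of the paragraph; line breaks nested inside other tags would need a fuller traversal.

```python
import xml.etree.ElementTree as ET


def split_lines(p):
    """Split a paragraph into (line number, text) pairs at each <lb>.

    Text before the first <lb> gets the line number None, because the
    fragment itself does not say which line it belongs to.
    """
    lines = [(None, p.text or "")]
    for child in p:
        if child.tag == "lb":
            # A new physical line starts here; its text is the tail of <lb>.
            lines.append((child.get("n"), child.tail or ""))
        else:
            # Any other element stays on the current line.
            n, text = lines[-1]
            lines[-1] = (n, text + "".join(child.itertext()) + (child.tail or ""))
    return lines


SAMPLE = '<p>vya-<lb n="5"/>vahāre, ātmā <lb n="6"/>tattvaṃ</p>'
lines = split_lines(ET.fromstring(SAMPLE))
print(lines)
```

In the result, the space after ātmā stays at the end of line 5, which reflects the rule that the space is placed before the <lb> tag when the line break falls on a word break.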

<metamark>

Indicates a character that is not part of the content.

  • the function attribute indicates the function of the character.
tat siddhiḥ<metamark function="place marker"></metamark>
tat siddhiḥ

<milestone>

Indicates a location in a document.

  • The n attribute indicates a page or folio number.
  • The unit attribute is the unit of measure.
<milestone n="34r4" unit="folio"/>jātir vā dravyaṃ vā padārthāv iti
(From folio 34r4) jātir vā dravyaṃ vā padārthāv iti

<note>

Indicates a note.

  • The place attribute indicates where the note is placed, such as in the top-margin, footer, appendix, or inline.
  • The xml:lang attribute indicates the language of the note. The main use of this is to write notes in English; if xml:lang="en" is set, then the note text will not be transcribed into other scripts.
‘ktaktavatū niṣṭhā’<note place="inline">(pa॰ 1|1|16)</note>
‘ktaktavatū niṣṭhā’(pa॰ 1|1|16)
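To illustrate how the xml:lang attribute can be used to pick out notes written in English, here is a minimal Python sketch. The second note in the sample, with its English text and its place value, is invented for the example, and english_notes is a hypothetical helper rather than part of the project's tooling.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def english_notes(root):
    """Return the text of every <note> marked xml:lang="en"."""
    return ["".join(note.itertext())
            for note in root.iter("note")
            if note.get(XML_LANG) == "en"]


# The English note below is made up for illustration.
SAMPLE = ('<p>‘ktaktavatū niṣṭhā’<note place="inline">(pa॰ 1|1|16)</note>'
          '<note place="top-margin" xml:lang="en">Ink is faded here.</note></p>')

english = english_notes(ET.fromstring(SAMPLE))
print(english)
```

A script-conversion step could use a list like this to leave the English notes untouched while converting everything else.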

<pb>

Indicates a page break.

  • the n attribute indicates the folio or page number.
anekaviṣayanihi<pb n="3r"/>tapadānām
anekaviṣayanihiLtapadānām

<retrace>

Indicates text that has been retraced.

gṛhītaṃ <retrace>gṛha</retrace>śabdena śuddham evābhidhīyate
gṛhītaṃ gṛhaśabdena śuddham evābhidhīyate

<sic>

Indicates text that has been transcribed as found in the document, without correction.

viśeṣo<sic>pa</sic>dhiḥ
viśeṣopadhiḥ

<space>

Indicates a blank space in the text.

  • the unit attribute is the unit of measure.
  • the quantity attribute is the length of the space in units.
ity arthaḥ<space unit="akṣara" quantity="2"/> atha ca
ity arthaḥ__ atha ca

<supplied>

Indicates text that has been supplied by the transcriber.

ta<supplied>d</supplied> dravyam
tad dravyam

<surplus>

Indicates text that the transcriber believes is superfluous.

  • the reason attribute indicates the transcriber's reason for marking the text as superfluous.
iti teṣān darśanaṃ<lb n="5"/><surplus reason="repeated after line break">naṃ</surplus>
iti teṣān darśanaṃnaṃ

<unclear>

Indicates a passage that is unclear to the transcriber.

  • the reason attribute indicates why the text is unclear.
svarūpānyathā<unclear>t tānāpapaptiḥ</unclear>
svarūpānyathāt tānāpapaptiḥ