TEI tagging

Each diplomatic transcription is encoded in TEI XML. Each TEI file consists of a <teiHeader> tag and a <text> tag. In the <teiHeader>, the most important tag to fill out is <idno type="siglum">, which specifies the siglum of the witness; this siglum is used in any generated apparatus. See the template for more details.

The <text> tag is where the transcription will go. Each <text> should consist of a <body> with a number of paragraphs (<p>) and/or verses (<lg> and <l>). Each paragraph or verse should have a unique xml:id. These xml:id identifiers will be used to collate like sections of text together.
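
As a rough illustration of the structure described above, a minimal file might look like the following sketch. The siglum A, the title, the xml:id values p1 and v1, and the exact nesting of <idno> inside <sourceDesc> are assumptions made for this example, based on general TEI practice; the project template remains the authoritative layout.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Example witness (hypothetical)</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished working transcription.</p>
      </publicationStmt>
      <sourceDesc>
        <msDesc>
          <msIdentifier>
            <!-- siglum of the witness, used in any generated apparatus -->
            <idno type="siglum">A</idno>
          </msIdentifier>
        </msDesc>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <!-- each paragraph or verse carries a unique xml:id -->
      <p xml:id="p1">jātir vā dravyaṃ vā padārthāv iti</p>
      <lg xml:id="v1">
        <l>suvarṇarucakādi ya<caesura/>thā yuktaṃ svakair ākāraiḥ</l>
      </lg>
    </body>
  </text>
</TEI>
```

If a second witness uses the same xml:id values (p1, v1) for the corresponding passages, those passages can be matched up for collation.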

For transcribing, a subset of TEI is used to mark features such as additions, deletions, and marginal annotations.

<subst>, <add>, and <del>

Indicates text that has been substituted: a deletion (<del>) paired with the addition (<add>) that replaces it.

  • the rend attribute indicates how the addition or deletion is marked.
  • the place attribute indicates where the addition is noted.
sarva<subst><del rend="crossed out">vidyā</del><add place="right-margin">śabdā</add></subst>nām

<choice>, <corr>, and <orig>

Indicates text that the transcriber has corrected.
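
A possible encoding, assuming the usual TEI pairing in which <orig> holds the reading as found in the witness and <corr> holds the transcriber's correction. The misspelling śadbena is invented for illustration and does not come from an actual witness.

```xml
gṛha<choice><orig>śadbena</orig><corr>śabdena</corr></choice>
```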



<caesura>

Indicates a caesura in a line of verse.

suvarṇarucakādi ya<caesura/>thā yuktaṃ svakair ākāraiḥ
suvarṇarucakādi ya-
thā yuktaṃ svakair ākāraiḥ
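
In context, the caesura sits inside a verse line. A sketch, assuming the verse is wrapped in <lg> and <l> as described for the <body>; the xml:id value v1 is a placeholder chosen for this example.

```xml
<lg xml:id="v1">
  <l>suvarṇarucakādi ya<caesura/>thā yuktaṃ svakair ākāraiḥ</l>
</lg>
```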


<gap>

Indicates a section of the text that is unreadable by the transcriber.

  • the reason attribute gives a reason for the text being illegible.
  • the unit attribute is the unit of measure.
  • the quantity attribute is the length of the gap in units.
<gap reason="damaged" unit="akṣara" quantity="5"/>rvārthyam


<hi>

Indicates text that has been marked in some way for emphasis.

  • the rend attribute indicates how the text has been marked.
<hi rend="rubricated">ātma vastu sva</hi>bhāvaś ca
ātma vastu svabhāvaś ca


<lb>

Indicates a line break.

  • the n attribute indicates the line number.
  • if a line break corresponds to a word break, then the space appears before the <lb> tag.
vya-<lb n="5"/>vahāre


ātmā <lb n="6"/>tattvaṃ
vya-vahāre, ātmā tattvaṃ


<metamark>

Indicates a character that is not part of the content.

  • the function attribute indicates the function of the character.
tat siddhiḥ<metamark function="place marker"></metamark>
tat siddhiḥ


<milestone>

Indicates a location in a document.

  • The n attribute indicates a page or folio number.
  • The unit attribute is the unit of measure.
<milestone n="34r4" unit="folio"/>jātir vā dravyaṃ vā padārthāv iti
(From folio 34r4) jātir vā dravyaṃ vā padārthāv iti


<note>

Indicates a note.

  • The place attribute indicates where the note is placed, such as in the top-margin, footer, appendix, or inline.
  • The xml:lang attribute indicates the language of the note. The main use of this is to write notes in English; if xml:lang="en" is set, then the note text will not be transcribed into other scripts.
‘ktaktavatū niṣṭhā’<note place="inline">(pa॰ 1|1|16)</note>
‘ktaktavatū niṣṭhā’(pa॰ 1|1|16)


<pb>

Indicates a page break.

  • the n attribute indicates the folio or page number.
anekaviṣayanihi<pb n="3r"/>tapadānām


<retrace>

Indicates text that has been retraced.

gṛhītaṃ <retrace>gṛha</retrace>śabdena śuddham evābhidhīyate
gṛhītaṃ gṛhaśabdena śuddham evābhidhīyate


<sic>

Indicates text that has been transcribed as found in the document, without correction.
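
A minimal sketch using the TEI <sic> element. The misspelled reading śadbena is invented for illustration and does not come from an actual witness.

```xml
gṛha<sic>śadbena</sic>
```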



<space>

Indicates a blank space in the text.

  • the unit attribute is the unit of measure.
  • the quantity attribute is the length of the space in units.
ity arthaḥ<space unit="akṣara" quantity="2"/> atha ca
ity arthaḥ__ atha ca


<supplied>

Indicates text that has been supplied by the transcriber.

ta<supplied>d</supplied> dravyam
tad dravyam


<surplus>

Indicates text that the transcriber believes is superfluous.

  • the reason attribute indicates the transcriber's reason for marking the text as superfluous.
iti teṣān darśanaṃ<lb n="5"/><surplus reason="repeated after line break">naṃ</surplus>
iti teṣān darśanaṃnaṃ


<unclear>

Indicates a passage that is unclear to the transcriber.

  • the reason attribute indicates why the text is unclear.
svarūpānyathā<unclear>t tānāpapaptiḥ</unclear>
svarūpānyathāt tānāpapaptiḥ
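
The example above does not use the reason attribute. A variant that includes it might look like the following; the value "smudged" is an assumed example, not a value prescribed by this guide.

```xml
svarūpānyathā<unclear reason="smudged">t tānāpapaptiḥ</unclear>
```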