TEI tagging

Each diplomatic transcription is encoded in TEI XML. Each TEI file consists of a <teiHeader> tag and a <text> tag. In the <teiHeader>, the most important tag to fill out is <idno type="siglum">, which specifies the witness's siglum, used to identify it in any generated apparatus. See the template for more details.

The <text> tag is where the transcription will go. Each <text> should consist of a <body> with a number of paragraphs (<p>) and/or verses (<lg> and <l>). Each paragraph or verse should have a unique xml:id. These xml:id identifiers will be used to collate like sections of text together.
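
As a rough illustration of how the siglum and the xml:id identifiers fit together, the following Python sketch reads two small transcriptions and groups their sections by xml:id. It is only a sketch under stated assumptions, not the project's actual collation code: the two witness files, their sigla, the position of <idno> inside the header, and the variant reading in witness B are all invented for illustration, and the function names read_witness and align are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI = "{http://www.tei-c.org/ns/1.0}"                 # standard TEI namespace
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"   # how ElementTree exposes xml:id

# Hypothetical witnesses: sigla, header layout and the variant in B are invented.
WITNESS_A = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><sourceDesc><msDesc><msIdentifier>
    <idno type="siglum">A</idno>
  </msIdentifier></msDesc></sourceDesc></fileDesc></teiHeader>
  <text><body>
    <p xml:id="p1">jātir vā dravyaṃ vā padārthāv iti</p>
    <lg xml:id="v1"><l>ātma vastu svabhāvaś ca</l></lg>
  </body></text>
</TEI>"""

WITNESS_B = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><sourceDesc><msDesc><msIdentifier>
    <idno type="siglum">B</idno>
  </msIdentifier></msDesc></sourceDesc></fileDesc></teiHeader>
  <text><body>
    <p xml:id="p1">jātir vā dravyaṃ ca padārthāv iti</p>
  </body></text>
</TEI>"""


def read_witness(xml_text):
    """Return (siglum, {xml:id: text}) for one transcription."""
    root = ET.fromstring(xml_text)
    header = root.find(TEI + "teiHeader")
    # Look for <idno type="siglum"> anywhere inside the header.
    siglum = next(
        ((idno.text or "").strip()
         for idno in header.iter(TEI + "idno")
         if idno.get("type") == "siglum"),
        None,
    )
    if siglum is None:
        raise ValueError('no <idno type="siglum"> found in <teiHeader>')

    units = {}
    body = root.find(f"{TEI}text/{TEI}body")
    for el in body.iter():
        uid = el.get(XML_ID)
        if uid is None or el.tag not in (TEI + "p", TEI + "lg", TEI + "l"):
            continue
        if uid in units:
            raise ValueError(f"duplicate xml:id {uid!r} in witness {siglum}")
        # Flatten the markup and normalise whitespace.
        units[uid] = " ".join("".join(el.itertext()).split())
    return siglum, units


def align(witnesses):
    """Group like sections together: {xml:id: {siglum: text}}."""
    table = defaultdict(dict)
    for xml_text in witnesses:
        siglum, units = read_witness(xml_text)
        for uid, text in units.items():
            table[uid][siglum] = text
    return dict(table)


aligned = align([WITNESS_A, WITNESS_B])
for uid, readings in aligned.items():
    print(uid, readings)
```

Sections that share an xml:id end up side by side under their sigla, which is the precondition for comparing them; a section present in only one witness (v1 here) simply has a single entry.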

For transcribing, a subset of TEI is used to mark features such as additions, deletions, and marginal annotations.

<subst>, <add>, and <del>

Indicates text that has been substituted.

sarva<subst><del rend="crossed out">vidyā</del><add place="right-margin">śabdā</add></subst>nām
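
To make the two states of a substitution concrete, here is a minimal Python sketch that reads the example above once without the <add> and once without the <del>. The helper name reading is hypothetical, and the snippet is wrapped in a bare <p> without the TEI namespace for brevity; in a real file the tag names would carry the namespace.

```python
import xml.etree.ElementTree as ET

SNIPPET = (
    '<p>sarva<subst><del rend="crossed out">vidyā</del>'
    '<add place="right-margin">śabdā</add></subst>nām</p>'
)


def reading(element, skip):
    """Concatenate the text of `element`, leaving out any descendant named `skip`."""
    parts = [element.text or ""]
    for child in element:
        if child.tag != skip:
            parts.append(reading(child, skip))
        # Text following a child belongs to the parent, so it is always kept.
        parts.append(child.tail or "")
    return "".join(parts)


p = ET.fromstring(SNIPPET)
before = reading(p, skip="add")  # text before the substitution
after = reading(p, skip="del")   # text after the substitution
print(before)  # sarvavidyānām
print(after)   # sarvaśabdānām
```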

<choice>, <corr>, and <orig>

Indicates text that the transcriber has corrected.



<caesura>

Indicates a caesura in a line of verse.

suvarṇarucakādi ya<caesura/>thā yuktaṃ svakair ākāraiḥ
suvarṇarucakādi ya-
thā yuktaṃ svakair ākāraiḥ


<gap>

Indicates a section of the text that is unreadable by the transcriber.

<gap reason="damaged" unit="akṣara" quantity="5"/>rvārthyam


<hi>

Indicates text that has been marked in some way for emphasis.

<hi rend="rubricated">ātma vastu sva</hi>bhāvaś ca
ātma vastu svabhāvaś ca


<lb>

Indicates the beginning of a new line.

vya-<lb n="5"/>vahāre, ātmā <lb n="6"/>tattvaṃ
vya-vahāre, ātmā tattvaṃ
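
Because each <lb> carries the number of the line it opens, the manuscript's line division can be recovered from the transcription. The Python sketch below shows one way this might be done; the function name split_at_line_breaks is hypothetical, it only handles <lb> elements that are direct children of the paragraph, and the snippet omits the TEI namespace for brevity.

```python
import xml.etree.ElementTree as ET

SNIPPET = '<p>vya-<lb n="5"/>vahāre, ātmā <lb n="6"/>tattvaṃ</p>'


def split_at_line_breaks(element):
    """Map each line number (the n of <lb>) to the text that follows it."""
    # Text before the first <lb> continues an earlier, unnumbered line.
    lines = {None: element.text or ""}
    current = None
    for child in element:
        if child.tag == "lb":
            current = child.get("n")
            lines.setdefault(current, "")
        else:
            # Any other child contributes its flattened text to the current line.
            lines[current] += "".join(child.itertext())
        lines[current] += child.tail or ""
    return lines


lines = split_at_line_breaks(ET.fromstring(SNIPPET))
for number, text in lines.items():
    print(number, repr(text))
```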


<metamark>

Indicates a character that is not part of the content.

tat siddhiḥ<metamark function="place marker"></metamark>
tat siddhiḥ


<milestone>

Indicates a location in a document.

<milestone n="34r4" unit="folio"/>jātir vā dravyaṃ vā padārthāv iti
(From folio 34r4) jātir vā dravyaṃ vā padārthāv iti


<note>

Indicates a note.

‘ktaktavatū niṣṭhā’<note place="inline">(pa॰ 1|1|16)</note>
‘ktaktavatū niṣṭhā’(pa॰ 1|1|16)


<pb>

Indicates the beginning of a new page.

anekaviṣayanihi<pb n="3r"/>tapadānām

<pc> </pc>

Used to split compound words.

aneka<pc> </pc>viṣaya<pc> </pc>nihita<pc> </pc>padānām

The spaces are not displayed, but variant readings will be split using those spaces as guides. This manual procedure is only needed occasionally, to clarify the critical apparatus in the case of very long or complex compounds.
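
The effect of the <pc> spaces can be sketched in Python as follows. This is only an illustration of the idea, assuming that variant readings are split on whitespace; how the actual collation software tokenizes the text is not shown here, and the snippet omits the TEI namespace for brevity.

```python
import xml.etree.ElementTree as ET

SNIPPET = "<p>aneka<pc> </pc>viṣaya<pc> </pc>nihita<pc> </pc>padānām</p>"

p = ET.fromstring(SNIPPET)

# For collation, the spaces inside <pc> are kept and act as split points.
collation_text = "".join(p.itertext())
tokens = collation_text.split()

# For display, the content of every <pc> is dropped, so the compound stays whole.
for pc in p.iter("pc"):
    pc.text = ""
display_text = "".join(p.itertext())

print(tokens)        # ['aneka', 'viṣaya', 'nihita', 'padānām']
print(display_text)  # anekaviṣayanihitapadānām
```

With the compound split into four tokens, a variant in one member (for example in viṣaya) can be reported on its own instead of as a variant of the whole compound.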


<retrace>

Indicates text that has been retraced.

gṛhītaṃ <retrace>gṛha</retrace>śabdena śuddham evābhidhīyate
gṛhītaṃ gṛhaśabdena śuddham evābhidhīyate


<sic>

Indicates text that has been transcribed as found in the document, without correction.



<space>

Indicates a blank space in the text.

ity arthaḥ<space unit="akṣara" quantity="2"/> atha ca
ity arthaḥ__ atha ca


<supplied>

Indicates text that has been supplied by the transcriber.

ta<supplied>d</supplied> dravyam
tad dravyam


<surplus>

Indicates text that the transcriber believes is superfluous.

iti teṣān darśanaṃ<lb n="5"/><surplus reason="repeated after line break">naṃ</surplus>
iti teṣān darśanaṃnaṃ


<unclear>

Indicates a passage that is unclear to the transcriber.

svarūpānyathā<unclear>t tānāpapaptiḥ</unclear>
svarūpānyathāt tānāpapaptiḥ