
TEI tagging

Each diplomatic transcription is encoded in TEI XML. Each TEI file contains a <teiHeader> element and a <text> element. In the <teiHeader>, the most important element to fill out is <idno type="siglum">, which gives the siglum used to identify the witness in any generated apparatus. See the template for more details.
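A minimal sketch of a <teiHeader> carrying the siglum might look like this. Placing <idno> inside <msDesc>/<msIdentifier>, the bracketed placeholder text, and the siglum value A are all assumptions for illustration; the project template remains the authority on where each element goes.

```xml
<teiHeader>
  <fileDesc>
    <titleStmt>
      <!-- placeholder: replace with the title of the work -->
      <title>[title of the work]</title>
    </titleStmt>
    <publicationStmt>
      <!-- placeholder: replace with publication details -->
      <p>[publication details]</p>
    </publicationStmt>
    <sourceDesc>
      <msDesc>
        <msIdentifier>
          <!-- hypothetical siglum; its position here is an assumption, follow the template -->
          <idno type="siglum">A</idno>
        </msIdentifier>
      </msDesc>
    </sourceDesc>
  </fileDesc>
</teiHeader>
```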

The <text> element holds the transcription itself. Each <text> should contain a <body> made up of paragraphs (<p>) and/or verses (line groups <lg> containing lines <l>). Each paragraph or verse must have an xml:id that is unique within the file. These identifiers are used to align corresponding sections of text across witnesses during collation.
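A minimal sketch of a <text> element with one paragraph and one verse follows, reusing sample text from the examples on this page. The identifiers p1 and v1 are hypothetical, and putting the verse's xml:id on <lg> rather than on each <l> is an assumption; check the template. For collation to line up, the same passage presumably carries the same xml:id in each witness's file.

```xml
<text>
  <body>
    <!-- a paragraph; the xml:id value is hypothetical -->
    <p xml:id="p1">jātir vā dravyaṃ vā padārthāv iti</p>
    <!-- a verse: a line group (lg) made of lines (l); xml:id assumed to go on lg -->
    <lg xml:id="v1">
      <l>suvarṇarucakādi ya<caesura/>thā yuktaṃ svakair ākāraiḥ</l>
    </lg>
  </body>
</text>
```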

For transcribing, a subset of TEI is used to mark features such as additions, deletions, and marginal annotations.

<subst>, <add>, and <del>

Indicates text that has been substituted: the deleted reading goes in <del> and its replacement in <add>, both wrapped in <subst>.

The rend attribute indicates how the addition or deletion is marked.
The place attribute indicates where the addition is written.

sarva<subst><del rend="crossed out">vidyā</del><add place="right-margin">śabdā</add></subst>nām

sarva~~vidyā~~śabdānām

<choice>, <corr>, and <orig>

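Indicates text that the transcriber has corrected. The reading found in the document goes in <orig> and the transcriber's correction in <corr>, both wrapped in <choice>.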

va<choice><orig>ṃta</orig><corr>rta</corr></choice>tamānaḥ

va~~ṃta~~rtamānaḥ

<caesura>

Indicates a caesura in a line of verse.

suvarṇarucakādi ya<caesura/>thā yuktaṃ svakair ākāraiḥ

suvarṇarucakādi ya-
thā yuktaṃ svakair ākāraiḥ

<gap>

Indicates a section of text that the transcriber cannot read.

The reason attribute gives the reason the text is illegible.
The unit attribute gives the unit of measurement.
The quantity attribute gives the length of the gap in units.

sā<gap reason="damaged" unit="akṣara" quantity="5"/>rvārthyam

sā.......rvārthyam

<hi>

Indicates text that has been marked in some way for emphasis.

The rend attribute indicates how the text has been marked.

<hi rend="rubricated">ātma vastu sva</hi>bhāvaś ca

ātma vastu svabhāvaś ca

<lb>

Indicates a line break.

The n attribute gives the line number.
If a line break coincides with a word break, the space goes before the <lb> tag, not after it.

vya-<lb n="5"/>vahāre, ātmā <lb n="6"/>tattvaṃ

vya-⸤vahāre, ātmā tattvaṃ

<metamark>

Indicates a sign or character that tells the reader how to read the text (for example, a place marker) rather than forming part of its content.

The function attribute indicates the function of the sign.

tat siddhiḥ<metamark function="place marker">✗</metamark>

tat siddhiḥ✗

<milestone>

Indicates a location in a document.

The n attribute gives the page or folio number.
The unit attribute gives the unit of measurement (for example, folio).

<milestone n="34r4" unit="folio"/>jātir vā dravyaṃ vā padārthāv iti

(From folio 34r4) jātir vā dravyaṃ vā padārthāv iti

<note>

Indicates a note.

The place attribute indicates where the note is placed, such as in the top-margin, footer, appendix, or inline.
The xml:lang attribute indicates the language of the note. Its main use is for notes written in English: if xml:lang="en" is set, the note text will not be converted into other scripts.

‘ktaktavatū niṣṭhā’<note place="inline">(pa॰ 1|1|16)</note>

‘ktaktavatū niṣṭhā’(pa॰ 1|1|16)
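The example above does not use xml:lang. A note written in English in the top margin might be encoded as in this sketch; the note text is a hypothetical illustration, not taken from any witness.

```xml
<!-- hypothetical English note, for illustration only; xml:lang="en" keeps it out of script conversion -->
<note place="top-margin" xml:lang="en">Line repeated on the next folio.</note>
```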

<pb>

Indicates a page break.

The n attribute gives the folio or page number.

anekaviṣayanihi<pb n="3r"/>tapadānām

anekaviṣayanihiLtapadānām

<retrace>

Indicates text that has been retraced, for example by over-inking.

gṛhītaṃ <retrace>gṛha</retrace>śabdena śuddham evābhidhīyate

gṛhītaṃ gṛhaśabdena śuddham evābhidhīyate

<sic>

Indicates text that appears to be erroneous but is transcribed exactly as it stands in the document, without correction.

viśeṣo<sic>pa</sic>dhiḥ

viśeṣopadhiḥ

<space>

Indicates a blank space left in the document.

The unit attribute gives the unit of measurement.
The quantity attribute gives the length of the space in units.

ity arthaḥ<space unit="akṣara" quantity="2"/> atha ca

ity arthaḥ__ atha ca

<supplied>

Indicates text that has been supplied by the transcriber.

ta<supplied>d</supplied> dravyam

tad dravyam

<surplus>

Indicates text that the transcriber believes is superfluous.

The reason attribute gives the transcriber's reason for marking the text as superfluous.

iti teṣān darśanaṃ<lb n="5"/><surplus reason="repeated after line break">naṃ</surplus>

iti teṣān darśanaṃ⸤naṃ

<unclear>

Indicates a passage that the transcriber cannot read with certainty.

The reason attribute indicates why the text is unclear.

svarūpānyathā<unclear>t tānāpapaptiḥ</unclear>

svarūpānyathāt tānāpapaptiḥ

saktumIva
