Punctuation and Orthography

The apparatus generator offers several options for filtering out insignificant variants by ignoring punctuation marks and commonplace differences in orthography. All filters are turned on by default.

Punctuation

The punctuation filters are relatively straightforward; when, for example, the daṇḍa filter is turned on, all daṇḍas will be ignored during the comparison of the texts.
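As a rough illustration of what ignoring daṇḍas means for a comparison, the sketch below strips them before two readings are compared. The function name `strip_dandas` is hypothetical, and the set of characters treated as daṇḍas (Devanagari । and ॥, or ASCII | and ||) is an assumption about how the texts are encoded, not something this page specifies.

```python
import re

# Assumption: daṇḍas are written as Devanagari । / ॥ or as ASCII | / ||.
# Any whitespace directly before a daṇḍa is removed along with it.
DANDA = re.compile(r"\s*[।॥|]+")

def strip_dandas(text: str) -> str:
    """Remove daṇḍas so that they cannot register as variants."""
    return DANDA.sub("", text).strip()

reading_a = "dharmakṣetre kurukṣetre |"
reading_b = "dharmakṣetre kurukṣetre ||"

# With the filter applied, the two readings compare as identical.
print(strip_dandas(reading_a) == strip_dandas(reading_b))  # True
```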

Orthography

To filter out orthographic variants, regular expressions are used to normalize the text. Filters are applied in the order in which they appear below. Some of them overlap; if a portion of text matches more than one filter, only one filter is applied.
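One plausible reading of this ordering rule is that the filters run one after another, so that once an earlier filter has rewritten a span, a later overlapping filter no longer finds anything to match. The sketch below assumes that reading; the names `FILTERS` and `normalize` are hypothetical, and the actual generator may resolve overlaps differently. The two patterns are taken from the filters listed on this page, with the first lookbehind split in two because Python's `re` module only accepts fixed-width lookbehinds.

```python
import re

# (name, pattern, replacement), kept in the order the filters are listed.
FILTERS = [
    ("geminated t",
     re.compile(r"(?:(?<=[rṛṙi])|(?<=pa))tt|tt(?=[rvy]\S)"), "t"),
    ("geminated aspirated consonants",
     re.compile(r"([jtṭd])\1(?=h)"), r"\1"),
]

def normalize(text):
    """Apply each filter in order; report which filters changed the text."""
    applied = []
    for name, pattern, replacement in FILTERS:
        new_text = pattern.sub(replacement, text)
        if new_text != text:
            applied.append(name)
        text = new_text
    return text, applied

# "arttha" matches both filters, but only the first one ends up applying.
print(normalize("arttha"))  # ('artha', ['geminated t'])
print(normalize("attha"))   # ('atha', ['geminated aspirated consonants'])
```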

Examples are given to show how the text is normalized; counterexamples are exceptions which are not normalized.

geminated t

/(?<=[rṛṙi]|pa)tt|tt(?=[rvy]\S)/t/
  • replaces tt(h) with t(h) after consonantal or vocalic r, after i, and after pa; replaces tt with t before r, v, or y within a word
  • examples: arttha ⇒ artha, saṃskṛtta ⇒ saṃskṛta, prākritta ⇒ prākrita, tattvam ⇒ tatvam, pattram ⇒ patram
  • counterexamples: atty annam (source: DCS)
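The examples and the counterexample above can be checked with a short Python sketch. The helper name `degeminate_t` is hypothetical. Python's `re` module requires fixed-width lookbehinds, so `(?<=[rṛṙi]|pa)` is written here as two separate lookbehinds; that split is an adaptation for this sketch, not part of the filter as given.

```python
import re

# Same filter as above, with the variable-width lookbehind split in two.
GEMINATED_T = re.compile(r"(?:(?<=[rṛṙi])|(?<=pa))tt|tt(?=[rvy]\S)")

def degeminate_t(text: str) -> str:
    return GEMINATED_T.sub("t", text)

for word in ["arttha", "saṃskṛtta", "prākritta", "tattvam", "pattram"]:
    print(word, "⇒", degeminate_t(word))

# The counterexample is untouched: y is followed by a space, so \S fails.
print(degeminate_t("atty annam"))  # atty annam
```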

geminated consonants after r

/(?<=[rṛṙ]|[rṛ]\s)([kgcjṭḍṇdnpbmyvl])\1/\1/
  • replaces doubled consonants (excluding t) after consonantal/vocalic r with a single consonant
  • examples: arddha ⇒ ardha, dharmma ⇒ dharma, pṛcchati ⇒ pṛchati
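A minimal Python sketch of this filter follows, assuming the capture group is meant to be the character class `[kgcjṭḍṇdnpbmyvl]`. The helper name `degeminate_after_r` is hypothetical, and the lookbehind is again split into two fixed-width alternatives to satisfy Python's `re` module.

```python
import re

# A doubled consonant (any of the listed ones, t excluded) directly after
# r/ṛ/ṙ, or after r/ṛ plus one whitespace character, collapses to one.
GEMINATED_AFTER_R = re.compile(
    r"(?:(?<=[rṛṙ])|(?<=[rṛ]\s))([kgcjṭḍṇdnpbmyvl])\1"
)

def degeminate_after_r(text: str) -> str:
    return GEMINATED_AFTER_R.sub(r"\1", text)

for word in ["arddha", "dharmma", "pṛcchati"]:
    print(word, "⇒", degeminate_after_r(word))

# No r precedes the doubled consonant here, so nothing changes.
print(degeminate_after_r("buddha"))  # buddha
```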

geminated aspirated consonants

/([jtṭd])\1(?=h)/\1/
  • replaces jjh, tth, ṭṭh, and ddh with jh, th, ṭh, and dh respectively
  • examples: attha ⇒ atha, daddhi ⇒ dadhi
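This pattern needs no adaptation for Python, since it uses only a backreference and a lookahead. The sketch below runs it on the two examples; the helper name `degeminate_aspirate` is hypothetical.

```python
import re

# A doubled j, t, ṭ, or d collapses to one when an h follows.
GEMINATED_ASPIRATE = re.compile(r"([jtṭd])\1(?=h)")

def degeminate_aspirate(text: str) -> str:
    return GEMINATED_ASPIRATE.sub(r"\1", text)

print(degeminate_aspirate("attha"))   # atha
print(degeminate_aspirate("daddhi"))  # dadhi

# Without a following h, the doubled consonant is left alone.
print(degeminate_aspirate("datta"))   # datta
```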

final nasal variants

/(?:ṃ[lśs]|nn)(?!\S)/n/
  • replaces -ṃl, -ṃś, -ṃs, and -nn with -n
  • examples: gacchaṃs tu ⇒ gacchan tu, puruṣānn eva ⇒ puruṣān eva
  • counterexamples: aṃśa, annam
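The sketch below checks the examples and counterexamples in Python. It assumes the pattern includes the anusvāra, i.e. `ṃ[lśs]`, which is what the description (-ṃl, -ṃś, -ṃs) and the first example require. The helper name `normalize_final_nasal` is hypothetical.

```python
import re

# Word-final ṃl, ṃś, ṃs, or nn becomes n. (?!\S) is true before
# whitespace and at the end of the string, i.e. at the end of a word.
FINAL_NASAL = re.compile(r"(?:ṃ[lśs]|nn)(?!\S)")

def normalize_final_nasal(text: str) -> str:
    return FINAL_NASAL.sub("n", text)

print(normalize_final_nasal("gacchaṃs tu"))   # gacchan tu
print(normalize_final_nasal("puruṣānn eva"))  # puruṣān eva

# Word-internal sequences are left alone.
print(normalize_final_nasal("aṃśa"))   # aṃśa
print(normalize_final_nasal("annam"))  # annam
```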