===== Punctuation and Orthography =====

The apparatus generator has a number of options for filtering out insignificant variants, by ignoring punctuation marks as well as commonplace differences in orthography. All filters are turned on by default.

==== Punctuation ====

The punctuation filters are relatively straightforward; when, for example, the daṇḍa filter is turned on, all daṇḍas will be ignored during the comparison of the texts.

==== Orthography ====

In order to filter out orthographic variants, regular expressions are used to normalize the text. Some of these filters may overlap; if a portion of text matches more than one filter, only one filter is applied. Filters are applied in the order that they appear.

Some regular expressions may coincide with sandhi rules described in the //Aṣṭādhyāyī//, although, in practice, they may not reproduce the sandhi rules exactly. Examples are given to show how the text is normalized; counterexamples are exceptions which are not normalized. Normalized spellings may not represent what is generally considered to be "correct" Sanskrit; however, they do reflect orthographic practices as attested in manuscripts.

=== geminated t ===

/(?<=[rṛṙi]|pa)tt|tt(?=[rvy]\S)/t/

  * replaces -tt(h)- after consonantal/vocalic r, i, and pa with -t-; replaces -tt(h)- preceding r, v, or y within a word with -t-
  * __examples__
    * arttha => artha
    * saṃskṛtta => saṃskṛta
    * prākritta => prākrita
    * tattvam => tatvam
    * pattram => patram
  * __counterexamples__
    * atty annam (source: [[http://www.sanskrit-linguistics.org/dcs/index.php?contents=texte&PhraseID=461691|DCS]])

=== geminated consonants after r ===

/(?<=[rṛṙ]|[rṛ]\s)([kgcjṭḍṇdnpbmyvl])\1/\1/

  * replaces doubled consonants (excluding t) after consonantal/vocalic r with a single consonant (Cf. [[https://www.sanskritdictionary.com/panini/8-4-46|Aṣṭādhyāyī 8.4.46]])
  * __examples__
    * arddha => ardha
    * dharmma => dharma
    * pṛcchati => pṛchati

=== geminated aspirated consonants ===

/([jtṭd])\1(?=h)/\1/

  * replaces -jjh-, -tth-, -ṭṭh-, and -ddh- with -jh-, -th-, -ṭh-, and -dh- respectively (Cf. [[https://www.sanskritdictionary.com/panini/8-4-47|Aṣṭādhyāyī 8.4.47]])
  * __examples__
    * attha => atha
    * daddhi => dadhi

=== final nasal variants ===

/(?:ṃ[lśs]|nn)(?!\S)/n/

  * replaces final -ṃl, -ṃś, -ṃs, and -nn with -n (Cf. [[https://www.sanskritdictionary.com/panini/8-3-7|Aṣṭādhyāyī 8.3.7]], etc.)
  * __examples__
    * gacchaṃs tu => gacchan tu
    * puruṣānn atti => puruṣān atti

=== internal nasal variants ===

/[mnñṇṅ](?=[pbmdtnṭḍcjkg])/ṃ/

  * replaces nasals preceding certain consonants with an anusvāra (this regular expression does the opposite of rule [[https://www.sanskritdictionary.com/panini/8-4-58|Aṣṭādhyāyī 8.4.58]], for the sake of efficiency)
  * __examples__
    * nandita => naṃdita
    * yuñjati => yuṃjati

==== Script/scribe specific filters ====

Some normalization filters require a tag in your TEI header, because they only apply to certain scripts or specific scribal practices. When these filters are activated, they will only apply to those transcriptions which have the corresponding tag.

=== pṛṣṭhamātrā vowels ===

/ê/e/ /î/i/ /ô/o/ /û/u/

In transcriptions of Devanāgarī sources, pṛṣṭhamātrā vowels are transcribed as ê, aî, ô, and aû (see [[wiki:transcription|Transcription conventions]]).

  * These filters require ''@mainLang="sa-Deva"'' in the '''' tag.

=== valapalagilaka ===

/ṙ/r/

In transcriptions of Telugu sources, the valapalagilaka reph is transcribed as ṙ (see [[wiki:transcription|Transcription conventions]]).

  * This filter requires ''@mainLang="sa-Telu"'' in the '''' tag.

=== ṭh written as ṭ ===

/ṭh/ṭ/

In some Devanāgarī manuscripts, it is common for ṭh to be written as ṭ.

  * This filter requires a '''' tag with the ''@xml:id="script-ṭha-ṭa"''.
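
As a rough illustration of how the tag requirement might work, the sketch below applies the three script-specific filters described so far only to transcriptions that carry the matching tag. The types, the function name ''applyScriptFilters'', and the flat ''tags'' list (which lumps together ''@mainLang'' values and ''@xml:id'' values) are assumptions made for this example, not the apparatus generator's actual code. The global flag on each regular expression is likewise assumed, and the sample words are made up.

```typescript
// Hypothetical data model: one filter = a required tag plus its replacement rules.
type ScriptFilter = {
  name: string;
  requiredTag: string;
  rules: [RegExp, string][];
};

// Hypothetical transcription record: text plus the tags found in its TEI header.
type Transcription = { text: string; tags: string[] };

// Patterns and tag values are taken from the filter descriptions above;
// the "g" flag is an assumption.
const scriptFilters: ScriptFilter[] = [
  {
    name: "pṛṣṭhamātrā vowels",
    requiredTag: "sa-Deva",
    rules: [[/ê/g, "e"], [/î/g, "i"], [/ô/g, "o"], [/û/g, "u"]],
  },
  { name: "valapalagilaka", requiredTag: "sa-Telu", rules: [[/ṙ/g, "r"]] },
  { name: "ṭh written as ṭ", requiredTag: "script-ṭha-ṭa", rules: [[/ṭh/g, "ṭ"]] },
];

// Apply each filter, in order, only if the transcription has the required tag.
function applyScriptFilters(
  t: Transcription,
  filters: ScriptFilter[] = scriptFilters
): string {
  let text = t.text;
  for (const f of filters) {
    if (t.tags.indexOf(f.requiredTag) === -1) continue; // tag missing: skip filter
    for (const [pattern, replacement] of f.rules) {
      text = text.replace(pattern, replacement);
    }
  }
  return text;
}
```

With these assumptions, ''applyScriptFilters({ text: "pāṭha", tags: ["script-ṭha-ṭa"] })'' returns ''pāṭa'', while the same text with an empty ''tags'' list is returned unchanged.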
=== b written as v ===

/b(?!h)/v/

In some scripts, b is not distinguished from v.

  * This filter requires a '''' tag with the ''@xml:id="script-ba-va"''.

=== dbh written as bhd ===

/bh(\s?)d(?!h)/d\1bh/

In some Devanāgarī manuscripts, the conjunct dbh is written as bhd.

  * This filter requires a '''' tag with the ''@xml:id="script-dbha-bhda"''.
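
The general orthographic filters listed under Orthography can be tried out with a short script. The sketch below is one possible reading of "filters are applied in the order that they appear": each regular expression is run over the output of the previous one. The names ''Filter'', ''orthographyFilters'', and ''normalize'' are hypothetical, and the global flag is an assumption; the patterns themselves are copied from the filter descriptions. The variable-width lookbehind in the first pattern is accepted by JavaScript engines but not by every regular expression engine.

```typescript
// Hypothetical shape for one orthographic filter.
type Filter = { name: string; pattern: RegExp; replacement: string };

// Patterns copied from the Orthography section, in the order listed there.
// The "g" flag (replace every match) is an assumption.
const orthographyFilters: Filter[] = [
  {
    name: "geminated t",
    pattern: /(?<=[rṛṙi]|pa)tt|tt(?=[rvy]\S)/g,
    replacement: "t",
  },
  {
    name: "geminated consonants after r",
    pattern: /(?<=[rṛṙ]|[rṛ]\s)([kgcjṭḍṇdnpbmyvl])\1/g,
    replacement: "$1",
  },
  {
    name: "geminated aspirated consonants",
    pattern: /([jtṭd])\1(?=h)/g,
    replacement: "$1",
  },
  {
    name: "final nasal variants",
    pattern: /(?:ṃ[lśs]|nn)(?!\S)/g,
    replacement: "n",
  },
  {
    name: "internal nasal variants",
    pattern: /[mnñṇṅ](?=[pbmdtnṭḍcjkg])/g,
    replacement: "ṃ",
  },
];

// Run every filter over the text, each one seeing the previous one's output.
function normalize(text: string, filters: Filter[] = orthographyFilters): string {
  return filters.reduce(
    (acc, f) => acc.replace(f.pattern, f.replacement),
    text
  );
}
```

Under these assumptions, ''normalize("dharmma")'' returns ''dharma'' and ''normalize("gacchaṃs tu")'' returns ''gacchan tu'', matching the examples given for the individual filters.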