This shows you the differences between two versions of the page.
Both sides previous revisionPrevious revisionNext revision | Previous revision | ||
wiki:orthography [2019/08/21 18:37] – [Orthography] chuck | wiki:orthography [2023/03/15 04:22] (current) – chuck | ||
---|---|---|---|
Line 9: | Line 9: | ||
==== Orthography ==== | ==== Orthography ==== | ||
- | In order to filter out orthographic variants, regular expressions are used to normalize the text. | + | In order to filter out orthographic variants, regular expressions are used to normalize the text. Some of these filters may overlap; if a portion of text matches more than one filter, only one filter is applied. Filters are applied in the order that they appear. |
+ | |||
+ | Some regular expressions may coincide with sandhi rules described in the // | ||
Examples are given to show how the text is normalized; counterexamples are exceptions which are not normalized. | Examples are given to show how the text is normalized; counterexamples are exceptions which are not normalized. | ||
+ | |||
+ | Normalized spellings may not represent what is generally considered to be " | ||
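The ordering rule described above (filters tried in the order they appear, with only one filter applied to any given portion of text) can be sketched as follows. This is one possible reading, not the tool's actual implementation: the filter names, the patterns, and the `normalize` function are all hypothetical, and the consonant set in the first pattern is an assumption chosen only to cover the examples on this page.

```python
import re
import unicodedata

def _nfc(s):
    # Use precomposed characters so that e.g. "ṭ" is a single code point.
    return unicodedata.normalize("NFC", s)

# Hypothetical filters in priority order: (name, pattern, replacement template).
FILTERS = [
    # Assumed consonant set; the real expression is not reproduced here.
    ("geminate_after_r", re.compile(_nfc(r"([rṛ])([gjdbmc])\2")), r"\1\2"),
    ("geminate_aspirate", re.compile(_nfc(r"([jtṭd])\1h")), r"\1h"),
]

def normalize(text):
    """Scan left to right; at each position the first matching filter wins."""
    text = _nfc(text)
    out, applied, pos = [], [], 0
    while pos < len(text):
        for name, pattern, template in FILTERS:
            m = pattern.match(text, pos)
            if m:
                out.append(m.expand(template))
                applied.append(name)
                pos = m.end()  # the matched portion is consumed by this filter only
                break
        else:
            out.append(text[pos])  # no filter matched here; copy the character
            pos += 1
    return "".join(out), applied

print(normalize("arddha"))
print(normalize("daddhi"))
```

In `arddha`, both hypothetical filters could match part of `rddh`, but the earlier one consumes `rdd` first, so the later one never sees that portion.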
=== geminated t === | === geminated t === | ||
<code pcre>/ | <code pcre>/ | ||
- | * replaces tt(h) after consonantal/ | + | * replaces |
- | * examples: | + | |
- | * counterexamples: | + | * __examples__ |
+ | * arttha => artha | ||
+ | * saṃskṛtta => saṃskṛta | ||
+ | * prākritta => prākrita | ||
+ | * tattvam => tatvam | ||
+ | * pattram => patram | ||
+ | * __counterexamples__ | ||
+ | * atty annam (source: [[http:// | ||
=== geminated consonants after r === | === geminated consonants after r === | ||
- | <code pcre>/ | + | <code pcre>/ |
- | * replaces doubled consonants (excluding t) after consonantal/ | + | * replaces doubled consonants (excluding t) after consonantal/ |
- | * examples: | + | |
+ | * __examples__ | ||
+ | * arddha => ardha | ||
+ | * dharmma => dharma | ||
+ | * pṛcchati => pṛchati | ||
=== geminated aspirated consonants === | === geminated aspirated consonants === | ||
<code pcre>/ | <code pcre>/ | ||
- | * replaces jjh, tth, ṭṭh, and ddh with jh, th, ṭh, and dh respectively | + | * replaces |
- | * examples: | + | |
+ | * __examples__ | ||
+ | * attha => atha | ||
+ | * daddhi => dadhi | ||
+ | |||
+ | === final nasal variants === | ||
+ | <code pcre>/ | ||
+ | |||
+ | * replaces final -ṃl, -ṃś, -ṃs, and -nn with -n (cf. [[https:// | ||
+ | |||
+ | * __examples__ | ||
+ | * gacchaṃs tu => gacchan tu | ||
+ | * puruṣānn atti => puruṣān atti | ||
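As a rough illustration of the final nasal rule, the sketch below rewrites word-final -ṃl, -ṃś, -ṃs, and -nn as -n. The pattern is reconstructed from the description and examples only and may differ from the actual expression; the function name `normalize_final_nasal` is hypothetical, and "final" is assumed to mean "followed by whitespace or the end of the text".

```python
import re
import unicodedata

# Assumed pattern: one of the listed endings, not followed by a non-space character.
FINAL_NASAL = re.compile(unicodedata.normalize("NFC", r"(?:ṃ[lśs]|nn)(?!\S)"))

def normalize_final_nasal(text):
    # Normalize to NFC first so that ṃ, ś, etc. are single code points.
    text = unicodedata.normalize("NFC", text)
    return FINAL_NASAL.sub("n", text)

print(normalize_final_nasal("gacchaṃs tu"))
print(normalize_final_nasal("puruṣānn atti"))
```

Word-internal sequences such as the `nn` in `annam` are left alone, because they are followed by a non-space character.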
+ | |||
+ | === internal nasal variants === | ||
+ | <code pcre>/ | ||
+ | |||
+ | * replaces nasals preceding certain consonants with an anusvāra (this regular expression is the opposite of rule [[https:// | ||
+ | |||
+ | * __examples__ | ||
+ | * nandita => naṃdita | ||
+ | * yuñjati => yuṃjati | ||
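The internal nasal rule might look something like the sketch below. The page does not say which consonants trigger the replacement, so the sketch assumes the common case of a class nasal before a stop of the same class (ṅ before k/g, ñ before c/j, ṇ before ṭ/ḍ, n before t/d, m before p/b). Both that assumption and the name `normalize_internal_nasal` are illustrative and not taken from the actual expression.

```python
import re
import unicodedata

# Assumed pairing of nasals with following stops; the lookaheads keep the stop in place.
INTERNAL_NASAL = re.compile(
    unicodedata.normalize("NFC", r"ṅ(?=[kg])|ñ(?=[cj])|ṇ(?=[ṭḍ])|n(?=[td])|m(?=[pb])")
)
ANUSVARA = unicodedata.normalize("NFC", "ṃ")

def normalize_internal_nasal(text):
    # Normalize to NFC first so that each diacritic letter is a single code point.
    text = unicodedata.normalize("NFC", text)
    return INTERNAL_NASAL.sub(ANUSVARA, text)

print(normalize_internal_nasal("nandita"))
print(normalize_internal_nasal("yuñjati"))
```

A nasal followed by a vowel, like the first `n` in `nandita`, does not match any alternative and is kept.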
+ | |||
+ | ==== Script/ | ||
+ | |||
+ | Some normalization filters require a tag in your TEI header, because they only apply to certain scripts or specific scribal practices. When these filters are activated, they will only apply to those transcriptions which have the corresponding tag. | ||
+ | |||
+ | === pṛṣṭhamātrā vowels === | ||
+ | <code pcre>/ | ||
+ | |||
+ | In transcriptions of Devanāgarī sources, pṛṣṭhamātrā vowels are transcribed as ê, aî, ô, and aû (see [[wiki: | ||
+ | |||
+ | * These filters require '' | ||
+ | |||
+ | === valapalagilaka === | ||
+ | <code pcre>/ | ||
+ | |||
+ | In transcriptions of Telugu sources, the valapalagilaka reph is transcribed as ṙ (see [[wiki: | ||
+ | |||
+ | * This filter requires '' | ||
+ | |||
+ | === ṭh written as ṭ === | ||
+ | <code pcre>/ | ||
+ | |||
+ | In some Devanāgarī manuscripts, | ||
+ | |||
+ | * This filter requires a ''< | ||
+ | |||
+ | === b written as v === | ||
+ | <code pcre>/ | ||
+ | |||
+ | In some scripts, b is not distinguished from v. | ||
+ | |||
+ | * This filter requires a ''< | ||
+ | |||
+ | === dbh written as bhd === | ||
+ | <code pcre>/ | ||
+ | |||
+ | In some Devanāgarī manuscripts, | ||
+ | |||
+ | * This filter requires a ''< |