This shows you the differences between two versions of the page.
wiki:orthography [2019/08/21 19:14] – [Orthography] chuck | wiki:orthography [2023/03/15 04:22] (current) – chuck | ||
---|---|---|---|
Line 10: | Line 10: | ||
In order to filter out orthographic variants, regular expressions are used to normalize the text. Some of these filters may overlap; if a portion of text matches more than one filter, only one filter is applied. Filters are applied in the order that they appear. | In order to filter out orthographic variants, regular expressions are used to normalize the text. Some of these filters may overlap; if a portion of text matches more than one filter, only one filter is applied. Filters are applied in the order that they appear. | ||
+ | |||
+ | Some regular expressions may coincide with sandhi rules described in the // | ||
Examples are given to show how the text is normalized; counterexamples are exceptions which are not normalized. | Examples are given to show how the text is normalized; counterexamples are exceptions which are not normalized. | ||
+ | |||
+ | Normalized spellings may not represent what is generally considered to be " | ||
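The rule stated above (filters are tried in order, and an overlapping portion of text receives only one filter) can be illustrated with a minimal Python sketch. The two patterns below are hypothetical stand-ins written only to reproduce a few examples from this page; they are not the regular expressions used here, and the function name `normalize` is likewise an assumption.

```python
import re

# Hypothetical filters that only reproduce a few examples from this page.
# They are NOT the regular expressions actually used on the wiki.
FILTERS = [
    (re.compile(r"r([dm])\1"), r"r\1"),    # arddha => ardha, dharmma => dharma
    (re.compile(r"([jtṭd])\1h"), r"\1h"),  # attha => atha, daddhi => dadhi
]

def normalize(text):
    """Scan left to right; at each position the first filter that matches wins.

    The matched portion is consumed, so a later filter that overlaps it
    is never applied to the same characters.
    """
    out = []
    pos = 0
    while pos < len(text):
        for pattern, replacement in FILTERS:
            m = pattern.match(text, pos)
            if m:
                out.append(m.expand(replacement))
                pos = m.end()
                break
        else:
            # No filter matched here: copy one character unchanged.
            out.append(text[pos])
            pos += 1
    return "".join(out)

for word in ["arddha", "dharmma", "attha", "daddhi", "annam"]:
    print(word, "=>", normalize(word))
```

In `arddha`, the first filter matches `rdd` and the second would match `ddh`; because the first filter consumes `rdd`, the second is not applied to that portion. Whether the real implementation resolves overlaps this way is not stated on this page, so the scanning strategy is an assumption.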
=== geminated t === | === geminated t === | ||
<code pcre>/ | <code pcre>/ | ||
- | * replaces tt(h) after consonantal/ | + | * replaces |
- | * examples: | + | |
- | * counterexamples: | + | * __examples__ |
+ | * arttha => artha | ||
+ | * saṃskṛtta => saṃskṛta | ||
+ | * prākritta => prākrita | ||
+ | * tattvam => tatvam | ||
+ | * pattram => patram | ||
+ | * __counterexamples__ | ||
+ | * atty annam (source: [[http:// | ||
=== geminated consonants after r === | === geminated consonants after r === | ||
- | <code pcre>/ | + | <code pcre>/ |
* replaces doubled consonants (excluding t) after consonantal/ | * replaces doubled consonants (excluding t) after consonantal/ | ||
- | * examples: | + | |
+ | * __examples__ | ||
+ | * arddha => ardha | ||
+ | * dharmma => dharma | ||
+ | * pṛcchati => pṛchati | ||
=== geminated aspirated consonants === | === geminated aspirated consonants === | ||
<code pcre>/ | <code pcre>/ | ||
- | * replaces jjh, tth, ṭṭh, and ddh with jh, th, ṭh, and dh respectively (Cf. [[https:// | + | * replaces |
- | * examples: | + | |
+ | * __examples__ | ||
+ | * attha => atha | ||
+ | * daddhi => dadhi | ||
=== final nasal variants === | === final nasal variants === | ||
<code pcre>/ | <code pcre>/ | ||
- | * replaces -ṃl, -ṃś, -ṃs, and -nn with -n (Cf. [[https:// | + | * replaces |
- | * examples: | + | |
- | * counterexamples: aṃśa, annam | + | * __examples__ |
+ | * gacchaṃs tu => gacchan tu | ||
+ | * puruṣānn | ||
+ | |||
+ | === internal nasal variants === | ||
+ | <code pcre>/ | ||
+ | |||
+ | * replaces nasals preceding certain consonants with an anusvāra (this regular expression is the opposite of rule [[https:// | ||
+ | |||
+ | * __examples__ | ||
+ | * nandita => naṃdita | ||
+ | * yuñjati => yuṃjati | ||
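The two examples above can be reproduced with a short Python sketch. The page does not say which consonants trigger the replacement, so the set of stops used below is an assumption, as are the pattern and the function name `nasal_to_anusvara`; this is not the regular expression used here.

```python
import re
import unicodedata

# Hypothetical pattern: a nasal consonant immediately followed by a stop.
# The exact set of following consonants is an assumption, not taken from the wiki.
NASAL_BEFORE_STOP = re.compile(r"[ṅñṇnm](?=[kgcjṭḍtdpb])")

def nasal_to_anusvara(text):
    """Replace a nasal before a stop with anusvāra (ṃ)."""
    # Compose diacritics first so each letter is a single code point.
    text = unicodedata.normalize("NFC", text)
    return NASAL_BEFORE_STOP.sub("ṃ", text)

for word in ["nandita", "yuñjati", "annam"]:
    print(word, "=>", nasal_to_anusvara(word))
```

The lookahead leaves the following consonant in place, so only the nasal itself is rewritten.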
+ | |||
+ | ==== Script/ | ||
+ | |||
+ | Some normalization filters require | ||
+ | |||
+ | === pṛṣṭhamātrā vowels === | ||
+ | <code pcre>/ | ||
+ | |||
+ | In transcriptions of Devanāgarī sources, pṛṣṭhamātrā vowels are transcribed as ê, aî, ô, and aû (see [[wiki: | ||
+ | |||
+ | * These filters require '' | ||
+ | |||
+ | === valapalagilaka === | ||
+ | <code pcre>/ | ||
+ | |||
+ | In transcriptions of Telugu sources, the valapalagilaka reph is transcribed as ṙ (see [[wiki: | ||
+ | |||
+ | * This filter requires '' | ||
+ | |||
+ | === ṭh written as ṭ === | ||
+ | <code pcre>/ | ||
+ | |||
+ | In some Devanāgarī manuscripts, | ||
+ | |||
+ | * This filter requires a ''< | ||
+ | |||
+ | === b written as v === | ||
+ | <code pcre>/ | ||
+ | |||
+ | In some scripts, b is not distinguished from v. | ||
+ | |||
+ | * This filter requires a ''< | ||
+ | |||
+ | === dbh written as bhd === | ||
+ | <code pcre>/ | ||
+ | |||
+ | In some Devanāgarī manuscripts, | ||
+ | |||
+ | * This filter requires a ''< |