Vedic Extensions in OpenType

This document outlines the shaping information needed to display characters from the Unicode Vedic Extensions block, which may be used within text runs in many Indic scripts.

Table of Contents

- General information
  - Terminology
- Glyph classification
  - Vedic Extensions character table
- Shaping information

General information

The Vedic Extensions block encodes letters and marks that are used in a large body of ancient literature written in the Vedic Sanskrit language.

Primarily an oral language in the time period when the key literature originated, Vedic Sanskrit has no native script. Therefore, texts may be typeset in any one of the Indic scripts, using the Vedic Extensions to supplement the main script's character set.

Terminology

Individual Vedic Extension characters may be named by a combination of the Vedic text in which the mark is used, the regional or manuscript tradition involved, and a simple visual or phonetic description of the character. Some commonly used general categories are worth noting.

Udatta is the term for a high tone on a vowel.

Anudatta is the term for a low tone on a vowel.

Svarita is the term for a falling or mixed tone on a vowel.

Anusvara is the term for a nasalization sound that precedes a consonant.

Visarga is the term for a soft breathing sound that follows a vowel.

Note: In modern Indic languages, the terms anusvara and visarga often refer to diacritical marks that have the above effects on pronunciation. In the Vedic Sanskrit language, however, they are generally considered independent letters.

Glyph classification

For most codepoints, the General Category property defined in the Unicode standard is correct, but it is not sufficient to fully capture the expected shaping behavior (such as how the character is treated during glyph reordering). Therefore, Vedic Extensions glyphs must additionally be classified by how they are treated when shaping a run of text.

Vedic Extensions character table

Vedic Extension glyphs should be classified as in the following table. Codepoints with no assigned meaning are marked as unassigned in the Unicode category column.

Assigned codepoints marked with a null in the Shaping class column evoke no special behavior from the shaping engine.

The Mark-placement subclass column indicates mark-placement positioning. Assigned codepoints marked with a null in this column evoke no special mark-placement behavior. Marks tagged with [Mn] in the Unicode category column are categorized as non-spacing; marks tagged with [Mc] are categorized as spacing-combining.
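
The Unicode categories referenced in this table can be inspected with Python's standard `unicodedata` module. The following is a minimal sketch, not part of any shaping engine; the helper name `mark_kind` is hypothetical, and the values returned depend on the Unicode version bundled with the Python interpreter in use.

```python
import unicodedata


def mark_kind(ch: str) -> str:
    """Describe a single character by its Unicode General Category."""
    category = unicodedata.category(ch)  # e.g. 'Mn', 'Mc', 'Lo', 'Po'
    if category == "Mn":
        return "non-spacing"
    if category == "Mc":
        return "spacing-combining"
    return "other"


# Three codepoints from the Vedic Extensions block.
for cp in (0x1CD0, 0x1CE1, 0x1CD3):
    ch = chr(cp)
    print(f"U+{cp:04X}", unicodedata.category(ch), mark_kind(ch))
```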

Some codepoints in the following table use a Shaping class that differs from the codepoint's Unicode General Category. The Shaping class takes precedence during OpenType shaping, as it captures more specific behavior.

Codepoint	Unicode category	Shaping class	Mark-placement subclass	Glyph
`U+1CD0`	Mark [Mn]	CANTILLATION	TOP_POSITION	᳐ Tone Karshana
`U+1CD1`	Mark [Mn]	CANTILLATION	TOP_POSITION	᳑ Tone Shara
`U+1CD2`	Mark [Mn]	CANTILLATION	TOP_POSITION	᳒ Tone Prenkha
`U+1CD3`	Punctuation	null	null	᳓ Sign Nihshvasa
`U+1CD4`	Mark [Mn]	CANTILLATION	OVERSTRUCK	᳔ Tone Midline Svarita
`U+1CD5`	Mark [Mn]	CANTILLATION	BOTTOM_POSITION	᳕ Tone Aggravated Independent Svarita
`U+1CD6`	Mark [Mn]	CANTILLATION	BOTTOM_POSITION	᳖ Tone Independent Svarita
`U+1CD7`	Mark [Mn]	CANTILLATION	BOTTOM_POSITION	᳗ Tone Kathaka Independent Svarita
`U+1CD8`	Mark [Mn]	CANTILLATION	BOTTOM_POSITION	᳘ Tone Candra Below
`U+1CD9`	Mark [Mn]	CANTILLATION	BOTTOM_POSITION	᳙ Tone Kathaka Independent Svarita Schroeder
`U+1CDA`	Mark [Mn]	CANTILLATION	TOP_POSITION	᳚ Tone Double Svarita
`U+1CDB`	Mark [Mn]	CANTILLATION	TOP_POSITION	᳛ Tone Triple Svarita
`U+1CDC`	Mark [Mn]	CANTILLATION	BOTTOM_POSITION	᳜ Tone Kathaka Anudatta
`U+1CDD`	Mark [Mn]	CANTILLATION	BOTTOM_POSITION	᳝ Tone Dot Below
`U+1CDE`	Mark [Mn]	CANTILLATION	BOTTOM_POSITION	᳞ Tone Two Dots Below
`U+1CDF`	Mark [Mn]	CANTILLATION	BOTTOM_POSITION	᳟ Tone Three Dots Below

`U+1CE0`	Mark [Mn]	CANTILLATION	TOP_POSITION	᳠ Tone Rigvedic Kashmiri Independent Svarita
`U+1CE1`	Mark [Mc]	CANTILLATION	RIGHT_POSITION	᳡ Tone Atharvavedic Independent Svarita
`U+1CE2`	Mark [Mn]	AVAGRAHA	OVERSTRUCK	᳢ Sign Visarga Svarita
`U+1CE3`	Mark [Mn]	null	OVERSTRUCK	᳣ Sign Visarga Udatta
`U+1CE4`	Mark [Mn]	null	OVERSTRUCK	᳤ Sign Reversed Visarga Udatta
`U+1CE5`	Mark [Mn]	null	OVERSTRUCK	᳥ Sign Visarga Anudatta
`U+1CE6`	Mark [Mn]	null	OVERSTRUCK	᳦ Sign Reversed Visarga Anudatta
`U+1CE7`	Mark [Mn]	null	OVERSTRUCK	᳧ Sign Visarga Udatta With Tail
`U+1CE8`	Mark [Mn]	AVAGRAHA	OVERSTRUCK	᳨ Sign Visarga Anudatta With Tail
`U+1CE9`	Letter	AVAGRAHA	null	ᳩ Sign Anusvara Antargomukha
`U+1CEA`	Letter	null	null	ᳪ Sign Anusvara Bahirgomukha
`U+1CEB`	Letter	null	null	ᳫ Sign Anusvara Vamagomukha
`U+1CEC`	Letter	AVAGRAHA	null	ᳬ Sign Anusvara Vamagomukha With Tail
`U+1CED`	Mark [Mn]	AVAGRAHA	BOTTOM_POSITION	᳭ Sign Tiryak
`U+1CEE`	Letter	AVAGRAHA	null	ᳮ Sign Hexiform Long Anusvara
`U+1CEF`	Letter	null	null	ᳯ Sign Long Anusvara

`U+1CF0`	Letter	null	null	ᳰ Sign Rthang Long Anusvara
`U+1CF1`	Letter	AVAGRAHA	null	ᳱ Sign Anusvara Ubhayato Mukha
`U+1CF2`	Letter	CONSONANT_DEAD	null	ᳲ Sign Ardhavisarga
`U+1CF3`	Letter	CONSONANT_DEAD	null	ᳳ Sign Rotated Ardhavisarga
`U+1CF4`	Mark [Mn]	CANTILLATION	TOP_POSITION	᳴ Tone Candra Above
`U+1CF5`	Letter	CONSONANT_WITH_STACKER	null	ᳵ Sign Jihvamuliya
`U+1CF6`	Letter	CONSONANT_WITH_STACKER	null	ᳶ Sign Upadhmaniya
`U+1CF7`	Mark [Mc]	null	null	᳷ Sign Atikrama
`U+1CF8`	Mark [Mn]	CANTILLATION	null	᳸ Tone Ring Above
`U+1CF9`	Mark [Mn]	CANTILLATION	null	᳹ Tone Double Ring Above
`U+1CFA`	Letter	PLACEHOLDER	null	ᳺ Sign Double Anusvara Antargomukha
`U+1CFB`	unassigned
`U+1CFC`	unassigned
`U+1CFD`	unassigned
`U+1CFE`	unassigned
`U+1CFF`	unassigned
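
One way a shaping engine might hold the table above is as a lookup keyed by codepoint. The following is a sketch under stated assumptions: the names `VedicClass`, `VEDIC_EXTENSIONS`, and `classify` are hypothetical, the values are transcribed from the table in this document, and `None` stands in for a null entry.

```python
from typing import Dict, NamedTuple, Optional


class VedicClass(NamedTuple):
    shaping_class: Optional[str]   # None corresponds to "null" in the table
    mark_placement: Optional[str]  # None corresponds to "null" in the table


# (first, last, shaping class, mark-placement subclass), inclusive ranges.
_RANGES = [
    (0x1CD0, 0x1CD2, "CANTILLATION", "TOP_POSITION"),
    (0x1CD3, 0x1CD3, None, None),
    (0x1CD4, 0x1CD4, "CANTILLATION", "OVERSTRUCK"),
    (0x1CD5, 0x1CD9, "CANTILLATION", "BOTTOM_POSITION"),
    (0x1CDA, 0x1CDB, "CANTILLATION", "TOP_POSITION"),
    (0x1CDC, 0x1CDF, "CANTILLATION", "BOTTOM_POSITION"),
    (0x1CE0, 0x1CE0, "CANTILLATION", "TOP_POSITION"),
    (0x1CE1, 0x1CE1, "CANTILLATION", "RIGHT_POSITION"),
    (0x1CE2, 0x1CE2, "AVAGRAHA", "OVERSTRUCK"),
    (0x1CE3, 0x1CE7, None, "OVERSTRUCK"),
    (0x1CE8, 0x1CE8, "AVAGRAHA", "OVERSTRUCK"),
    (0x1CE9, 0x1CE9, "AVAGRAHA", None),
    (0x1CEA, 0x1CEB, None, None),
    (0x1CEC, 0x1CEC, "AVAGRAHA", None),
    (0x1CED, 0x1CED, "AVAGRAHA", "BOTTOM_POSITION"),
    (0x1CEE, 0x1CEE, "AVAGRAHA", None),
    (0x1CEF, 0x1CF0, None, None),
    (0x1CF1, 0x1CF1, "AVAGRAHA", None),
    (0x1CF2, 0x1CF3, "CONSONANT_DEAD", None),
    (0x1CF4, 0x1CF4, "CANTILLATION", "TOP_POSITION"),
    (0x1CF5, 0x1CF6, "CONSONANT_WITH_STACKER", None),
    (0x1CF7, 0x1CF7, None, None),
    (0x1CF8, 0x1CF9, "CANTILLATION", None),
    (0x1CFA, 0x1CFA, "PLACEHOLDER", None),
]

VEDIC_EXTENSIONS: Dict[int, VedicClass] = {
    cp: VedicClass(shaping_class, placement)
    for first, last, shaping_class, placement in _RANGES
    for cp in range(first, last + 1)
}


def classify(codepoint: int) -> Optional[VedicClass]:
    """Return the table entry, or None for unassigned or out-of-block codepoints."""
    return VEDIC_EXTENSIONS.get(codepoint)


print(classify(0x1CD4))  # the midline svarita: CANTILLATION, OVERSTRUCK
print(classify(0x1CFB))  # unassigned, so no entry
```

Storing ranges rather than one row per codepoint keeps the transcription short; a production shaper would more likely generate such a table from the Unicode data files.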

Shaping information

29 of the characters in the block are categorized as marks. 27 of these marks are subcategorized as non-spacing; the remaining two are spacing-combining.

Of the marks, 20 are classified as CANTILLATION (or tone-marker) indicators, which modify the pitch of vowels; all but one of these (`U+1CE1`) are non-spacing. Most of these marks are positioned above or below the main character, using GPOS mark attachment, in a position that does not interact or interfere with the main character. In Unicode, the CANTILLATION classification is kept separate, for semantic reasons, from the TONE_MARKER classification used in some scripts; the two classifications are identical for shaping purposes.
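
The arithmetic behind GPOS mark attachment can be illustrated with a short sketch. It assumes horizontal left-to-right text and a mark with zero advance width; the function name `mark_offset` and all anchor coordinates are hypothetical values in font units, not taken from any real font.

```python
from typing import Tuple

Point = Tuple[int, int]


def mark_offset(base_anchor: Point, mark_anchor: Point, base_advance: int) -> Point:
    """Offset for the mark, relative to the pen position after the base glyph,
    such that the mark's anchor coincides with the base's anchor."""
    dx = base_anchor[0] - mark_anchor[0] - base_advance
    dy = base_anchor[1] - mark_anchor[1]
    return (dx, dy)


# Hypothetical anchors: a "top" anchor on the base glyph and the matching
# anchor on a TOP_POSITION cantillation mark.
base_top_anchor = (300, 700)
mark_top_anchor = (0, 650)
base_advance = 600

print(mark_offset(base_top_anchor, mark_top_anchor, base_advance))  # (-300, 50)
```

The horizontal term subtracts the base advance because the pen has already moved past the base glyph by the time the mark is drawn.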

Some of the marks (cantillation and non-cantillation) are classified as OVERSTRUCK in the Mark-placement subclass column. This indicates that the mark is intended to be rendered on top of the preceding character. During reordering, OVERSTRUCK marks are tagged for the ordering position POS_AFTER_MAIN.
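
The tagging step described above might look like the following sketch. The set of OVERSTRUCK codepoints is transcribed from the character table in this document; the function name `tag_overstruck` is hypothetical, and a real shaping engine would assign ordering positions to every glyph in a syllable rather than only to these marks.

```python
from typing import List, Optional, Tuple

# Codepoints whose Mark-placement subclass is OVERSTRUCK in the table:
# U+1CD4 and U+1CE2 through U+1CE8.
OVERSTRUCK = {0x1CD4} | set(range(0x1CE2, 0x1CE9))


def tag_overstruck(run: str) -> List[Tuple[str, Optional[str]]]:
    """Pair each character with POS_AFTER_MAIN if it is an OVERSTRUCK mark, else None."""
    return [
        (f"U+{ord(ch):04X}", "POS_AFTER_MAIN" if ord(ch) in OVERSTRUCK else None)
        for ch in run
    ]


# Devanagari letter KA followed by the midline svarita tone mark.
print(tag_overstruck("\u0915\u1CD4"))
# [('U+0915', None), ('U+1CD4', 'POS_AFTER_MAIN')]
```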

Some marks are classified, for shaping purposes, as AVAGRAHA or VISARGA. This indicates that the mark behaves more like the Avagraha or Visarga character than like a diacritic.

Characters that are categorized in Unicode as letters vary with respect to whether or not they trigger special behavior in the shaping process. Letters with a null Shaping class trigger none; the others are classified as AVAGRAHA, as consonants (CONSONANT_DEAD or CONSONANT_WITH_STACKER), or as PLACEHOLDER.
