Tuesday, 19 November 2013

Steps To Test Lohit Gujarati

Hi all, following my earlier blog post [1], many people came to know about the Lohit Gujarati testing activity. As we have reached the alpha release, I would like to describe the steps for testing Gujarati in more detail:

Installing fonts:

>> Fedora or other Linux distro

1. Using the graphical user interface
- open the ttf file using gnome-font-viewer or kfontview
- click on install fonts

2. Using the terminal
- copy the font to ~/.local/share/fonts
- run $ fc-cache
- open gedit; the font should be listed now
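
The terminal steps above can also be scripted. This is only a minimal sketch, assuming Python 3 and that fc-cache (from fontconfig) is available on the PATH. The function name install_font and its parameters are my own illustration, not part of any Lohit tooling.

```python
import shutil
import subprocess
from pathlib import Path


def install_font(ttf_path, fonts_dir=None, refresh_cache=True):
    """Copy a .ttf file into a per-user font directory and refresh the cache.

    install_font is a hypothetical helper written for this post.
    """
    src = Path(ttf_path)
    # Per-user font directory from the steps above (assumed default).
    dest_dir = Path(fonts_dir) if fonts_dir else Path.home() / ".local" / "share" / "fonts"
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)
    # Only call fc-cache when it is actually installed.
    if refresh_cache and shutil.which("fc-cache"):
        subprocess.run(["fc-cache", str(dest_dir)], check=True)
    return dest
```

For example, install_font("Lohit-Gujarati.ttf") would copy the file and rebuild the font cache, after which the font should show up in gedit.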

>> Windows
1. Nicely documented on http://windows.microsoft.com/en-in/windows-vista/install-or-uninstall-fonts

More details on installing the fonts are available in [2].

Steps to test 'test-gujarati' file :

We created the test-gujarati file to cover all of the glyphs shown for the font in the FontForge application, plus the glyphs that we tested additionally.

Here are the steps to test :

- Download test-gujarati file [3]
- Open this file with gedit or any other text editor
- Check that the text shows proper contextual shaping

You can also test by copying all the glyphs from this file & pasting them into any other editor.
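
Before testing by eye, it can help to see which Gujarati code points the test file actually contains. The following is a rough sketch, assuming Python 3 and that the file is saved as UTF-8. The helper names gujarati_codepoints and report are hypothetical, made up for this example.

```python
import unicodedata

# The Gujarati block of Unicode: U+0A80..U+0AFF.
GUJARATI_BLOCK = range(0x0A80, 0x0B00)


def gujarati_codepoints(text):
    """Return the sorted Gujarati-block code points that occur in text."""
    return sorted({ord(ch) for ch in text if ord(ch) in GUJARATI_BLOCK})


def report(text):
    """Return one line per Gujarati code point found, with its Unicode name."""
    return [
        f"U+{cp:04X} {unicodedata.name(chr(cp), '<unnamed>')}"
        for cp in gujarati_codepoints(text)
    ]
```

With the downloaded file, something like report(open("test-gujarati.txt", encoding="utf-8").read()) gives a list that can be compared against the glyphs in the font.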

Your testing contribution means a lot to this project. You can report your feedback & issues, if any, on GitHub [4]. Screenshots are especially helpful for testing & for fixing issues.

1. http://snehakore.blogspot.in/2013/10/lohit-gujarati-is-on-testing-phase.html
2. https://raw.github.com/pravins/lohit/master/README
3. https://raw.github.com/pravins/lohit2/master/gujarati/test-gujarati.txt
4. https://github.com/pravins/lohit2/tree/master/gujarati

Tuesday, 12 November 2013

Lohit Devanagari Test File Screenshot on Windows XP !!!

TestFile Screenshots :


Lohit 2 Devanagari Beta is out already [1]. This is a screenshot taken during testing.

Testing Environment :
Windows XP, Application notepad
Uniscribe version number : 1.420.2600.2180
Source file: https://raw.github.com/pravins/lohit/master/devanagari/test-devanagari.txt

1. https://www.redhat.com/archives/lohit-devel-list/2013-November/msg00000.html

Monday, 11 November 2013

Lohit Devanagari Test File Screenshot on Windows 8!!

TestFile Screenshots :

Lohit 2 Devanagari Beta is out already [1]. This is a screenshot taken during testing.

Testing Environment :
Windows 8, Application notepad
Uniscribe version number : 6.2.9200.16384
Source file: https://raw.github.com/pravins/lohit/master/devanagari/test-devanagari.txt

1. https://www.redhat.com/archives/lohit-devel-list/2013-November/msg00000.html

Thursday, 7 November 2013

South Indian Script - Kannada

Hi, as this is the next script in the series on South Asian scripts, it follows the same agenda as the post on Devanagari. The Kannada script shares many features with other Indic scripts, but since it is used to write the Kannada language, it has different letter shapes & different behaviour of conjunct consonants.

>>Principles of Kannada Script :

Like Devanagari & related scripts, the Kannada script employs a halant, also known as a virama or vowel omission sign: U+0CCD ್ KANNADA SIGN VIRAMA. It serves the same purpose of forming dead consonants.

a. Vowel Letters :

Vowel letters are encoded atomically in Unicode, even if they can be analyzed visually as consisting of multiple parts. Each vowel letter should be represented in text by its single code point, not by a sequence of code points.

b. Consonant Conjuncts :

In Kannada, conjunct formation tends to be graphically regular using the following pattern :
  • The first consonant of the cluster is rendered with its implicit vowel, or with a different dependent vowel appearing as the terminal element of the cluster.
  • The remaining consonants appear in conjunct consonant glyph forms in phonetic order. They are generally depicted below or to the lower right of the first consonant.

c. Special characters :

The Kannada two-part vowels actually consist of a nonspacing element above the consonant letter & one or more spacing elements to the right of the consonant letter. The two length marks, U+0CD5 KANNADA LENGTH MARK & U+0CD6 KANNADA AI LENGTH MARK, have no independent existence in the Kannada writing system & do not play any part as independent code points in the traditional collation order.

d. Kannada Letter LLLA :

U+0CDE KANNADA LETTER FA is actually an obsolete Kannada letter that is transliterated in Dravidian scholarship as z, l or r. This letter should have been named "LLLA" rather than "FA", so the name in the standard is simply a mistake. Collations should treat U+0CDE as following U+0CB3 KANNADA LETTER LLA.

>>Rendering Kannada :

Plain text in Kannada is stored in consonant-vowel (CV) order. This order is employed by the ISCII standard & corresponds to the phonetic & keying order of textual data.
"Unlike Devanagari & some other Indian scripts, all of the dependent vowels in Kannada are depicted to the right of their consonant letters. Hence there is no need to reorder the elements in mapping from the logical store to presentation glyph rendering & vice versa."

a) Explicit virama (halant) has the same significance as in Devanagari

b) Consonant cluster involving RA :

ra ರ  + halant ್  +  ka ಕ  ->  rka ರ್ಕ

ra ರ + zwj  ‍+ halant  ್  + ka ಕ  -> rka ರ‍್ಕ

ka ಕ + halant ್  + ra ರ  -> kra ಕ್ರ
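
The two rka forms above look alike in running text, so it can be useful to print the underlying code point sequences. This is just a small sketch, assuming Python 3 with its standard unicodedata module. The helper name describe is hypothetical.

```python
import unicodedata

RA, KA = "\u0CB0", "\u0C95"
VIRAMA = "\u0CCD"   # KANNADA SIGN VIRAMA (halant)
ZWJ = "\u200D"      # ZERO WIDTH JOINER


def describe(text):
    """Return 'U+XXXX NAME' for every code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# The three sequences from the examples above, in logical (stored) order.
RKA = RA + VIRAMA + KA
RKA_WITH_ZWJ = RA + ZWJ + VIRAMA + KA
KRA = KA + VIRAMA + RA
```

Calling describe(RKA_WITH_ZWJ) shows the extra ZERO WIDTH JOINER sitting between RA and the virama, which is the only difference from the plain rka sequence.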

c) Modifier Mark Rules :

The nukta is represented by a double-dot mark, U+0CBC KANNADA SIGN NUKTA (..). Two such modified consonants are used in the Kannada language: one representing the syllable za & one representing the syllable fa.

d) Avagraha Sign :

A spacing mark called U+0CBD KANNADA SIGN AVAGRAHA is used when rendering Sanskrit texts.

e) Punctuation : same as that of Devanagari [1].

1. http://snehakore.blogspot.in/2013/11/south-asian-scripts-i-devanagari.html

Wednesday, 6 November 2013

South Asian scripts - Gujarati

The Gujarati script is a North Indian script closely related to Devanagari. It is most obviously distinguished from Devanagari by not having a horizontal bar on its letterforms, a characteristic of the older Kaithi script, to which Gujarati is related. The Gujarati script is used to write the Gujarati language of the state of Gujarat in India.

Vowel Letters :

Vowel letters are encoded atomically in Unicode, even if they can be analyzed visually as consisting of multiple parts. The following chart shows some of the letters that can be analyzed, the single code point that should be used to represent them in text, and the sequence of code points resulting from analysis that should not be used.

For  Use   Do Not Use

આ    0A86   <0A85, 0ABE>
એ     0A8F   <0A85, 0AC7>
ઑ    0A91   <0A85, 0AC9>
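
The chart above can be turned into a small check. This is only a sketch, assuming Python 3. It covers just the three rows listed here, not every vowel letter, and the names PREFERRED_VOWELS and use_atomic_vowels are hypothetical. These sequences are not canonically equivalent to the single code points, so Unicode normalization will not do this replacement for you.

```python
# Discouraged sequence -> preferred single code point (rows of the chart above).
PREFERRED_VOWELS = {
    "\u0A85\u0ABE": "\u0A86",  # GUJARATI LETTER AA
    "\u0A85\u0AC7": "\u0A8F",  # GUJARATI LETTER E
    "\u0A85\u0AC9": "\u0A91",  # GUJARATI VOWEL CANDRA O
}


def use_atomic_vowels(text):
    """Replace the discouraged two-code-point sequences with atomic vowels."""
    for sequence, atomic in PREFERRED_VOWELS.items():
        text = text.replace(sequence, atomic)
    return text


def has_discouraged_vowels(text):
    """Report whether text contains any sequence from the 'Do Not Use' column."""
    return any(sequence in text for sequence in PREFERRED_VOWELS)
```

A test file or keyboard layout can be passed through has_discouraged_vowels to catch such sequences before they end up in stored text.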

Rendering Behavior :

For rendering of the Gujarati script, see the rendering rules specified for Devanagari [1]. Like other Brahmic scripts in the Unicode Standard, Gujarati uses the virama to form conjunct characters. The virama is informally called khodo, which means "lame" in Gujarati. Many conjunct characters, as in Devanagari, lose the vertical stroke; there are also vertical conjuncts. U+0AB0 GUJARATI LETTER RA takes special forms when it combines with other consonants. Some examples of conjuncts:

ક + ્ + ષ -> ક્ષ (ksa) 
ર + ્ + ક -> ર્ક (rka)
ક + ્ + ર -> ક્ર (kra)
ટ + ્ + ટ -> ટ્ટ (tta)

Punctuation :

Words in Gujarati are separated by spaces. The danda & double danda marks, as well as some other unified punctuation used with Gujarati, are found in the Devanagari block.

This covers what the Unicode core specification says about the Gujarati script. In conformance with it, we have developed & written the OT spec rules for Gujarati & successfully made the alpha release. Gujarati testing is already done with HarfBuzz, Windows XP, Windows 7 & Windows 8. The bugs that we came across are reported & fixed on [2]. Some issues are yet to be fixed, so we are planning the beta release very soon.

One can download the ttf file from [3].
1. http://snehakore.blogspot.in/2013/11/south-asian-scripts-i-devanagari.html
2. https://github.com/pravins/lohit2/issues
3. http://skore.fedorapeople.org/test/Lohit-Gujarati.ttf

Tuesday, 5 November 2013

South Asian Scripts I - Devanagari

Hi all. As I am currently going through Chapter 9 of the book "Unicode Standard Version 6.1 Core Specification", the upcoming posts should be truly informative about the South Asian scripts, namely Devanagari, Gujarati, Kannada & Malayalam. They will come as a series, & the present one is specifically about Devanagari.

  • Background Information : 
As most of us know, the Unicode Standard provides programmers with a single universal character encoding & a vast amount of data about how characters function. But as we are dealing with more complex scripts, we have another standard to follow: the Indian Standard Code for Information Interchange (ISCII). Most of the scripts of South Asia are derived from the ancient Brahmi script & therefore share many structural characteristics. Implementations should ensure that adequate attention is given to the actual behaviour of those scripts.

  • About Devanagari :

  1. Standards :
The Devanagari block of the Unicode Standard is based on ISCII-1988.

   2.  Encoding Principles :

These writing systems constitute a cross between syllabic & alphabetic writing systems. The effective unit of these writing systems is the orthographic syllable, consisting of a consonant & vowel (CV) core &, optionally, one or more preceding consonants, with a canonical structure of (((C)C)C)V.
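
To make the (((C)C)C)V structure concrete, here is a very simplified sketch in Python 3 that splits Devanagari text into orthographic syllables with a regular expression. It is an assumption-laden toy: it ignores nukta, bindus, ZWJ/ZWNJ and a trailing dead consonant, and the names SYLLABLE and syllables are hypothetical.

```python
import re

CONSONANT = "[\u0915-\u0939]"          # KA..HA
VIRAMA = "\u094D"
VOWEL_SIGN = "[\u093E-\u094C]"         # dependent vowel signs AA..AU
INDEPENDENT_VOWEL = "[\u0904-\u0914]"  # independent vowel letters

# Zero or more dead consonants (C + virama), then a live consonant with an
# optional vowel sign; or a lone independent vowel.
SYLLABLE = re.compile(
    f"(?:{CONSONANT}{VIRAMA})*{CONSONANT}{VOWEL_SIGN}?|{INDEPENDENT_VOWEL}"
)


def syllables(text):
    """Return the orthographic syllables found in text (simplified model)."""
    return SYLLABLE.findall(text)
```

With this model the word स्त्री comes out as a single syllable (two dead consonants, one live consonant and a vowel sign), while कमल splits into three.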

    3.  Rendering Devanagari :

>>Rules For Rendering :

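In the notation below, the suffixes n, d, l & h mark the nominal, dead, live & half forms of a consonant, sup & sub mark superscript & subscript forms, & vs marks a vowel sign.

When a nominal consonant precedes a VIRAMA, it is considered to be a dead consonant. A consonant that does not precede a VIRAMA is considered to be a live consonant.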
                       TAn + VIRAMAn -> TAd
                       त      + ्              -> त्
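
The dead/live distinction can be checked mechanically on the stored text. The following is a minimal sketch in Python 3, assuming only the core consonants KA..HA and allowing an optional nukta between the consonant and the virama. The names is_consonant and classify_consonants are hypothetical.

```python
VIRAMA = "\u094D"
NUKTA = "\u093C"


def is_consonant(ch):
    # Core Devanagari consonants KA..HA; a deliberate simplification.
    return "\u0915" <= ch <= "\u0939"


def classify_consonants(text):
    """Return (consonant, 'dead' or 'live') pairs in logical order."""
    result = []
    for i, ch in enumerate(text):
        if not is_consonant(ch):
            continue
        j = i + 1
        if text[j:j + 1] == NUKTA:  # a nukta may sit before the virama
            j += 1
        state = "dead" if text[j:j + 1] == VIRAMA else "live"
        result.append((ch, state))
    return result
```

For the sequence TA + VIRAMA + TA this reports the first TA as dead and the second as live.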

If the dead consonant RA (ra + virama) precedes a consonant, then it is replaced by the superscript nonspacing mark "repha" (RAsup).
                       RAd + KAl -> KAl + RAsup
                       र्      +  क   ->  क    + र्          -> र्क

If the "repha" is to be applied to a dead consonant & that dead consonant is combined with another consonant to form a conjunct, then the mark is applied to the conjunct ligature form as a whole.

                       RAd + JAd+ NYAn -> J.NYAn +RAsup
                       र्      + ज्    + ञ        ->  ज्ञ         +र्        -> र्ज्ञ

If the "repha" is to be applied to a dead consonant that is subsequently replaced by its half-consonant form, then the mark is applied to the base of the consonant cluster.
                       RAd + GAd + GHAl -> GAh + GHAl + RAsup
                       र्      +  ग्      +  घ       ->  ग्     +  घ         +   र्        -> र्ग्घ

In conformance with the ISCII standard, the half-consonant form RRAh is represented as eyelash-RA. This form of RA is commonly used in writing Marathi.
                      RRAn + VIRAMAn + YAn -> RRAh + YAn
                      ऱ         + ्              +  य      ->  ऱ्य

                      RAd + ZWJ + YAn -> RAh + YAn
                      र्      +  ‍       +  य    ->  र्‍य

Except for the dead consonant RA, when a dead consonant precedes the live consonant RA, the dead consonant is replaced with its nominal form, and RA is replaced by the subscript RAsub, which applies to the nominal form.
                       TTHAd + RAl -> TTHAn + RAsub
                        ठ्          + र     ->  ठ          + ्र          -> ठ्र

For certain consonants, the mark RAsub may graphically combine with the consonant to form a conjunct.
                       PHAd + RAl -> PHAn + RAsub
                       फ्        + र     ->  फ          + ्र          -> फ्र

If a dead consonant (other than RAd) precedes RAd, then the substitution of RA for RAsub is performed; however, the VIRAMA that formed RAd remains, so as to form a dead consonant conjunct form.
                       TAd + RAd -> TAn + RAsub + VIRAMAn -> T.RAd
                        त्     + र्     ->  त     + ्र         + ्              -> त्र्

The nukta sign, which modifies a consonant form, is attached to that consonant in rendering. If the consonant represents a dead consonant, then NUKTA should precede VIRAMA.
                       KAn + NUKTAn + VIRAMAn -> QAd
                       क     + ़             + ्              ->  क़्         
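
The NUKTA-before-VIRAMA order also falls out of Unicode normalization, because NUKTA has a lower canonical combining class than VIRAMA. Here is a small sketch with Python 3's standard unicodedata module. The helper name canonical is hypothetical.

```python
import unicodedata

KA, NUKTA, VIRAMA = "\u0915", "\u093C", "\u094D"
QA = "\u0958"  # DEVANAGARI LETTER QA, the precomposed KA + NUKTA


def canonical(text):
    """Return text in NFC, which also applies canonical reordering of marks."""
    return unicodedata.normalize("NFC", text)


# Sequence typed in the wrong order: KA + VIRAMA + NUKTA.
WRONG_ORDER = KA + VIRAMA + NUKTA
# Expected stored order for the dead consonant QAd: KA + NUKTA + VIRAMA.
RIGHT_ORDER = KA + NUKTA + VIRAMA
```

canonical(WRONG_ORDER) returns the same string as RIGHT_ORDER, and decomposing QA with unicodedata.normalize("NFD", QA) gives KA followed by NUKTA.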

Other modifying marks, in particular bindus, apply to the orthographic syllable as a whole. The bindus should follow any vowel signs. The relative placement of these marks is horizontal rather than vertical; the horizontal rendering order may vary according to typographic concerns.
                        KAn + AAvs + CANDRABINDUn
                        क     + ा      + ँ               ->  काँ      

If a dead consonant immediately precedes another dead consonant or a live consonant, then the first dead consonant may join the subsequent element to form a two-part conjunct.
                        JAd + NYAl ->  J.NYAn
                        ज्    +  ञ       ->  ज्ञ          

                        TTAd + TTHAl -> TT.TTHAn
                         ट्       + ठ          ->  ट्ठ        

A conjunct ligature form can itself behave as a dead consonant & enter into further, more complex ligatures. A conjunct ligature form can also produce a half-form.
                       SAd + TAd + RAn -> SAd + T.RAn -> S.T.RAn
                        स्    +  त्     + र      ->  स्     +  त्र         -> स्त्र

If a nominal consonant or conjunct ligature form precedes RAsub as a result of the RAsub substitution described above (rule R6 in the standard), then the consonant or ligature form may join with RAsub to form a multi-part conjunct ligature.
                        KAn + RAsub -> K.RAn
                        क     + ्र          -> क्र        

In some cases, other combining marks will combine with a base consonant, either attaching at a nonstandard location or changing shape. In minimal rendering there are only two cases: the live consonant RA (RAl) with the vowel sign U (Uvs) or UU (UUvs).
                       RAl + Uvs -> RUn
                        र    +  ु    -> रु         

When the dependent vowel Ivs is used to override the inherent vowel of a syllable, it is always written to the extreme left of the orthographic syllable.
                       TAd + RAl + Ivs -> T.RAn + Ivs -> Ivs + T.RAn
                        त्     + र     + ि  ->  त्र        +  ि  -> त्रि

The presence of an explicit virama blocks this reordering, and the dependent vowel is rendered after the rightmost such explicit virama.
                       TAd + ZWNJ + RAl + Ivs -> TAd + Ivs + RAl
                       त्     + ‌           + र     + ि  ->  त् रि
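
The last two rules can be imitated with a toy function that moves the vowel sign I from its logical position to its visual position. This is only a sketch in Python 3 of the reordering idea, not how a shaping engine such as HarfBuzz is implemented. It assumes the explicit virama is forced by a ZWNJ, and the name visual_order is hypothetical. The returned string is for illustration only and is not valid text to store.

```python
I_SIGN = "\u093F"  # DEVANAGARI VOWEL SIGN I
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER


def visual_order(syllable):
    """Move a trailing vowel sign I to where it is drawn (toy model)."""
    if not syllable.endswith(I_SIGN):
        return syllable
    body = syllable[:-1]
    # Everything up to the last ZWNJ ends in an explicit virama, which blocks
    # the reordering; cut is 0 when there is no ZWNJ at all.
    cut = body.rfind(ZWNJ) + 1
    return body[:cut] + I_SIGN + body[cut:]
```

For TA + VIRAMA + RA + I the sign moves to the very front; with a ZWNJ after the virama it only moves in front of RA, matching the two examples above.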

These sixteen rules cover the rendering of the Devanagari script. I would also like to mention that Lohit Devanagari, currently in progress on GitHub [1], supports all of these rules & rendering principles.

1. https://github.com/pravins/lohit/tree/master/devanagari