Sorry for the long absence. I'd like to resume the discussion by summarizing what Martin, Samphan and I have discussed in a separate chat room, for your comments.
- Each language has its own canonical order, which is to be satisfied by
  the corresponding IM for that language.
- A generic IM is provided as a fallback for unsupported languages, or
  for relaxed circumstances.
- All language canonical orders define strings of subsets of the script
  canonical order.
- The OM is defined upon script canonical order. That is, it will become
  generic, no more sharing of incident table with the IM as in WTT 2.0,
  except for low-conformance implementations.
Script Canonical Order:
- No strict classification of marks. One is free to use Mai Tri as a
  vowel, or Sara ii over Mai Ek, for example.
- The sequence must be at least normalized, in terms of Unicode.
- For more restrictions, go to language-specific canonical orders.
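As a rough illustration of the normalization requirement, here is a minimal Python sketch. It assumes NFC as the normalization form (the point above only says "normalized"), and the helper name is_normalized is made up for this example. It relies on the Unicode canonical combining classes of Sara U (103) and Mai Ek (107), which force the below vowel to come before the tone mark.

```python
import unicodedata

KO_KAI = "\u0E01"  # THAI CHARACTER KO KAI (base consonant)
SARA_U = "\u0E38"  # THAI CHARACTER SARA U, canonical combining class 103
MAI_EK = "\u0E48"  # THAI CHARACTER MAI EK, canonical combining class 107


def is_normalized(text: str, form: str = "NFC") -> bool:
    """Return True if text is already in the given normalization form.

    Assumption: NFC is the intended form; the thread does not name one.
    """
    return unicodedata.normalize(form, text) == text


good = KO_KAI + SARA_U + MAI_EK  # below vowel first, then tone mark
bad = KO_KAI + MAI_EK + SARA_U   # tone mark stored before the below vowel
```

Under this assumption, is_normalized(good) holds while is_normalized(bad) does not, and normalizing bad yields good.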
Output Method:
- Levels of conformance, to be defined, e.g.
  level 1: WTT 2.0 + some more limited sequences
           (mostly for legacy fonts like TrueType, Type 1, Bitmap)
  level 2: Relaxed stacking of combining marks
           (mostly requires advanced typography technology, such as
           OpenType, or special rendering engines which manage
           stacking by their own, such as Qt)
- The OM with level 2 conformance may also provide "legacy mode" which
  falls back to level 1 when rendering with legacy fonts, unless they
  can manage stacking without OpenType features available.
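The "legacy mode" fallback could be sketched roughly as follows. This is only an illustration of the decision described above: the names Conformance, effective_level, font_has_opentype and engine_stacks_marks are hypothetical, and the levels themselves are still to be defined.

```python
from enum import IntEnum


class Conformance(IntEnum):
    LEVEL_1 = 1  # WTT 2.0 plus some more limited sequences (legacy fonts)
    LEVEL_2 = 2  # relaxed stacking of combining marks


def effective_level(om_level: Conformance,
                    font_has_opentype: bool,
                    engine_stacks_marks: bool = False) -> Conformance:
    """Pick the level an OM actually renders at for a given font.

    A level 2 OM falls back to level 1 with a legacy font, unless the
    engine can manage stacking by itself without OpenType features.
    """
    if om_level is Conformance.LEVEL_2 and not (font_has_opentype or engine_stacks_marks):
        return Conformance.LEVEL_1
    return om_level
```

For example, effective_level(Conformance.LEVEL_2, font_has_opentype=False) gives level 1, while the same call with engine_stacks_marks=True stays at level 2.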
Input Methods:
- Generic: incident table to be defined.
- Thai: WTT 2.0 + some more limited sequences.
- Minority languages: to be elaborated as needed.
> - Each language has its own canonical order, which is to be satisfied by
>   the corresponding IM for that language.
> - A generic IM is provided as a fallback for unsupported languages, or
>   for relaxed circumstances.
> - All language canonical orders define strings of subsets of the script
>   canonical order.
> - The OM is defined upon script canonical order. That is, it will become
>   generic, no more sharing of incident table with the IM as in WTT 2.0,
>   except for low-conformance implementations.
We may be overstating the importance of these language-specific canonical forms. When it comes to implementation, it is unlikely they will be greatly used. But they do provide a way of carrying WTT2 forward into WTT3 in a backward-compatible way.
> - No strict classification of marks. One is free to use Mai Tri as a
>   vowel, or Sara ii over Mai Ek, for example.
> - The sequence must be at least normalized, in terms of Unicode.
> - For more restrictions, go to language-specific canonical orders.
> - Levels of conformance, to be defined, e.g.
>   level 1: WTT 2.0 + some more limited sequences
>            (mostly for legacy fonts like TrueType, Type 1, Bitmap)
If we take a traditional Thai TrueType font with its 4 positional variants of mai ek, etc. as used since Windows 3.1, we find it can support the following sequences:
and this is sufficient for all the languages that I have information on (which is about 8).
I.e. all this talk of minority languages isn't really going to cause us much more work. The technologies we have can handle them pretty well if we just loosen things up a bit.
>   level 2: Relaxed stacking of combining marks
>            (mostly requires advanced typography technology, such as
>            OpenType, or special rendering engines which manage
>            stacking by their own, such as Qt)
The trick here is to create rendering engines that don't put restrictions on what the OpenType font might want to do, while still doing useful work.
> - The OM with level 2 conformance may also provide "legacy mode" which
>   falls back to level 1 when rendering with legacy fonts, unless they
>   can manage stacking without OpenType features available.
> Input Methods:
> - Generic: incident table to be defined.
I would suggest that this be defined as allowing any script valid input sequence. I.e. there is no restriction on the characters you can type, just their relative order.
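To make the idea concrete, here is a minimal Python sketch of such a generic filter. Since the script canonical order is not defined yet, it uses "the buffer stays NFC-normalized" as a stand-in order check; that stand-in, the choice to reject (rather than reorder) an out-of-order key, and the names script_order_ok, accept_key and type_keys are all assumptions for illustration.

```python
import unicodedata
from typing import Callable

KO_KAI = "\u0E01"  # THAI CHARACTER KO KAI
SARA_U = "\u0E38"  # THAI CHARACTER SARA U (combining class 103)
MAI_EK = "\u0E48"  # THAI CHARACTER MAI EK (combining class 107)


def script_order_ok(text: str) -> bool:
    """Stand-in order check (assumption): text is unchanged by NFC."""
    return unicodedata.normalize("NFC", text) == text


def accept_key(buffer: str, key: str,
               order_ok: Callable[[str], bool] = script_order_ok) -> bool:
    """Any character may be typed; only its relative order is checked."""
    return order_ok(buffer + key)


def type_keys(keys: str,
              order_ok: Callable[[str], bool] = script_order_ok) -> str:
    """Feed keys one at a time, dropping those that break the order."""
    buffer = ""
    for key in keys:
        if accept_key(buffer, key, order_ok):
            buffer += key
    return buffer
```

With this stand-in, typing Ko Kai, Sara U, Mai Ek is accepted in full, while typing Ko Kai, Mai Ek, Sara U drops the late Sara U. Swapping in a different order_ok would change the rule without touching the filter.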
> - Thai: WTT 2.0 + some more limited sequences.
> - Minority languages: to be elaborated as needed.
It is highly unlikely that minority language specific keyboard layouts will be developed for Thai. They will most probably use the generic keyboard.
We have talked a lot about the needs of minority languages and that is no bad thing, but I think that their needs are covered relatively easily using generic approaches. The question now is how do we bring together a generic approach and a tighter approach for the Thai language in a helpful way.
> > - Each language has its own canonical order, which is to be satisfied by
> >   the corresponding IM for that language.
> > - A generic IM is provided as a fallback for unsupported languages, or
> >   for relaxed circumstances.
> > - All language canonical orders define strings of subsets of the script
> >   canonical order.
> > - The OM is defined upon script canonical order. That is, it will become
> >   generic, no more sharing of incident table with the IM as in WTT 2.0,
> >   except for low-conformance implementations.
> We may be overstating the importance of these language specific
> canonical forms. When it comes to implementation, it is unlikely they
> will be greatly used. But it does provide a way of backwardly carrying
> WTT2 forward into WTT3.
Yes, each language can be described in as much detail as it needs. If a language doesn't need much restriction, then just take the script canonical order as the language canonical order. (Any set is already a subset of itself.) It can always be elaborated later if sufficient need arises.
The Thai language already has such obvious needs, so we design this structure to apply to all languages as well.
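The subset relation can be sketched in Python as validators built on top of the script-level check. This is only a sketch under assumptions: the script check is reduced to NFC normalization, the helper make_language_order is hypothetical, and the "no two consecutive tone marks" rule is invented purely for illustration, not taken from WTT 2.0.

```python
import unicodedata
from typing import Callable, Optional

Validator = Callable[[str], bool]

TONE_MARKS = set("\u0E48\u0E49\u0E4A\u0E4B")  # Mai Ek, Mai Tho, Mai Tri, Mai Chattawa


def script_order_ok(text: str) -> bool:
    """Stand-in for the script canonical order (assumption): NFC only."""
    return unicodedata.normalize("NFC", text) == text


def make_language_order(extra: Optional[Validator] = None) -> Validator:
    """Build a language canonical order as a subset of the script order.

    With no extra rule, the language order is the script order itself.
    """
    if extra is None:
        return script_order_ok
    return lambda text: script_order_ok(text) and extra(text)


def no_consecutive_tone_marks(text: str) -> bool:
    """Illustrative restriction only; not a rule quoted from any spec."""
    return not any(a in TONE_MARKS and b in TONE_MARKS
                   for a, b in zip(text, text[1:]))


generic_order = make_language_order()
stricter_order = make_language_order(no_consecutive_tone_marks)
```

Because every language validator also calls script_order_ok, anything it accepts is accepted at script level too, which is the subset property described above.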
> > - No strict classification of marks. One is free to use Mai Tri as a
> >   vowel, or Sara ii over Mai Ek, for example.
> > - The sequence must be at least normalized, in terms of Unicode.
> > - For more restrictions, go to language-specific canonical orders.
> > - Levels of conformance, to be defined, e.g.
> >   level 1: WTT 2.0 + some more limited sequences
> >            (mostly for legacy fonts like TrueType, Type 1, Bitmap)
> If we take a traditional Thai TrueType font with its 4 positional
> variants of mai ek, etc. as used since Windows 3.1, we find it can
> support the following sequences:
> and this is sufficient for all the languages that I have information
> on (which is about 8).
How about U+0E47 (Maitaikhu) above upper vowels in Kuy? This may be one missing case.
> I.e. all this talk of minority languages isn't really going to cause
> us much more work. The technologies we have can handle them pretty
> well if we just loosen things up a bit.
Yes, this involves updates in rendering engines' internal rules, according to new WTT legacy mode.
> >   level 2: Relaxed stacking of combining marks
> >            (mostly requires advanced typography technology, such as
> >            OpenType, or special rendering engines which manage
> >            stacking by their own, such as Qt)
> The trick here is to create rendering engines that don't put
> restrictions on what the OpenType font might want to do, while still
> doing useful work.
> > - The OM with level 2 conformance may also provide "legacy mode" which
> >   falls back to level 1 when rendering with legacy fonts, unless they
> >   can manage stacking without OpenType features available.
> > Input Methods:
> > - Generic: incident table to be defined.
> I would suggest that this be defined as allowing any script valid
> input sequence. I.e. there is no restriction on the characters you can
> type, just their relative order.
Right. It will be derived from the defined canonical order.
> > - Thai: WTT 2.0 + some more limited sequences.
> > - Minority languages: to be elaborated as needed.
> It is highly unlikely that minority language specific keyboard layouts
> will be developed for Thai. They will most probably use the generic
> keyboard.
> We have talked a lot about the needs of minority languages and that is
> no bad thing, but I think that their needs are covered relatively
> easily using generic approaches. The question now is how do we bring
> together a generic approach and a tighter approach for the Thai
> language in a helpful way.
Yes. And one thing I hope this summary map will do is make sure we share the same understanding as we discuss further, such as where a particular concept belongs.
If we agree on this model, then we can bring proposed issues from wikidot [1] into our working draft.
On Wed, Jun 18, 2008 at 11:07:01AM +0700, Theppitak Karoonboonyanan wrote:
> On Wed, Jun 18, 2008 at 09:58:06AM +0700, Martin Hosken wrote:
> > > Output Method:
> > > - Levels of conformance, to be defined, e.g.
> > >   level 1: WTT 2.0 + some more limited sequences
> > >            (mostly for legacy fonts like TrueType, Type 1, Bitmap)
> > If we take a traditional Thai TrueType font with its 4 positional
> > variants of mai ek, etc. as used since Windows 3.1, we find it can
> > support the following sequences:
Note that this assumption does not apply to bitmap fonts with pure TIS-620 or ISO-10646-1 encoding. If we are to define level 1 like this, we will need, say, level 0, for such primitive fonts.
> > and this is sufficient for all the languages that I have information
> > on (which is about 8).
> How about U+0E47 (Maitaikhu) above upper vowels in Kuy? This may be one
> missing case.
> > I.e. all this talk of minority languages isn't really going to cause
> > us much more work. The technologies we have can handle them pretty
> > well if we just loosen things up a bit.
> Yes, this involves updates in rendering engines' internal rules,
> according to new WTT legacy mode.
On the other hand, this may mean endorsement of such PUA glyphs by the national standards body, through the WTT project. I'm fine with this. We don't need to specify the PUA code points when writing the specification, for example. Just mentioning the font class in general terms should be fine.