In some cases you can have whitespace in Thai text that isn't semantically significant. For example, in printed Thai maiyamok (ๆ) sometimes has a little whitespace before it and sometimes does not (it seems to be a matter of the house style). Should that whitespace be encoded as a space character, or should it be added automatically by the display/formatting process according to the user's specified stylistic preference?
> In some cases you can have whitespace in Thai text that isn't semantically > significant. For example, in printed Thai maiyamok (ๆ) sometimes has a > little whitespace before it and sometimes does not (it seems to be a matter > of the house style). Should that whitespace be encoded as a space > character, or should it be added automatically by the display/formatting > process according to the user's specified stylistic preference?
And would input sequence correction then do this?
something + space + maiyamok --> something + maiyamok
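A minimal sketch of the correction asked about above, assuming it is applied as a plain text normalization step. The function name `strip_space_before_maiyamok` is made up for illustration, and only ordinary U+0020 spaces are handled; nothing here is taken from an existing input method.

```python
import re

MAIYAMOK = "\u0E46"  # THAI CHARACTER MAIYAMOK

# One or more ordinary spaces that are immediately followed by Maiyamok.
_SPACE_BEFORE_MAIYAMOK = re.compile(" +(?=" + MAIYAMOK + ")")


def strip_space_before_maiyamok(text):
    """Rewrite 'something + space + maiyamok' as 'something + maiyamok'."""
    return _SPACE_BEFORE_MAIYAMOK.sub("", text)
```

Text that already has no space before Maiyamok passes through unchanged, so the step can be applied repeatedly without further effect.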
> In some cases you can have whitespace in Thai text that isn't semantically > significant. For example, in printed Thai maiyamok (ๆ) sometimes has a > little whitespace before it and sometimes does not (it seems to be a matter > of the house style). Should that whitespace be encoded as a space > character, or should it be added automatically by the display/formatting > process according to the user's specified stylistic preference?
My opinion is that the space before MAIYAMOK should be encoded and rendered as is, and that Unicode line breaking should prohibit line breaks before and after it, to prevent an orphaned MAIYAMOK on a new line.
In fact, MAIYAMOK should prohibit a break opportunity before it, direct or indirect. This allows both spacing schools to get the same result.
And finally, I think a guideline should be written to suggest good spacing style before MAIYAMOK. I learned that the Royal Institute says a leading space should be used before MAIYAMOK. We can rely on that. But it's up to the user whether to follow the guideline or not.
The rationale for spacing before and after MAIYAMOK is that it is a distorted form of the Thai digit two (๒), and traditional writing puts spaces before and after it for that reason. The no-space practice is said to have started with computer use, when most software couldn't handle the space before MAIYAMOK properly when breaking lines. So removing the leading space became a common workaround.
> > In some cases you can have whitespace in Thai text that isn't semantically > > significant. For example, in printed Thai maiyamok (ๆ) sometimes has a > > little whitespace before it and sometimes does not (it seems to be a matter
As for the little whitespace, I think it comes from the font: some fonts (FreesiaUPC) have a small gap at the front of the maiyamok glyph itself.
> > of the house style). Should that whitespace be encoded as a space > > character, or should it be added automatically by the display/formatting > > process according to the user's specified stylistic preference?
If we decide to do this, should we then do the same for digits? For example,
ปากกา๒ด้าม --> ปากกา ๒ ด้าม
Fortunately, people don't normally write "ปากกา๒ด้าม", so this is less of a concern than Maiyamok is.
> My opinion is that the space before MAIYAMOK should be > encoded and rendered as is, and the Unicode line breaking > should prohibit line break before and after it, to prevent > orphaned MAIYAMOK in the new line.
If Unicode prohibits line breaking before Maiyamok, should it do the same for digits?
ปากกา ๒ ด้าม --> ปากกา<nbsp>๒<nbsp>ด้าม
Should Unicode concern itself with this typographic issue?
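For concreteness, the digit case asked about above could be sketched as follows. This only illustrates the hypothetical transformation and is not a recommendation; the name `glue_thai_digits` is made up, U+00A0 NO-BREAK SPACE stands in for <nbsp>, and only ordinary spaces next to Thai digits (U+0E50 to U+0E59) are touched.

```python
import re

NBSP = "\u00A0"                  # NO-BREAK SPACE
THAI_DIGIT = "[\u0E50-\u0E59]"   # THAI DIGIT ZERO .. THAI DIGIT NINE

# A space directly before a Thai digit, or a space directly after one.
_SPACE_NEXT_TO_DIGIT = re.compile(
    " (?=" + THAI_DIGIT + ")|(?<=" + THAI_DIGIT + ") "
)


def glue_thai_digits(text):
    """Replace spaces adjacent to Thai digits with no-break spaces."""
    return _SPACE_NEXT_TO_DIGIT.sub(NBSP, text)
```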
> In fact, MAIYAMOK should prohibit break opportunity before it, > direct or indirect. This allows both spacing schools to get the > same result.
> And finally, I think a guideline should be made to suggest the > good style of spacing before MAIYAMOK. I learned that the > Royal Institute says a leading space should be used before > MAIYAMOK. We can rely on that. But it's up to user whether to > follow the guideline or not.
> The rationale for spacing before and after MAIYAMOK is that > it's a distortion from Thai digit two (๒). And traditional
Even though it is a distortion of ๒, the two serve different functions and belong to different categories.
However, this point concerns nothing more than the user's style or preference.
[Originally written in Thai.] The Royal Institute's rationale does not show any benefit, or any pros and cons, that would support either inserting the space or leaving it out. If another rationale points to a greater benefit, we should prefer that one, shouldn't we?
Let me try an argument based on grouping characters by their function, or something along those lines. Maiyamok can equally be placed in a group close to Paiyannoi (ฯ). (Why Maiyamok is close to Paiyannoi is explained in a paragraph below.) By this reasoning, Maiyamok should not have a leading space, just as Paiyannoi has none:
กรุงเทพฯ มากๆ
However, this argument does not demonstrate any more benefit than the Royal Institute's rationale does. It is just a personal preference: some people may see Maiyamok as belonging to the same group as Paiyannoi. The grouping is based on the surrounding context (regardless of meaning). That is, Maiyamok is always appended to, and bound to, the word or phrase before it, and never has any relation to the word after it. The same holds for Paiyannoi (the first Paiyannoi in "ฯพณฯ" is the only exception; or should Paiyanyai "ฯลฯ" be counted too?). A digit, by contrast, is not always bound to the preceding word; sometimes it is bound to the following word.
So, even though Maiyamok is derived from the digit ๒, its usage differs from that of ๒. Since its usage is closer to that of Paiyannoi, shouldn't it follow the same writing style as Paiyannoi?
The above is just one example of a rationale, and it may not be decisive.
Now let me try another rationale: if most people already write Maiyamok attached to the preceding word, then simply standardize on what the majority does. (This is an example of defining a standard from the users up, rather than from the top down.) The benefit here is clear: a large number of people can keep writing the way they are used to, without adjusting at all.
This does not mean that other issues should always be settled by habit. It works in this case because no other rationale with a more substantive benefit has been found to counter the argument from habit.
To my knowledge, a general rule of thumb in the Thai writing system is that symbols should be separated from letters by a space, before and after.
The exception is Paiyannoi, as it is actually part of (a reduced form of) a word. "กรุงเทพฯ" stands for "กรุงเทพมหานคร", one word.
If we wrote "กรุงเทพ ฯ", it would be "กรุงเทพ มหานคร", two words.
On 5/7/08, Arthit Suriyawongkul <art...@gmail.com> wrote:
> to my knowledge, in Thai (language) writing system, > a general rule of thumb is that, symbols should be written > separated from alphabets by a space, before and after.
> the exception goes to Paiyannoi, as it is actually part/reduced form of a word. > "กรุงเทพฯ" is for "กรุงเทพมหานคร", one word.
> if we put it "กรุงเทพ ฯ", it will be "กรุงเทพ มหานคร", two words.
> On 4/30/08, Theppitak Karoonboonyanan <t...@linux.thai.net> wrote:
>> My opinion is that the space before MAIYAMOK should be
>> encoded and rendered as is, and the Unicode line breaking
>> should prohibit line break before and after it, to prevent
>> orphaned MAIYAMOK in the new line.
> If unicode prohibit line breaking before Maiyamok, then it should do > the same with digit?
> ปากกา ๒ ด้าม --> ปากกา<nbsp>๒<nbsp>ด้าม
> Should unicode concern this typographic issue?
Line breaks before and after digits are still valid in some cases, while a line break before Maiyamok is considered bad style. So it's reasonable for the line breaking algorithm to handle the Maiyamok case.
>> In fact, MAIYAMOK should prohibit break opportunity before it, >> direct or indirect.
Just want to retain this quote, for reference, as it's an important part of the discussion above.
>> And finally, I think a guideline should be made to suggest the >> good style of spacing before MAIYAMOK. I learned that the >> Royal Institute says a leading space should be used before >> MAIYAMOK. We can rely on that. But it's up to user whether to >> follow the guideline or not.
>> The rationale for spacing before and after MAIYAMOK is that >> it's a distortion from Thai digit two (๒). And traditional
> Eventough it's a distortion of ๒, but they serve different function > and are in different category.
> However, this point concerns with nothing other than the user's style > or preference.
Your quote dropped part of what I said. I wrote:
>> The rationale for spacing before and after MAIYAMOK is that >> it's a distortion from Thai digit two (๒). And traditional >> writings put spaces before and after it for that reason.
That is, it explains why Maiyamok is written like that in traditional writings (see some lead-printed books before the computer age for examples), before being blurred out by computer-typeset books, due to software limitation. (That may explain people's different behaviors, as they learn from the books.)
So, the point is that the principle used to exist before it was blurred by the limitations of computer software. Now that there is a chance to remove that limitation, why not return to the situation before the problem arose?
Regarding the argument from function, I think the way a symbol is actually written is not necessarily linked to its function. It depends rather on the symbol's own characteristics. For example, Yamakkan and Phinthu share the same function of marking consonant clusters, mostly in Sanskrit transcription, as in พ๎ราห๎มณ and พฺราหฺมณ. Nothing about that function dictates whether the mark must be written above or below the baseline.
And, in particular case of Paiyannoi, it shares the same shape as Angkhan Diao, which is used for ending sentences or poem stanzas. If written with leading space, it would be likely read as sentence/stanza ending.
>> On 4/30/08, Theppitak Karoonboonyanan <t...@linux.thai.net> wrote:
>>> My opinion is that the space before MAIYAMOK should be
>>> encoded and rendered as is, and the Unicode line breaking
>>> should prohibit line break before and after it, to prevent
>>> orphaned MAIYAMOK in the new line.
> Line break before and after digits are still valid in some
> cases, while line break before Maiyamok is considered a
> bad style. So, it's reasonable for the line breaking algorithm
> to handle the case.
+1 for prohibiting a line break before MAIYAMOK.
But prohibit a line break after MAIYAMOK?
It is impossible to fit "น่า" on the first line due to limited space. If we allow a line break after MAIYAMOK, the result will be 1). But if we don't allow a line break after MAIYAMOK, we have to pull one more word from before MAIYAMOK down to the next line to maintain the rule, as in 2).
Of course 1) is better. The point is whether 1) is acceptable.
>>> In fact, MAIYAMOK should prohibit break opportunity before it, >>> direct or indirect.
> Just want to retain this quote, for reference, as it's an
> important part of the discussion above
>>> And finally, I think a guideline should be made to suggest the
>>> good style of spacing before MAIYAMOK. I learned that the
>>> Royal Institute says a leading space should be used before
>>> MAIYAMOK. We can rely on that. But it's up to user whether to
>>> follow the guideline or not.
>>> The rationale for spacing before and after MAIYAMOK is that >>> it's a distortion from Thai digit two (๒). And traditional
>> Eventough it's a distortion of ๒, but they serve different function >> and are in different category.
>> However, this point concerns with nothing other than the user's style >> or preference.
I want to raise an important point here. IMO, WTT 3.0 should work with any Thai text, well styled or badly styled. There should be some clarification from the RI or the publishing industry on good style. But people input text as they like, so the default WTT 3.0 behavior should be good enough to handle real-world cases, e.g. never assume that a space will or will not be present before/after MAIYAMOK.
One example is "ดร.สมชาย" vs. "ดร. สมชาย". The latter is right, right? But if one day WTT 3.0 must decide how to handle this form, we have to expect both.
However, assuming the presence of a space before MAIYAMOK has one implication for WTT 3.0. If we agree that there should be a space *character* before MAIYAMOK, then there shouldn't be any special space built into the front of the MAIYAMOK glyph (as James noticed). And that would result in a recommendation in the Thai font guideline (if in scope of WTT 3).
-- _/|\_ Samphan Raruenrom. Open Source Development Co., Ltd. Tel: +66 38 311816, Fax: +66 38 773128, http://www.osdev.co.th/
On May 7, 10:19 pm, "Anon Sricharoenchai" <anon....@gmail.com> wrote:
> On 5/7/08, Arthit Suriyawongkul <art...@gmail.com> wrote:
> > to my knowledge, in Thai (language) writing system,
> > a general rule of thumb is that, symbols should be written
> > separated from alphabets by a space, before and after.
> > the exception goes to Paiyannoi, as it is actually part/reduced form of a word.
> > "กรุงเทพฯ" is for "กรุงเทพมหานคร", one word.
> > if we put it "กรุงเทพ ฯ", it will be "กรุงเทพ มหานคร", two words.
> +1 for prohibit line break before MAIYAMOK
> But prohibit line break after MAIYAMOK?
No, as it's fully reasonable to have Maiyamok at the end of line.
> I want to raise an important point here. IMO, WTT 3.0 should work with > any Thai text, good or bad styled. There should be some clarification > from RI or the publishing industry on good style. But people input text > as they like, so default WTT 3.0 behavior should be good enough to > handle real-world cases, e.g. never assume that there will be a space > before/after MAIYAMOK or not.
Absolutely. My point is that the leading space is a "guideline", not a rule like the canonical order.
> However, assuming the presence of space before MAIYAMOK has > one implication in WTT 3.0. If we agree that should be a space > *character* before MAIYAMOK, then there shouldn't be any special > space in front of MAIYAMOK glyph (as James notices). And that > would result in the recommendation in Thai font guideline (if in > scope of WTT 3).
Right. Provided that software handles it well, there need not be a leading space in the Maiyamok glyph any more.
Theppitak Karoonboonyanan wrote:
>> I want to raise an important point here. IMO, WTT 3.0 should work with
>> any Thai text, good or bad styled. There should be some clarification
>> from RI or the publishing industry on good style. But people input text
>> as they like, so default WTT 3.0 behavior should be good enough to
>> handle real-world cases, e.g. never assume that there will be a space
>> before/after MAIYAMOK or not.
> Absolutely. My point is that the leading space is a "guideline", not a > rule like the canonical order.
I understand. The point I really want to make is about the rationale for things in WTT 3.0. For example, there's another solution for this "one space before MAIYAMOK", i.e. put it in the MAIYAMOK glyph. If we decide to do this, we'll *recommend* people not to add space *character* before MAIYAMOK, then we may decide not to modify the line-break property of MAIYAMOK because we already recommend people to do-the-right-thing.
My point is that, WTT 3.0 must not make any assumption at all when processing Thai text, even though we do have our own assumption or even provide some guidelines.
For example, in this case, regardless of whether there should be a space character before MAIYAMOK, we should correct the line-break property of MAIYAMOK anyway.
Moreover, even when there is something wrong with the text, WTT 3.0's behavior should always be reasonable. We may encounter such issues, e.g. when considering grapheme clusters in selection and editing.
> I understand. The point I really want to make is about the rationale for > things in WTT 3.0. For example, there's another solution for this > "one space before MAIYAMOK", i.e. put it in the MAIYAMOK glyph. > If we decide to do this, we'll *recommend* people not to add space > *character* before MAIYAMOK, then we may decide not to modify > the line-break property of MAIYAMOK because we already recommend > people to do-the-right-thing.
The guideline would include glyph design for Maiyamok as well.
> My point is that, WTT 3.0 must not make any assumption at all when > processing Thai text, even though we do have our own assumption > or even provide some guidelines.
> For example, in this case, no matter there should be a space character > before MAIYAMOK or not, we should correct the line-break property of > MAIYAMOK anyway.
Right, as I already stated in some previous post:
In fact, MAIYAMOK should prohibit break opportunity before it, direct or indirect. This allows both spacing schools to get the same result.
(For people who may feel unclear, prohibiting *indirect* break opportunity means the prohibition propagates through the leading whitespaces as well.)
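A minimal sketch of what prohibiting direct and indirect breaks could look like, assuming some word segmenter has already produced candidate break offsets. The function name `allowed_breaks` and the offset convention are assumptions made for illustration; this is not the Unicode line breaking algorithm itself, only a filter in its spirit.

```python
from typing import List, Sequence

MAIYAMOK = "\u0E46"  # THAI CHARACTER MAIYAMOK


def allowed_breaks(text: str, candidates: Sequence[int]) -> List[int]:
    """Drop candidate breaks that would leave Maiyamok at the start of a line.

    A candidate offset i means "a line may end just before text[i]".
    """
    kept = []
    for i in candidates:
        j = i
        # Indirect case: look through any whitespace after the break point.
        while j < len(text) and text[j].isspace():
            j += 1
        # Direct case (j == i) or indirect case (j > i): Maiyamok comes next.
        if j < len(text) and text[j] == MAIYAMOK:
            continue
        kept.append(i)
    return kept
```

With this filter, "มาก ๆ" and "มากๆ" both keep Maiyamok on the same line as the preceding word, which is the "same result for both spacing schools" described above. A break after Maiyamok is still allowed.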
> Moreover, even there're somethings wrong with the text, WTT 3.0 behaviors > should always be reasonable. We may encounter such issue, e.g. when > consider grapheme clusters in selection and editing.
Let's come to it in that thread. I can't imagine an example yet.
> > On 5/7/08, Arthit Suriyawongkul <art...@gmail.com> wrote:
> > > to my knowledge, in Thai (language) writing system,
> > > a general rule of thumb is that, symbols should be written
> > > separated from alphabets by a space, before and after.
What is the definition of a symbol? Are ๑-๙ symbols or not? Are the brackets "(" and ")", the comma "," and the dot "." symbols or not?
> > > the exception goes to Paiyannoi, as it is actually part/reduced form of a word. > > > "กรุงเทพฯ" is for "กรุงเทพมหานคร", one word.
> > > if we put it "กรุงเทพ ฯ", it will be "กรุงเทพ มหานคร", two words.
> > มาก ๆ --> มาก มาก ?
> yep. two repeated words.
> ไปเลย ๆ --> ไปเลย ไปเลย
Following this logic, should we also write "เรา ควร จะ เขียน อย่างนี้ หรือ เปล่า?" :D