
devel - Re: [sympa-developpers] Use of UTF-8 --- Re: sympa [9919] branches/sympa-cleanup/src/sbin: [dev] use utf8 characters directly

Subject: Developers of Sympa


  • From: IKEDA Soji <address@concealed>
  • To: "address@concealed" <address@concealed>
  • Subject: Re: [sympa-developpers] Use of UTF-8 --- Re: sympa [9919] branches/sympa-cleanup/src/sbin: [dev] use utf8 characters directly
  • Date: Mon, 25 Nov 2013 22:20:34 +0900

Guillaume and all,

On 2013/11/25 21:06, Guillaume Rousse <address@concealed> wrote:

> On 22/11/2013 10:44, IKEDA Soji wrote:
>> Hi Guillaume and all,
>>
>> You replaced some characters in sources with raw UTF-8 strings.
>> Some of your changes won't produce the expected results.
>>
>> (1) Earlier releases (5.8.x) of pod2man do not recognize the "=encoding"
>> POD directive. In addition, some of them refuse to generate
>> manpages from PODs containing non-ASCII bytes.
>>
>> So POD E<...> markup should be used instead of raw UTF-8 sequences.
> OK.
>
>> (2) Most components of Sympa do not have the "use utf8" pragma, so
>> UTF-8 strings in the sources are handled as sequences of bytes
>> (one of the few exceptions is Marc::Search).
>>
>> So raw UTF-8 in the sources (especially in regexps) might not work
>> as expected.
> Then this pragma should probably be enforced everywhere, or nowhere at all.

Right. Marc::Search is something of a special exception.
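The following small analogy makes point (2) concrete. It is written in Python rather than Sympa's Perl, so it only models Perl's behaviour, and the names `matches`, `RAW_UTF8_LITERAL` and so on are hypothetical. It assumes that a Perl source without "use utf8" sees a literal "é" as the two characters C3 A9. It then compares that literal with the escape "\xE9" against two kinds of input.

```python
import re

# Python analogy only (Sympa is written in Perl). In a Perl file saved as
# UTF-8 *without* "use utf8", the literal "é" becomes two characters,
# U+00C3 and U+00A9, one per UTF-8 byte (C3 A9). Decoding the UTF-8 bytes
# as Latin-1 reproduces that one-character-per-byte view.
RAW_UTF8_LITERAL = "\u00e9".encode("utf-8").decode("latin-1")  # "Ã©", 2 chars
# A string escape such as "\xE9" names the single character U+00E9 instead.
ESCAPED_LITERAL = "\xe9"  # "é", 1 char


def matches(pattern: str, subject: str) -> bool:
    """Return True if the regular expression `pattern` occurs in `subject`."""
    return re.search(pattern, subject) is not None


# Subject A: a character string, or equivalently Latin-1 bytes read one
# character per byte (Latin-1 maps byte 0xE9 straight to U+00E9).
char_or_latin1 = "caf\u00e9"
# Subject B: undecoded UTF-8 bytes read one character per byte.
utf8_bytes_view = "caf\u00e9".encode("utf-8").decode("latin-1")  # "cafÃ©"

for label, subject in [
    ("characters / Latin-1", char_or_latin1),
    ("undecoded UTF-8 bytes", utf8_bytes_view),
]:
    print(
        f"{label:22} escaped={matches(ESCAPED_LITERAL, subject)} "
        f"raw={matches(RAW_UTF8_LITERAL, subject)}"
    )
```

The first line reports escaped=True raw=False and the second reports escaped=False raw=True. The escaped form keeps matching character or Latin-1 data, but the raw UTF-8 literal matches only input that is still undecoded UTF-8. Which kind of input a given script actually receives would have to be checked script by script.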

>> I will make some suggestions.
>>
>> [8053] branches/sympa-cleanup/src: [dev] conversion to utf8
>>
>> <https://address@concealed>
>>
>> - arc2webarc.pl.in:
>> bytes 0xE9 and 0xFB in regexps were replaced with the multibyte
>> sequences 0xC3 0xA9 and 0xC3 0xBB. These should rather be the string
>> escapes "\xE9" and "\xFB". ...(2)
>>
>> - Sympa/Message.pm:
>> - sympa_wizard.pl.in:
>> A byte 0xFC was replaced with raw UTF-8. This should instead be the
>> POD markup "E<252>". ...(1)
>>
>> [9919] branches/sympa-cleanup/src/sbin: [dev] use utf8 characters directly
>> <Attached below>
>>
>> - All changes:
>> POD markup E<...> should be used. ...(1)
> You'd better fix the code directly instead of suggesting those changes.
> First, you have better knowledge of those encoding issues than anyone else
> among us. Second, I don't have any time available to work on Sympa
> currently, and you don't want me to be a bottleneck here.

Which branch should we modify?

In fact, I also made some of these changes in sympa-6.2-branch.
We duplicated work, not only on the UTF-8 issue but also on spelling and so on
(I scanned all sources with aspell; you probably did, too).

We all have to limit the number of branches we maintain.

However, the reorganization of the sources is a special case. We saw what
happened with a plain merge from sympa-6.2-branch into sympa-cleanup: it
produced, say, unworkable code. That's why we are waiting for you to take action.

Please don't feel rushed. I (and of course everyone else) will wait until you
can take on the Sympa code base again.

> As a side note, though, most of these encoding issues were caused by the
> presence of an authors list, including O. Salaün's name, in some files but
> not in others, without much apparent logic. Hence my recurring point that
> file-based authorship doesn't make much sense, and that it should be
> centralized in one single place for the whole project.

It doesn't affect only authors' names, e.g. 池田荘児 (my name).
The unabbreviated name of Sympa ("Système de Multi-Postage Automatique")
is also affected. :-)

Regards,

--- Soji

> Guillaume Rousse
> INRIA, Direction des systèmes d'information
> Domaine de Voluceau
> Rocquencourt - BP 105
> 78153 Le Chesnay
> Tel: 01 39 63 58 31
>




