Unicode in C

Tue Mar 13 07:17:14 IST 2012

enchant uses hspell as is (iso-8859-8) and just converts the strings when
calling the hspell lib:
http://www.abisource.com/viewvc/enchant/trunk/src/hspell/hspell_provider.c?view=markup

imho, because hspell only handles hebrew, it can continue to use a
hebrew-only charset internally: iso-8859-8 without nikud, or win-1255 with
nikud.

it would be helpful if hspell gave the user convenience functions that take
utf-8 and return utf-8. these functions would convert the utf-8 to the
hebrew-only encoding that hspell uses internally, and convert the results
back.
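
a minimal sketch of what such a conversion layer could look like, assuming
the no-nikud iso-8859-8 case only. the function names are hypothetical and
are not part of the hspell api. it relies on the fact that the hebrew
letters U+05D0..U+05EA are encoded in utf-8 as 0xD7 0x90..0xAA and in
iso-8859-8 as 0xE0..0xFA, so no iconv dependency is needed. anything that is
neither ascii nor a hebrew letter is replaced with a space, as suggested in
the quoted thread below.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not hspell API): convert a NUL-terminated UTF-8
 * string to ISO-8859-8. Characters that are neither ASCII nor Hebrew
 * letters become a single space. Output is truncated to fit out_size and
 * is always NUL-terminated. Returns the number of bytes written, not
 * counting the terminating NUL. */
size_t hypo_utf8_to_hebrew(const char *utf8, char *out, size_t out_size)
{
    const unsigned char *p = (const unsigned char *)utf8;
    size_t n = 0;

    assert(utf8 != NULL && out != NULL);
    if (out_size == 0)
        return 0;

    while (*p != '\0' && n + 1 < out_size) {
        if (*p < 0x80) {
            /* ASCII maps to itself. */
            out[n++] = (char)*p++;
        } else if (p[0] == 0xD7 && p[1] >= 0x90 && p[1] <= 0xAA) {
            /* Hebrew letter: 0xD7 0x90..0xAA -> 0xE0..0xFA. */
            out[n++] = (char)(unsigned char)(p[1] + 0x50);
            p += 2;
        } else {
            /* Anything else: emit one space, skip the whole sequence. */
            out[n++] = ' ';
            p++;
            while ((*p & 0xC0) == 0x80)
                p++;
        }
    }
    out[n] = '\0';
    return n;
}

/* Hypothetical helper (not hspell API): convert a NUL-terminated
 * ISO-8859-8 string back to UTF-8. Same truncation and return-value
 * rules as above. */
size_t hypo_hebrew_to_utf8(const char *heb, char *out, size_t out_size)
{
    const unsigned char *p = (const unsigned char *)heb;
    size_t n = 0;

    assert(heb != NULL && out != NULL);
    if (out_size == 0)
        return 0;

    for (; *p != '\0'; p++) {
        if (*p >= 0xE0 && *p <= 0xFA) {
            /* Hebrew letter needs two output bytes plus room for NUL. */
            if (n + 2 >= out_size)
                break;
            out[n++] = (char)(unsigned char)0xD7;
            out[n++] = (char)(unsigned char)(*p - 0x50);
        } else {
            if (n + 1 >= out_size)
                break;
            /* ASCII maps to itself; other bytes become a space. */
            out[n++] = (*p < 0x80) ? (char)*p : ' ';
        }
    }
    out[n] = '\0';
    return n;
}
```

a utf-8 wrapper around the existing hspell calls would convert the input
with the first function, call hspell as today, and convert any returned
words with the second. note that this sketch turns nikud into spaces, which
would split words; a win-1255 variant would need a larger mapping.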

p.s.
i would be happy if hspell offered easy-to-use functions for accessing the
library's linguistic info. in the current version of hspell, using the
linguistic info is very hard. see:
http://code.google.com/p/hspell-gir/source/browse/src/hspell-gir.vala

2012/3/12 Elazar Leibovich <elazarl at gmail.com>

> On Mon, Mar 12, 2012 at 7:37 PM, Nadav Har'El <nyh at math.technion.ac.il> wrote:
>
>> On Mon, Mar 12, 2012, Elazar Leibovich wrote about "Re: Unicode in C":
>> > The simplest option is to accept a StringPiece-like structure (pointer
>> > to buffer + size) and an encoding, then convert the data internally to
>> > your encoding (say, ISO-8859-8, replacing illegal characters with
>> > whitespace), and convert the output back.
>>
>> This is an option, but certainly not the simplest :-)
>>
>
> It was the simplest idea *I could think of* at this moment ;p
>
>
>>
>> What "iconv-like library"?
>>
>
> iconv-like means, "Do you mind using iconv from glibc, and if that's a
> problem due to support in Windows, embedded systems, etc that do not
> feature glibc, do you mind having a dependency on other library, such as
> ICU, or at least something more lightweight like that would handle all
> UTF-* conversions?"
>
>
>>
>> I'm not ruling this idea out. But what worries me is that at the end,
>> my users only use 1% of this library's features - e.g., I'll never need
>> this library's support from converting one encoding of Chinese to
>> another. So people who want to use the 50 KB libhspell will suddenly need
>> the 15 MB libicu.
>>
>
> At least when using iconv on Linux this is not the case. First, this
> library is available in every distro, and second, iconv is smart enough to
> split the functionality among many .so files and to dynamically load
> only the required shared objects at runtime. I'm not sure what the state
> of iconv on Windows is, though. Maybe you can fall back to native system
> calls there.
>
> That said, on second thought, single-byte encodings seem to me more and
> more deprecated. Thus, I think it might be sufficient to support only
> UTF-16 and UTF-8. UTF-8 is common on the network and in files, and UTF-16
> is common as the internal format in C++ libraries, Java, and C#, so it's
> important to support it for easier interoperability with those.
>