Unicode in C

Elazar Leibovich elazarl at gmail.com
Mon Mar 12 22:06:36 IST 2012


On Mon, Mar 12, 2012 at 7:37 PM, Nadav Har'El <nyh at math.technion.ac.il> wrote:

> On Mon, Mar 12, 2012, Elazar Leibovich wrote about "Re: Unicode in C":
> > The simplest option is, to accept StringPiece-like structure (pointer to
> > buffer + size), and encoding, then to convert the data internally to your
> > encoding (say, ISO-8859-8, replacing illegal characters with whitespace),
> > and convert the other output back.
>
> This is an option, but certainly not the simplest :-)
>

It was the simplest idea *I could think of* at this moment ;p
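
To make the idea concrete, here is a minimal sketch of such an interface,
under loud assumptions: the names string_piece and utf8_to_iso8859_8 are
hypothetical, only UTF-8 input is handled, and only ASCII and the Hebrew
letters U+05D0..U+05EA are mapped. Everything else, including malformed
input, is replaced with a single space.

```c
#include <stddef.h>

/* Hypothetical StringPiece-like view: pointer to buffer + size, no NUL needed. */
typedef struct {
    const char *data;
    size_t size;
} string_piece;

/*
 * Convert UTF-8 input to ISO-8859-8 (illustrative sketch, not a full decoder).
 * ASCII bytes pass through, Hebrew letters U+05D0..U+05EA (UTF-8 bytes
 * D7 90 .. D7 AA) map to 0xE0..0xFA, and any other sequence becomes one space.
 * Returns the number of bytes written; out must hold at least in.size bytes.
 */
size_t utf8_to_iso8859_8(string_piece in, unsigned char *out)
{
    size_t i = 0, n = 0;

    while (i < in.size) {
        unsigned char b = (unsigned char)in.data[i];

        if (b < 0x80) {
            /* Plain ASCII is identical in both encodings. */
            out[n++] = b;
            i++;
        } else if (b == 0xD7 && i + 1 < in.size &&
                   (unsigned char)in.data[i + 1] >= 0x90 &&
                   (unsigned char)in.data[i + 1] <= 0xAA) {
            /* Hebrew letter: second byte 0x90..0xAA maps to 0xE0..0xFA. */
            out[n++] = (unsigned char)((unsigned char)in.data[i + 1] + 0x50);
            i += 2;
        } else {
            /* Unsupported or malformed: skip this byte and any continuation
               bytes (0x80..0xBF) that follow, and emit a single space. */
            i++;
            while (i < in.size && ((unsigned char)in.data[i] & 0xC0) == 0x80)
                i++;
            out[n++] = ' ';
        }
    }
    return n;
}
```

Because every output byte consumes at least one input byte, the caller can
size the output buffer from the input size alone.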


>
> What "iconv-like library"?
>

By "iconv-like" I mean: do you mind using iconv from glibc? And if that is a
problem on platforms that do not ship glibc (Windows, embedded systems, etc.),
do you mind depending on another library such as ICU, or at least on
something more lightweight that handles all the UTF-* conversions?


>
> I'm not ruling this idea out. But what worries me is that at the end,
> my users only use 1% of this library's features - e.g., I'll never need
> this library's support for converting one encoding of Chinese to
> another. So people who want to use the 50 KB libhspell will suddenly need
> the 15 MB libicu.
>

At least with iconv on Linux this is not the case. First, the library is
available on every distro, and second, iconv is smart enough to split its
functionality among many .so files and to dynamically load only the
required shared objects at runtime. I'm not sure what the state of iconv
on Windows is, though. Maybe there you can fall back to native system
calls.
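
For reference, a rough sketch of what a thin wrapper over the POSIX iconv
API might look like. The function name convert_encoding and its return
convention are assumptions made for illustration; whether a given encoding
name such as "ISO-8859-8" is accepted depends on the iconv implementation
installed on the system.

```c
#include <iconv.h>
#include <stddef.h>

/*
 * Convert inlen bytes from encoding `from` to encoding `to` using iconv.
 * Returns the number of bytes written to out, or -1 on any failure
 * (unsupported encoding, invalid input, or output buffer too small).
 * Hypothetical helper: name and error convention are illustrative only.
 */
long convert_encoding(const char *from, const char *to,
                      const char *in, size_t inlen,
                      char *out, size_t outcap)
{
    /* Note the argument order: destination encoding first. */
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = (char *)in;   /* iconv takes char **, but does not modify the input */
    char *outp = out;
    size_t inleft = inlen, outleft = outcap;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1) {
        /* Flush any pending shift state (matters for stateful encodings). */
        rc = iconv(cd, NULL, NULL, &outp, &outleft);
    }
    iconv_close(cd);

    if (rc == (size_t)-1)
        return -1;
    return (long)(outcap - outleft);
}
```

A caller would pass something like convert_encoding("UTF-8", "ISO-8859-8",
buf, len, out, sizeof out) and treat -1 as "could not convert".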

That said, on second thought, single-byte encodings seem to me more and
more obsolete. Thus, I think it might be sufficient to support only
UTF-16 and UTF-8. UTF-8 is common on the network and in files, and UTF-16
is common as the internal format in Java, C#, and some C++ libraries, so
it's important to support it for easier interoperability with those.
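
Supporting just these two encodings does not need a big library. As a
minimal sketch, assuming UTF-16 input in native byte order and a
hypothetical function name utf16_to_utf8, the conversion fits in a few
dozen lines of C. Unpaired surrogates are replaced with U+FFFD here, which
is one possible policy among several.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Convert n UTF-16 code units (native byte order) to UTF-8.
 * Unpaired surrogates are replaced with U+FFFD.
 * Returns the number of bytes written; out must hold at least 3 * n bytes.
 */
size_t utf16_to_utf8(const uint16_t *in, size_t n, unsigned char *out)
{
    size_t i = 0, o = 0;

    while (i < n) {
        uint32_t cp = in[i++];

        if (cp >= 0xD800 && cp <= 0xDBFF && i < n &&
            in[i] >= 0xDC00 && in[i] <= 0xDFFF) {
            /* Valid surrogate pair: combine into one code point. */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i++] - 0xDC00);
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            /* Lone surrogate: substitute the replacement character. */
            cp = 0xFFFD;
        }

        if (cp < 0x80) {
            out[o++] = (unsigned char)cp;
        } else if (cp < 0x800) {
            out[o++] = (unsigned char)(0xC0 | (cp >> 6));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out[o++] = (unsigned char)(0xE0 | (cp >> 12));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (unsigned char)(0xF0 | (cp >> 18));
            out[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    return o;
}
```

The 3 * n bound holds because a single code unit yields at most three
bytes, and a surrogate pair (two code units) yields four.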