Unicode in C

Elazar Leibovich elazarl at gmail.com
Tue Mar 13 21:32:58 IST 2012


One very important thing to consider is Unicode normalization: how to
strip out the niqqud, and how to replace, say, KAF WITH DAGESH (U+FB3B)
with a plain KAF (U+05DB), and so on.
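
To make the idea concrete, here is a rough sketch in C, not anything taken
from hspell. It assumes the input has already been decoded into an array of
Unicode code points, and the function names are made up for illustration. It
drops the Hebrew combining marks (cantillation and niqqud) and folds the
precomposed "letter with dagesh" presentation forms in U+FB30..U+FB4A back to
the base letters, relying on that block being laid out in the same order as
U+05D0..U+05EA. Please check the ranges against the Unicode charts before
relying on them.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero for Hebrew combining marks: cantillation (U+0591..U+05AF) and
 * points (U+05B0..U+05BD, plus the scattered ones listed below).
 * MAQAF, PASEQ, SOF PASUQ and NUN HAFUKHA are punctuation and are kept. */
static int is_hebrew_mark(uint32_t cp)
{
    return (cp >= 0x0591 && cp <= 0x05BD) ||
           cp == 0x05BF || cp == 0x05C1 || cp == 0x05C2 ||
           cp == 0x05C4 || cp == 0x05C5 || cp == 0x05C7;
}

/* Fold presentation forms such as KAF WITH DAGESH (U+FB3B) to the base
 * letter (U+05DB). Assumption: U+FB30..U+FB4A parallels U+05D0..U+05EA. */
static uint32_t fold_presentation_form(uint32_t cp)
{
    if (cp >= 0xFB30 && cp <= 0xFB4A)
        return cp - 0xFB30 + 0x05D0;
    return cp;
}

/* Rewrites s[0..n) in place and returns the new length. */
size_t strip_hebrew_points(uint32_t *s, size_t n)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_hebrew_mark(s[i]))
            continue;               /* drop the mark entirely */
        s[out++] = fold_presentation_form(s[i]);
    }
    return out;
}
```

For example, the sequence KAF, DAGESH, QAMATS, KAF WITH DAGESH comes out as
two plain KAFs. Other presentation forms (ligatures, wide letters) are left
alone here and would need their own handling.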

I guess that you're doing that already to some degree in hspell, so (in
case you're translating to ISO-8859-8) you just have to be careful not to
miss any letters in the conversion from Unicode.
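
As an illustration of "not missing any letters", a conversion step could
refuse to guess. The following is only a sketch with a hypothetical function
name; it assumes a single code point as input and uses the fact that
ISO-8859-8 places alef through tav at 0xE0..0xFA. Anything it does not
recognize is reported with -1 rather than silently dropped, so an unhandled
letter shows up immediately.

```c
#include <stdint.h>

/* Map one Unicode code point to an ISO-8859-8 byte value.
 * Returns -1 for anything this sketch does not handle, so that the
 * caller notices an unmapped character instead of losing it. */
int unicode_to_iso8859_8(uint32_t cp)
{
    if (cp < 0x80)
        return (int)cp;                      /* ASCII maps to itself */
    if (cp >= 0x05D0 && cp <= 0x05EA)
        return (int)(cp - 0x05D0 + 0xE0);    /* alef..tav */
    return -1;                               /* not handled here */
}
```

With this shape, a presentation form such as U+FB3B yields -1 unless it was
folded to U+05DB beforehand, which is exactly the kind of miss worth catching.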

On Mon, Mar 12, 2012 at 3:05 PM, Nadav Har'El <nyh at math.technion.ac.il> wrote:

> Hi, I have a question that I was sort of sad that I couldn't readily
> find the answer to...
>
> Let's say I want to create a C API (a C library), with functions which
> take strings as arguments. What am I supposed to use if I want these
> strings to be in any language? Obviously the answer is "Unicode", but
> that doesn't really answer the question... How is Unicode used in C?
>
> As far as I can see, there are two major approaches to this problem.
>
> One approach, used in the Win32 C APIs on MS-Windows, and also in Java and
> other languages, is to use "wide characters" - characters of 16 or 32 bit
> size, and strings are an array of such characters.
>
> The second approach, proposed by Plan 9, is to use UTF-8.
>
> I personally prefer the UTF-8 approach, because it naturally fits
> with C's "char *" type and with Linux's system calls (which take char*,
> not any sort of wide characters), but I'm completely unsure that this is
> what users actually want. If not, then I wonder, why?
>
> Some background on this question: People have been complaining for years
> that Hspell, and in particular the libhspell functions, use ISO-8859-8
> instead of "unicode". But if one wants to add unicode to libhspell, what
> should it be? UTF-8? Wide chars (UTF-16 or UTF-32)?
>
> Thanks,
> Nadav.
>
> --
> Nadav Har'El                        |                    Monday, Mar 12 2012,
> nyh at math.technion.ac.il          |-----------------------------------------
> Phone +972-523-790466, ICQ 13349191 |We could wipe out world hunger if we knew
> http://nadav.harel.org.il           |how to make AOL's Free CD's edible!
>
> _______________________________________________
> Linux-il mailing list
> Linux-il at cs.huji.ac.il
> http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il
>