Unicode in C

Mon Mar 12 15:05:56 IST 2012

Hi, I have a question, and I was a bit sad that I couldn't readily
find an answer to it...

Let's say I want to create a C API (a C library), with functions which
take strings as arguments. What am I supposed to use if I want these strings
to be in any language? Obviously the answer is "Unicode", but that
doesn't really answer the question... How is Unicode used in C?

As far as I can see, there are two major approaches to this problem.

One approach, used in the Win32 C APIs on MS-Windows, and also in Java and
other languages, is to use "wide characters": characters 16 or 32 bits in
size, with a string being an array of such characters.
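
To make the wide-character option concrete, here is a minimal sketch of
what such an API entry point might look like. The function name
example_word_length_w is hypothetical (it is not part of any existing
library); wcslen() is the standard <wchar.h> function.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical wide-character entry point: the caller passes an array
 * of wchar_t terminated by L'\0'. Note that wchar_t is 16 bits on
 * MS-Windows and typically 32 bits on Linux, so the element size of
 * the string depends on the platform. */
size_t example_word_length_w(const wchar_t *word)
{
    assert(word != NULL);
    return wcslen(word);   /* counts wchar_t elements, not bytes */
}
```

A caller would write something like example_word_length_w(L"abc"), and
every string crossing the API would have to be a wchar_t array.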

The second approach, first adopted in Plan 9, is to use UTF-8.

I personally prefer the UTF-8 approach, because it fits naturally
with C's "char *" type and with Linux's system calls (which take char *,
not any sort of wide characters), but I'm not at all sure that this is
what users actually want. If it isn't, I wonder why not.
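
As a rough illustration of how UTF-8 rides on plain "char *", here is a
small sketch that counts the characters (code points) in a UTF-8 string.
The name example_utf8_length is hypothetical, and the code assumes its
input is already valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count the code points in a NUL-terminated UTF-8
 * string. Assumes valid UTF-8, in which every continuation byte has
 * the bit pattern 10xxxxxx, so only the other bytes start a character. */
size_t example_utf8_length(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;           /* a lead byte or an ASCII byte */
    }
    return n;
}
```

The string is still an ordinary NUL-terminated char array, so it can be
passed unchanged to functions such as strlen() or open(); the catch is
that strlen() then reports bytes rather than characters (each Hebrew
letter takes two bytes in UTF-8).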

Some background on this question: People have been complaining for years
that Hspell, and in particular the libhspell functions, use ISO-8859-8
instead of "Unicode". But if one wants to add Unicode support to
libhspell, what should it be? UTF-8? Wide chars (UTF-16 or UTF-32)?
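
For the UTF-8 option, the conversion from ISO-8859-8 is at least simple
to sketch, because the Hebrew letters alef..tav sit at 0xE0..0xFA in
ISO-8859-8 and at U+05D0..U+05EA in Unicode. The function below is a
hypothetical helper, not part of libhspell, and it simplifies by
replacing every other non-ASCII byte with '?'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of libhspell. Converts a NUL-terminated
 * ISO-8859-8 string to UTF-8. Returns the number of bytes written
 * (excluding the terminating NUL), or (size_t)-1 if out is too small. */
size_t example_iso8859_8_to_utf8(const char *in, char *out, size_t outsize)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;

        if (c >= 0xE0 && c <= 0xFA) {
            /* Hebrew letters: 0xE0..0xFA map to U+05D0..U+05EA, which
             * UTF-8 encodes as 0xD7 followed by 0x90..0xAA. */
            if (n + 2 >= outsize)
                return (size_t)-1;
            out[n++] = (char)0xD7;
            out[n++] = (char)(0x90 + (c - 0xE0));
        } else {
            /* ASCII passes through unchanged; any other high byte
             * becomes '?' (a simplification for this sketch). */
            if (n + 1 >= outsize)
                return (size_t)-1;
            out[n++] = (c < 0x80) ? (char)c : '?';
        }
    }
    if (n >= outsize)
        return (size_t)-1;
    out[n] = '\0';
    return n;
}
```

A real implementation would more likely call iconv(3) with the
"ISO-8859-8" and "UTF-8" encoding names than hand-roll the table; the
sketch is only meant to show how small the mapping is.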

Thanks,
Nadav.

-- 
Nadav Har'El                        |                    Monday, Mar 12 2012, 
nyh at math.technion.ac.il             |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |We could wipe out world hunger if we knew
http://nadav.harel.org.il           |how to make AOL's Free CD's edible!