Unicode in C
Nadav Har'El
nyh at math.technion.ac.il
Mon Mar 12 15:05:56 IST 2012
Hi, I have a question that I was sort of sad that I couldn't readily
find the answer to...
Let's say I want to create a C API (a C library), with functions which
take strings as arguments. What am I supposed to use if I want these strings
to be in any language? Obviously the answer is "Unicode", but that
doesn't really answer the question... How is Unicode used in C?
As far as I can see, there are two major approaches to this problem.
One approach, used in the Win32 C APIs on MS-Windows, and also in Java and
other languages, is to use "wide characters": characters that are 16 or
32 bits wide, with a string being an array of such characters.
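
For illustration only, here is a minimal sketch of what a wide-character
interface might look like in C. The function names word_units() and
wide_unit_bits() are made up for this example and are not part of any
existing library. Note that the width of wchar_t is platform-dependent:
it is 16 bits on MS-Windows and typically 32 bits on Linux.

```c
#include <limits.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical wide-character entry point: the word arrives as an array
 * of wchar_t terminated by L'\0'. Returns its length in wchar_t units
 * (not bytes, and not necessarily code points if wchar_t is 16 bits). */
size_t word_units(const wchar_t *word)
{
    return wcslen(word);
}

/* Width of one wchar_t in bits on the current platform. */
unsigned wide_unit_bits(void)
{
    return (unsigned)(CHAR_BIT * sizeof(wchar_t));
}
```

A caller would pass a wide string literal, e.g. word_units(L"abc"), and
could not pass such a string directly to system calls that expect char*.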
The second approach, proposed by Plan 9, is to use UTF-8.
I personally prefer the UTF-8 approach, because it fits naturally
with C's "char *" type and with Linux's system calls (which take char*,
not any sort of wide characters), but I'm not at all sure that this is
what users actually want. If it isn't, then I wonder, why not?
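
As a rough sketch of why UTF-8 fits "char *" so easily: an ordinary
NUL-terminated string can carry it unchanged, strlen() still returns the
number of bytes, and code points can be counted by skipping continuation
bytes (those of the form 10xxxxxx). The helper name utf8_codepoints() is
made up for this example, and it assumes the input is already valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Count Unicode code points in a NUL-terminated UTF-8 string.
 * Assumes valid UTF-8: every byte that is not a continuation byte
 * (10xxxxxx) starts a new code point. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For the four-letter Hebrew word "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D",
strlen() reports 8 bytes while utf8_codepoints() reports 4 code points.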
Some background on this question: People have been complaining for years
that Hspell, and in particular the libhspell functions, use ISO-8859-8
instead of "Unicode". But if one wants to add Unicode support to libhspell,
which encoding should it be? UTF-8? Wide chars (UTF-16 or UTF-32)?
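
If UTF-8 were chosen, converting existing ISO-8859-8 data would be fairly
mechanical. The sketch below is only an illustration under stated
assumptions: the function name iso8859_8_to_utf8() is hypothetical (it is
not a libhspell function), and it handles only ASCII and the Hebrew letters
0xE0-0xFA, which map to U+05D0-U+05EA. Any other byte is rejected. A general
converter such as iconv(3) would cover the rest of the character set.

```c
#include <stddef.h>

/* Convert a NUL-terminated ISO-8859-8 string to UTF-8.
 * Handles ASCII and the Hebrew letters alef..tav only.
 * Returns 0 on success, -1 on an unsupported byte or if the
 * output buffer (outsize bytes, including the NUL) is too small. */
int iso8859_8_to_utf8(const char *in, char *out, size_t outsize)
{
    size_t o = 0;

    for (; *in != '\0'; in++) {
        unsigned char b = (unsigned char)*in;

        if (b < 0x80) {
            /* ASCII is encoded as itself. */
            if (o + 1 >= outsize)
                return -1;
            out[o++] = (char)b;
        } else if (b >= 0xE0 && b <= 0xFA) {
            /* U+05D0..U+05EA encode as 0xD7 followed by 0x90..0xAA. */
            if (o + 2 >= outsize)
                return -1;
            out[o++] = (char)0xD7;
            out[o++] = (char)(0x90 + (b - 0xE0));
        } else {
            return -1;  /* byte not handled by this sketch */
        }
    }
    if (o >= outsize)
        return -1;      /* no room for the terminating NUL */
    out[o] = '\0';
    return 0;
}
```

Each Hebrew letter grows from one byte to two, so an output buffer of
2 * strlen(in) + 1 bytes is always large enough.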
Thanks,
Nadav.
--
Nadav Har'El                        |        Monday, Mar 12 2012,
nyh at math.technion.ac.il          |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |We could wipe out world hunger if we knew
http://nadav.harel.org.il           |how to make AOL's Free CD's edible!