<div dir="ltr"><font face="arial,helvetica,sans-serif">My suggestion is to take the glib/gtk approach: use UTF-8 everywhere and have the API accept char*, i.e. there is no typedef for Unicode character strings. If this is not acceptable because of speed (its only tradeoff), then use UCS-4 internally and provide two external interfaces, one for UCS-4 and one for UTF-8. For backwards compatibility you can provide your own ISO-8859-8 to UTF-8 conversion functions. I suggest that you don't add an iconv dependency but let the user take care of character set conversions, which you don't really care about.<br>
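<br>A minimal sketch of what such a home-grown ISO-8859-8 to UTF-8 conversion function could look like. The name iso8859_8_to_utf8 is hypothetical and not part of libhspell or glib. As a simplifying assumption it handles only ASCII and the Hebrew letters (0xE0-0xFA, i.e. U+05D0-U+05EA) and replaces every other byte with '?':<br>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an existing libhspell function).
 * Converts a NUL-terminated ISO-8859-8 string to UTF-8.
 * Assumption: only ASCII and the Hebrew letters 0xE0..0xFA are mapped;
 * any other byte becomes '?'.
 * Returns the number of bytes written, excluding the terminating NUL,
 * or (size_t)-1 if the output buffer is too small. */
size_t iso8859_8_to_utf8(const char *in, char *out, size_t out_size)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);

    for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
        if (*p >= 0xE0 && *p <= 0xFA) {
            /* U+05D0..U+05EA encode as the two bytes 0xD7 0x90..0xAA. */
            if (n + 2 >= out_size)
                return (size_t)-1;
            out[n++] = (char)0xD7;
            out[n++] = (char)(0x90 + (*p - 0xE0));
        } else {
            if (n + 1 >= out_size)
                return (size_t)-1;
            out[n++] = (*p < 0x80) ? (char)*p : '?';
        }
    }
    if (n >= out_size)
        return (size_t)-1;
    out[n] = '\0';
    return n;
}
```

A caller holding legacy ISO-8859-8 text would convert once at the API boundary and pass the resulting char* on; an output buffer of twice the input length plus one byte is always large enough.<br>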
</font><br>Regards,<br>Dov<br><br><div class="gmail_quote">2012/3/12 Elazar Leibovich <span dir="ltr"><<a href="mailto:elazarl@gmail.com">elazarl@gmail.com</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">The simplest option is to accept a StringPiece-like structure (pointer to buffer + size) together with an encoding, convert the data internally to your encoding (say, ISO-8859-8, replacing illegal characters with whitespace), and convert the output back to the caller's encoding.<div>
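A rough sketch of that idea, for illustration only. The names string_piece, text_encoding and piece_to_iso8859_8 are hypothetical, and the UTF-8 branch assumes that only ASCII and the Hebrew letters U+05D0-U+05EA are representable; anything else becomes a single space:<br>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types, not taken from any existing library. */
typedef enum { ENC_ISO_8859_8, ENC_UTF_8 } text_encoding;

typedef struct {
    const char   *data;  /* pointer to the buffer, need not be NUL-terminated */
    size_t        size;  /* length in bytes */
    text_encoding enc;   /* encoding of the bytes in data */
} string_piece;

/* Convert a piece to ISO-8859-8. The output is never longer than the
 * input and is not NUL-terminated. Returns the number of bytes written,
 * or (size_t)-1 if out is too small. */
size_t piece_to_iso8859_8(string_piece in, char *out, size_t out_size)
{
    const unsigned char *p = (const unsigned char *)in.data;
    size_t i = 0, n = 0;

    assert(out != NULL);
    assert(in.data != NULL || in.size == 0);

    while (i < in.size) {
        unsigned char c = p[i];
        char r;

        if (in.enc == ENC_ISO_8859_8 || c < 0x80) {
            /* Already in the internal encoding, or plain ASCII: copy as is. */
            r = (char)c;
            i++;
        } else if (c == 0xD7 && i + 1 < in.size &&
                   p[i + 1] >= 0x90 && p[i + 1] <= 0xAA) {
            /* Hebrew letter U+05D0..U+05EA maps to 0xE0..0xFA. */
            r = (char)(0xE0 + (p[i + 1] - 0x90));
            i += 2;
        } else {
            /* Unsupported sequence: emit one space and skip the
             * continuation bytes that belong to it. */
            r = ' ';
            i++;
            while (i < in.size && (p[i] & 0xC0) == 0x80)
                i++;
        }
        if (n >= out_size)
            return (size_t)-1;
        out[n++] = r;
    }
    return n;
}
```

Passing the size explicitly means the library never has to scan for a terminating NUL, and the encoding tag lets one entry point serve both old and new callers.<br>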
<br></div><div>Would you mind using an iconv-like library?<div><div class="h5"><br><br><div class="gmail_quote">On Mon, Mar 12, 2012 at 3:05 PM, Nadav Har'El <span dir="ltr"><<a href="mailto:nyh@math.technion.ac.il" target="_blank">nyh@math.technion.ac.il</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi, I have a question that I was sort of sad that I couldn't readily<br>
find the answer to...<br>
<br>
Let's say I want to create a C API (a C library), with functions which<br>
take strings as arguments. What am I supposed to use if I want these strings<br>
to be in any language? Obviously the answer is "Unicode", but that<br>
doesn't really answer the question... How is Unicode used in C?<br>
<br>
As far as I can see, there are two major approaches to this problem.<br>
<br>
One approach, used in the Win32 C APIs on MS-Windows, and also in Java and<br>
other languages, is to use "wide characters": characters 16 or 32 bits in<br>
size, with strings being arrays of such characters.<br>
<br>
The second approach, proposed by Plan 9, is to use UTF-8.<br>
<br>
I personally prefer the UTF-8 approach, because it naturally fits<br>
with C's "char *" type and with Linux's system calls (which take char*,<br>
not any sort of wide characters), but I'm completely unsure that this is<br>
what users actually want. If not, then I wonder, why?<br>
<br>
Some background on this question: People have been complaining for years<br>
that Hspell, and in particular the libhspell functions, use ISO-8859-8<br>
instead of "unicode". But if one wants to add unicode to libhspell, what<br>
should it be? UTF-8? Wide chars (UTF-16 or UTF-32)?<br>
<br>
Thanks,<br>
Nadav.<br>
<span><font color="#888888"><br>
--<br>
Nadav Har'El | Monday, Mar 12 2012,<br>
<a href="mailto:nyh@math.technion.ac.il" target="_blank">nyh@math.technion.ac.il</a> |-----------------------------------------<br>
Phone <a href="tel:%2B972-523-790466" value="+972523790466" target="_blank">+972-523-790466</a>, ICQ 13349191 |We could wipe out world hunger if we knew<br>
<a href="http://nadav.harel.org.il" target="_blank">http://nadav.harel.org.il</a> |how to make AOL's Free CD's edible!<br>
<br>
_______________________________________________<br>
Linux-il mailing list<br>
<a href="mailto:Linux-il@cs.huji.ac.il" target="_blank">Linux-il@cs.huji.ac.il</a><br>
<a href="http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il" target="_blank">http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il</a><br>
</font></span></blockquote></div><br></div></div></div></div>
<br></blockquote></div><br></div>