Unicode in C

Elazar Leibovich elazarl at gmail.com
Tue Mar 13 14:08:36 IST 2012


On Tue, Mar 13, 2012 at 1:19 PM, Meir Kriheli <mkriheli at gmail.com> wrote:

>
> Nitpick: It's actually ucs2/ucs4 (which preceded the above but are
> compatible).
>

Double nitpick: UTF-16 and UCS-2 have an identical representation for
every code point in the BMP (they differ only in that UTF-16 adds
surrogate pairs), and it's better to always use the name UTF-16, as the
FAQ says <http://www.unicode.org/faq/basic_q.html#14>:

> UCS-2 is obsolete terminology which refers to a Unicode implementation up
> to Unicode 1.1, before surrogate code points and UTF-16 were added to
> Version 2.0 of the standard. *This term should now be avoided.*


So I think it's perfectly reasonable to call the internal representation
UTF-16.
(And since Python offers some support for surrogate pairs, at least in
string literals, it might even make sense to call it UTF-16.)
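
To make the UCS-2 vs. UTF-16 distinction concrete, here is a minimal
sketch in C of how a single code point maps to UTF-16 code units. The
function name utf16_encode and its return convention are my own, for
illustration only, and are not taken from any library. For BMP code
points the output is exactly what UCS-2 would store; only code points
above U+FFFF need a surrogate pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-16.
 * Writes 1 or 2 code units to out and returns how many were written,
 * or 0 if cp is not encodable (a surrogate, or above U+10FFFF). */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);

    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                    /* surrogates are not characters */

    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;       /* BMP: same single unit as UCS-2 */
        return 1;
    }

    if (cp > 0x10FFFF)
        return 0;                    /* beyond the Unicode code space */

    cp -= 0x10000;                   /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

With this sketch, U+05D0 (Hebrew alef) comes out as the single unit
0x05D0, while U+1F600 comes out as the pair 0xD83D 0xDE00, which plain
UCS-2 has no way to represent.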

(Sorry, I couldn't help it ;-)