Unicode in C
Elazar Leibovich
elazarl at gmail.com
Tue Mar 13 13:05:02 IST 2012
2012/3/13 kobi zamir <kobi.zamir at gmail.com>
>
>
>> So I guess that you're also in the UTF-8 camp.
>>
>
> yes, but my opinion about utf-8 is just my opinion. i like python and
> python defaults to utf-8.
>
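Python's internal representation is not UTF-8, but UTF-16 or UTF-32,
depending on build parameters. Thus a narrow (UTF-16) build of python
doesn't really support code points above the BMP: they end up stored as
surrogate pairs, i.e. as two code units rather than one character.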
Of course, you cannot know the internal representation, since python
(cleverly) does not allow you to cast a unicode string to a sequence of
bytes without specifying the result encoding.
http://docs.python.org/c-api/unicode.html
(see also this very good presentation
<http://98.245.80.27/tcpc/OSCON2011/gbu.html> on internal
unicode representations in various languages).
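
To make the surrogate business concrete, here is a rough sketch. It assumes a
current Python 3 interpreter, and to_utf16_units is just a name I made up for
illustration, not anything from the standard library. It splits a code point
above the BMP into its UTF-16 surrogate pair by hand, and compares that with
what you get when you explicitly ask for the UTF-16 encoding:

```python
import struct


def to_utf16_units(code_point):
    """Return the UTF-16 code units for one code point (hypothetical helper).

    No validation is done: the input is assumed to be a valid scalar value.
    """
    if code_point < 0x10000:
        # Inside the BMP: a single 16-bit unit is enough.
        return [code_point]
    # Above the BMP: spread the remaining 20 bits over a surrogate pair.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return [high, low]


clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, above the BMP
units = to_utf16_units(ord(clef))

# Bytes only come out of a string through a named encoding.
encoded = clef.encode("utf-16-be")
by_hand = struct.pack(">%dH" % len(units), *units)
matches = encoded == by_hand

# The same two-character text has a different size in each encoding.
text = "\u05d0" + clef  # Hebrew alef (BMP) followed by the G clef
sizes = {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

print([hex(u) for u in units], matches, sizes)
```

The point is only that the byte count depends on the encoding you name, which
is why being forced to name one is a good thing.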