<div dir="ltr">Hi,<br><br><div class="gmail_quote">2012/3/13 Elazar Leibovich <span dir="ltr"><<a href="mailto:elazarl@gmail.com">elazarl@gmail.com</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr"><div class="gmail_quote"><div class="im">2012/3/13 kobi zamir <span dir="ltr"><<a href="mailto:kobi.zamir@gmail.com" target="_blank">kobi.zamir@gmail.com</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr"><br><div class="gmail_quote"><div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>
<br>
</div>So I guess that you're also in the UTF-8 camp. <br></blockquote></div><div><br>yes, but my opinion about utf-8 is just my opinion. i like python and python defaults to utf-8.<br></div></div></div></blockquote><div>
<br></div></div><div>Python's internal representation is not UTF-8 but UTF-16 or UTF-32, depending on build parameters. Thus Python doesn't really support code points above the BMP.</div><div>Of course, you cannot know the internal representation, since Python (cleverly) does not allow you to cast a unicode string to a sequence of bytes without specifying the result encoding.</div>
<div><br></div><div><a href="http://docs.python.org/c-api/unicode.html" target="_blank">http://docs.python.org/c-api/unicode.html</a>
</div><div><br></div><div>(see also this <a href="http://98.245.80.27/tcpc/OSCON2011/gbu.html" target="_blank">very good presentation</a> on internal unicode representations in various languages).</div></div></div><br clear="all">
</blockquote></div><br>Nitpick: it's actually UCS-2/UCS-4 (which preceded UTF-16/UTF-32 but are compatible with them).<br><br>Actually, one can determine the internal representation by checking sys.maxunicode [1]: it is 0xFFFF on a narrow (UCS-2) build and 0x10FFFF on a wide (UCS-4) build. I'm using it in python-bidi to manually handle surrogate pairs if needed [2].<br>
<br>[1] <a href="http://docs.python.org/dev/library/sys.html#sys.maxunicode">http://docs.python.org/dev/library/sys.html#sys.maxunicode</a><br>[2] <a href="https://github.com/MeirKriheli/python-bidi/blob/master/src/bidi/algorithm.py#L46">https://github.com/MeirKriheli/python-bidi/blob/master/src/bidi/algorithm.py#L46</a><br>
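<br>To make the idea concrete, here is a rough sketch of that kind of check. It is not the code from [2]; the names IS_NARROW_BUILD and code_points are made up for illustration, and it only assumes that a narrow build exposes astral characters as two UTF-16 code units:<br>

```python
import sys

# 0xFFFF means a narrow (UCS-2) build; 0x10FFFF means a wide (UCS-4) build.
# Python 3.3 and later always report 0x10FFFF.
IS_NARROW_BUILD = sys.maxunicode == 0xFFFF


def is_high_surrogate(unit):
    # High (lead) surrogates occupy U+D800..U+DBFF.
    return (ord(unit) & 0xFC00) == 0xD800


def is_low_surrogate(unit):
    # Low (trail) surrogates occupy U+DC00..U+DFFF.
    return (ord(unit) & 0xFC00) == 0xDC00


def code_points(text):
    """Return the code points of text as integers, joining surrogate pairs."""
    result = []
    i = 0
    while i != len(text):
        unit = text[i]
        following = text[i + 1:i + 2]  # empty string at the end of text
        if is_high_surrogate(unit) and following and is_low_surrogate(following):
            high = ord(unit) - 0xD800
            low = ord(following) - 0xDC00
            result.append(0x10000 + high * 0x400 + low)
            i += 2
        else:
            # BMP character, astral character on a wide build, or lone surrogate.
            result.append(ord(unit))
            i += 1
    return result
```

On a wide build the pairing branch is only hit for explicitly written surrogates, so the same function should behave the same way on both kinds of build.<br>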
<br>Cheers<br>-- <br><div dir="ltr">Meir<br>
</div></div>