How does Python 2 represent Unicode internally?


When I read Python 2's official page on Unicode, it says:

"Under the hood, Python represents Unicode strings as either 16- or 32-bit integers, depending on how the Python interpreter was compiled."

What does the above sentence mean? Does it mean Python 2 has its own special encodings of Unicode? If so, why not use UTF-8?

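This statement means that the underlying C code supports both of these representations, and each interpreter build uses one of them: a "narrow" build stores strings as 16-bit units (UCS-2/UTF-16), a "wide" build as 32-bit units (UCS-4/UTF-32). Which variant is chosen depends on circumstances, typically the choice of whoever builds the interpreter (for example via the `--enable-unicode=ucs4` configure option), the compiler, and the operating system.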
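To make the two layouts concrete, here is a small sketch. It runs under Python 3 and only emulates the storage by encoding to UTF-16 and UTF-32; it does not inspect the interpreter's real internal buffers, and the helper name `code_units` is made up for illustration. A code point outside the Basic Multilingual Plane needs two 16-bit units (a surrogate pair) but only one 32-bit unit:

```python
def code_units(text, encoding, unit_size):
    """Split the encoded text into fixed-size little-endian code units (hypothetical helper)."""
    data = text.encode(encoding)
    return [int.from_bytes(data[i:i + unit_size], "little")
            for i in range(0, len(data), unit_size)]


ch = "\U0001F600"  # U+1F600, outside the BMP
narrow = code_units(ch, "utf-16-le", 2)  # emulates a 16-bit (narrow) layout
wide = code_units(ch, "utf-32-le", 4)    # emulates a 32-bit (wide) layout
print([hex(u) for u in narrow])  # ['0xd83d', '0xde00']
print([hex(u) for u in wide])    # ['0x1f600']
```

On a real Python 2 narrow build the difference is visible from the language itself: `sys.maxunicode` is 65535 and `len(u'\U0001F600')` returns 2, whereas a wide build reports 1114111 and 1.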

Now, here is some possible rationale for that; there are several reasons not to use UTF-8:

  • First and foremost, indexing into a UTF-8 string is O(n) in complexity, while it is O(1) for UTF-32/UCS-4. This is irrelevant for streamed data, and UTF-8 can save space for transmission or storage, but in-memory handling is more convenient with one unit per Unicode code point.
  • Secondly, using one unit per code point maps directly onto the API Python provides in the language (indexing, slicing, `len()`), so it is the natural choice.
  • On MS Windows platforms, the native encoding for the UI and the filesystem is UTF-16, so using that encoding provides seamless integration with the platform.
  • On compilers where `wchar_t` is a 16-bit type, using a 32-bit type instead would mean reimplementing all kinds of functions for a self-invented character type. Dropping support for characters above the Unicode BMP, or leaking surrogate sequences into the Python API, is a reasonable compromise (but, unfortunately, one that sticks).
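The first point can be sketched as follows. Assuming valid UTF-8 input, finding the i-th code point means walking lead bytes from the start (O(n)), while a fixed 4-byte layout lets you jump straight to byte offset 4·i (O(1)). The functions `utf8_index` and `utf32_index` are hypothetical helpers written for this illustration, not part of Python's API:

```python
def utf8_index(data: bytes, i: int) -> str:
    """Return the i-th code point of valid UTF-8 bytes by scanning from the start: O(n)."""
    count = 0
    pos = 0
    while pos < len(data):
        lead = data[pos]
        # The lead byte encodes how many bytes this code point occupies.
        if lead < 0x80:
            width = 1
        elif lead >> 5 == 0b110:
            width = 2
        elif lead >> 4 == 0b1110:
            width = 3
        else:
            width = 4  # assumes valid input, so this is a 0b11110xxx lead byte
        if count == i:
            return data[pos:pos + width].decode("utf-8")
        count += 1
        pos += width
    raise IndexError("code point index out of range")


def utf32_index(data: bytes, i: int) -> str:
    """Return the i-th code point of UTF-32-LE bytes by direct offset: O(1)."""
    start = 4 * i
    if start < 0 or start + 4 > len(data):
        raise IndexError("code point index out of range")
    return data[start:start + 4].decode("utf-32-le")


text = "a\u00e9\u20ac\U0001F600z"  # 1-, 2-, 3- and 4-byte UTF-8 sequences
utf8 = text.encode("utf-8")
utf32 = text.encode("utf-32-le")
for i in range(len(text)):
    assert utf8_index(utf8, i) == utf32_index(utf32, i) == text[i]
```

The UTF-8 version has to touch every preceding byte to reach position i, which is exactly the cost a fixed-width in-memory representation avoids, at the price of using 4 bytes even for ASCII.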

Note that these are only possible reasons; I don't claim that they actually apply to Python's implementation.
