python - django_countries encoding is not giving the correct name
I am using the countries list from the django_countries module. The problem is that a couple of countries have special characters in their names, such as 'Åland Islands' and 'Saint Barthélemy'.
I am calling this method to get the country name:
country_label = fields.Country(form.cleaned_data.get('country')[0:2]).name
I know that country_label is a lazily translated proxy object from django.utils, but it does not give the right name; instead it gives 'Ã…land Islands'. Any suggestions, please?
Django stores text as unicode strings, which are sequences of code points, and marks each string as unicode for further processing. UTF-8 encodes each code point as one to four 8-bit bytes, so the unicode string used by Django has to be converted (encoded) from code point notation to UTF-8 byte notation at some point. In the case of Åland Islands, what seems to be happening is that something takes the UTF-8 bytes and interprets each byte as a code point when converting back to a string.
The string that django_countries returns is u'\xc5land Islands'. \xc5 is the Unicode code point notation of Å. In UTF-8 byte notation, \xc5 becomes \xc3\x85, where each of \xc3 and \x85 is one 8-bit byte. See: http://www.ltg.ed.ac.uk/~richard/utf-8.cgi?input=xc5&mode=hex
Or you can use country_label = fields.Country(form.cleaned_data.get('country')[0:2]).name.encode('utf-8') to go from u'\xc5land Islands' to '\xc3\x85land Islands'.
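To make the difference between the code point and its UTF-8 bytes concrete, here is a minimal sketch. It does not use Django at all; the literal below just stands in for the value that django_countries returns, and it is written so that it runs on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Stand-in for the unicode string returned by django_countries (assumption:
# the real value starts with the single code point U+00C5).
name = u"\xc5land Islands"

# One code point for the first character...
code_point = ord(name[0])

# ...but two bytes once the string is encoded as UTF-8.
utf8_bytes = name.encode("utf-8")

print(hex(code_point))
print(repr(utf8_bytes))
print(len(name), len(utf8_bytes))
```

The unicode string has 13 characters, while the UTF-8 byte string has 14 bytes, because Å takes two bytes.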
If you take each of those bytes and use it as a code point, you'll see that it gives these characters: Ã…
See: http://www.ltg.ed.ac.uk/~richard/utf-8.cgi?input=xc3&mode=hex and: http://www.ltg.ed.ac.uk/~richard/utf-8.cgi?input=x85&mode=hex
See the code snippet for the HTML notation of these characters.
<div id="test">Ã…Å</div>
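The same effect can be reproduced in plain Python. This is only a sketch of the suspected mechanism, not a trace of what your application actually does: it encodes Å as UTF-8 and then decodes the bytes with a single-byte encoding. Note that byte 0x85 is a control character in ISO-8859-1 and only shows up as the visible … under Windows-1252 (cp1252), which is how browsers usually treat it.

```python
# -*- coding: utf-8 -*-
# Encode the single character U+00C5 as UTF-8: two bytes, 0xC3 and 0x85.
utf8_bytes = u"\xc5".encode("utf-8")

# Misread those bytes as ISO-8859-1: each byte becomes the code point
# with the same number.
as_latin1 = utf8_bytes.decode("latin-1")

# Misread them as Windows-1252: 0x85 maps to U+2026, the visible ellipsis.
as_cp1252 = utf8_bytes.decode("cp1252")

print(repr(as_latin1))
print(repr(as_cp1252))
```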
So I'm guessing you have two different encodings in your application. One way to get from u'\xc5land Islands' to u'\xc3\x85land Islands' is, in a UTF-8 environment, to encode to UTF-8, which converts u'\xc5' to '\xc3\x85', and then to decode back to unicode using an ISO-8859 encoding, which gives u'\xc3\x85land Islands'. Since it's not in the code you're providing, I'm guessing it's happening somewhere between the moment you set country_label and the moment the output isn't displayed properly, either automatically because of encoding settings or through an explicit assignment somewhere.
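If that guess is right, the damage is reversible, which also gives you a way to test the guess. The helper below is hypothetical (undo_mojibake is not part of Django or django_countries): it re-encodes the text with the wrongly applied single-byte encoding and decodes the result as UTF-8, and it leaves the text alone if that round trip fails.

```python
# -*- coding: utf-8 -*-
def undo_mojibake(text, wrong_encoding="cp1252"):
    """Hypothetical helper: reverse a UTF-8 -> wrong_encoding mix-up.

    Returns the text unchanged if it cannot be the result of such a mix-up.
    """
    try:
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


broken = u"\xc3\u2026land Islands"   # what the page displays
healthy = u"\xc5land Islands"        # what django_countries should return

repaired = undo_mojibake(broken)
untouched = undo_mojibake(healthy)
print(repr(repaired))
print(repr(untouched))
```

If calling this on your displayed label gives back the correct name, then a UTF-8 to single-byte mix-up is happening somewhere in your pipeline. The real fix is still to find and remove that step rather than to repair the text afterwards.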
First edit:
To set the encoding for your app, add # -*- coding: utf-8 -*- at the top of your .py file and <meta charset="utf-8"> in the <head> of your template. And to get a unicode string from a django.utils.functional.__proxy__ object, you can call unicode() on it, like this:
country_label = unicode(fields.Country(form.cleaned_data.get('country')[0:2]).name)
Second edit:
One other way to figure out the problem is to use force_bytes (https://docs.djangoproject.com/en/1.8/ref/utils/#module-django.utils.encoding), like this:
from django.utils.encoding import force_bytes
country_label = fields.Country(form.cleaned_data.get('country')[0:2]).name
forced_country_label = force_bytes(country_label, encoding='utf-8', strings_only=False, errors='strict')
But since you have already tried many conversions without success, maybe the problem is more complex. Can you share your versions of django_countries, Python, and Django, and your app's language settings? You can also look directly in the django_countries package (it should be in your Python directory), find the file data.py, and open it to see what it looks like. Maybe the data is corrupted.
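When you look at the data, it helps to list the actual code points instead of trusting what the terminal or editor shows. Here is a small sketch for that. The describe_non_ascii helper and the sample_names dict are made up for illustration and do not reflect the real layout of data.py; you would feed it the names you actually find there.

```python
# -*- coding: utf-8 -*-
import unicodedata


def describe_non_ascii(text):
    """Return (hex code point, Unicode name) for every non-ASCII character."""
    return [
        (hex(ord(ch)), unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ord(ch) > 127
    ]


# Hypothetical sample: one healthy name and one corrupted name.
sample_names = {
    "healthy": u"\xc5land Islands",
    "corrupted": u"\xc3\u2026land Islands",
}

report = {key: describe_non_ascii(value) for key, value in sample_names.items()}
for key in sorted(report):
    print(key, report[key])
```

A healthy name should show a single LATIN CAPITAL LETTER A WITH RING ABOVE. If you see LATIN CAPITAL LETTER A WITH TILDE followed by another character instead, the stored data itself is already corrupted.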