python - django_countries encoding is not giving the correct name
I am using the countries list from the django_countries module. The problem is that a couple of countries have special characters, such as 'Åland Islands' and 'Saint Barthélemy'.
I am calling this to get the country name:
    country_label = fields.Country(form.cleaned_data.get('country')[0:2]).name
I know that country_label is a lazy translated proxy object from Django utils, but it does not give the right name; it gives 'Ã…land Islands' instead. Any suggestions, please?
Django stores a Unicode string as code points and marks the string as Unicode for further processing. UTF-8 uses between one and four 8-bit bytes per character, so the Unicode string that Django is using needs to be converted from code point notation to UTF-8 notation at some point. In the case of Åland Islands, what seems to be happening is that something takes the UTF-8 byte encoding and interprets each byte as a code point when converting the string.
The string that django_countries returns is u'\xc5land Islands', where \xc5 is the Unicode code point notation of Å. In UTF-8 byte notation \xc5 becomes \xc3\x85, where each of the numbers \xc3 and \x85 is an 8-bit byte. See: http://www.ltg.ed.ac.uk/~richard/utf-8.cgi?input=xc5&mode=hex
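To make the difference between a code point and its UTF-8 bytes concrete, here is a minimal sketch. It is written in Python 3 syntax (where str plays the role of the u'...' strings above), which is an assumption on my part since the question appears to use Python 2; the variable names are only for illustration.

```python
# Python 3: str holds code points, bytes holds encoded bytes.
name = "\xc5land Islands"  # same text as u'\xc5land Islands' in Python 2

# The first character, Å, is a single code point: U+00C5.
code_point = ord(name[0])

# Encoded as UTF-8, that one code point becomes two bytes: 0xC3 0x85.
utf8_bytes = name[0].encode("utf-8")

print(hex(code_point))
print([hex(b) for b in utf8_bytes])
```

One character in the string therefore corresponds to two bytes on the wire, and it is those two bytes that get misread later.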
Or you can use
    country_label = fields.Country(form.cleaned_data.get('country')[0:2]).name.encode('utf-8')
to go from u'\xc5land Islands' to '\xc3\x85land Islands'.
If you take each byte and use it as a code point, you'll see that it gives these characters: Ã… See: http://www.ltg.ed.ac.uk/~richard/utf-8.cgi?input=xc3&mode=hex and: http://www.ltg.ed.ac.uk/~richard/utf-8.cgi?input=x85&mode=hex
See this code snippet for the HTML notation of these characters.
<div id="test">Ã…Å</div>
So I'm guessing you have two different encodings in your application. One way to get from u'\xc5land Islands' to u'\xc3\x85land Islands' in a UTF-8 environment is to encode to UTF-8, which converts u'\xc5' to '\xc3\x85', and then decode back to Unicode as ISO-8859-1, which gives u'\xc3\x85land Islands'. Since that isn't in the code you're providing, I'm guessing it happens somewhere between the moment you set country_label and the moment the output is displayed incorrectly, either automatically because of encoding settings or through an explicit assignment somewhere.
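The encode-then-wrong-decode path described above can be reproduced in a few lines. This is only a sketch in Python 3 syntax (an assumption, the question looks like Python 2), and it does not claim to be what your application actually does. It also tries Windows-1252, because in that code page byte 0x85 is the "…" character, which matches the 'Ã…' you are seeing; in strict ISO-8859-1 the same byte is an invisible control character.

```python
original = "\xc5land Islands"  # the correct name, starting with Å (U+00C5)

# Step 1: encode to UTF-8. Å becomes the two bytes 0xC3 0x85.
encoded = original.encode("utf-8")

# Step 2: decode those bytes with the wrong codec, one byte per character.
as_latin1 = encoded.decode("iso-8859-1")  # U+00C3 followed by U+0085
as_cp1252 = encoded.decode("cp1252")      # U+00C3 followed by U+2026

print(repr(as_latin1))
print(as_cp1252)
```

If your output matches as_cp1252, then somewhere UTF-8 bytes are being read as Windows-1252 (or Latin-1) text.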
First edit:
To set the encoding for your app, add # -*- coding: utf-8 -*- at the top of your .py file and <meta charset="utf-8"> in the head of your template. To get a Unicode string from a django.utils.functional.proxy object, you can call unicode(), like this:
    country_label = unicode(fields.Country(form.cleaned_data.get('country')[0:2]).name)
Second edit:
One other way to figure out the problem is to use force_bytes (https://docs.djangoproject.com/en/1.8/ref/utils/#module-django.utils.encoding), like this:
    from django.utils.encoding import force_bytes
    country_label = fields.Country(form.cleaned_data.get('country')[0:2]).name
    forced_country_label = force_bytes(country_label, encoding='utf-8', strings_only=False, errors='strict')
But since you have tried many conversions without success, maybe the problem is more complex. Can you share your versions of django_countries and Python, and your Django app's language settings? You can also look directly in the django_countries package (it should be in your Python directory): find the file data.py and open it to see what it looks like. Maybe the data is corrupted.
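As a further diagnostic, you could check whether a label you get back is the result of UTF-8 bytes having been decoded as Windows-1252, by trying to reverse that step. The helper below, undo_mojibake, is a hypothetical name of my own and not part of Django or django_countries; it is a rough sketch in Python 3 syntax and assumes Windows-1252 was the wrong codec.

```python
def undo_mojibake(text):
    """Try to reverse a 'UTF-8 bytes read as Windows-1252' mistake.

    Returns the repaired string if the round trip works, otherwise
    returns the input unchanged.
    """
    try:
        # Turn the characters back into the bytes they were misread from,
        # then decode those bytes as the UTF-8 they originally were.
        return text.encode("cp1252").decode("utf-8")
    except UnicodeError:
        # Not valid cp1252, or the bytes are not valid UTF-8:
        # the text was probably not damaged in this particular way.
        return text


broken = "\u00c3\u2026land Islands"  # what the page shows: 'Ã…land Islands'
print(undo_mojibake(broken))
print(undo_mojibake("\xc5land Islands"))  # already correct, left as is
```

If the helper changes the string, the data reaching that point has already been decoded with the wrong codec, which narrows down where to look. It is a debugging aid rather than a fix; the real fix is to remove the wrong decode.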