Convert non-English String to normal String in Java
I am required to validate text against baselines.
For example:
String a = "la panthÃ¨re";
String b = "la panth&egrave;re";
I know String b contains HTML entities, so using Apache StringEscapeUtils gives me:
String b = "la panth&egrave;re";
b = StringEscapeUtils.unescapeHtml(b);
Output: la panthère
However, I do not know what's stored in String a. I read somewhere that it might contain accented characters, so I tried the code below:
a = Normalizer.normalize(a, Normalizer.Form.NFKD);
Note: I tried all forms of Normalizer, but nothing worked.
Can anyone please help me make String a the same as b?
As Jesper mentions, the Ã¨ pattern typically indicates mis-encoding.
At that point, you're out of luck.
Remedial actions such as replacing Ã¨ by hand are neither advisable nor safe.
Escaping or normalizing your String is out of scope: the problem is at the source, and has nothing to do with HTML conversion or accent normalization.
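To illustrate why Normalizer cannot help, here is a small sketch. It assumes the corruption is the common one (UTF-8 bytes read back as Windows-1252, which turns "è" into "Ã¨"); the class and method names are made up for the example. Normalization only composes or decomposes characters, so "Ã¨" (two unrelated characters) never becomes "è" under any form.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class NormalizeDoesNotRepair {
    // Hypothetical helper: builds the typical mojibake by reading UTF-8 bytes as windows-1252
    static String garble(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), Charset.forName("windows-1252"));
    }

    // True if normalizing both strings with the given form makes them equal
    static boolean equalAfter(Normalizer.Form form, String x, String y) {
        return Normalizer.normalize(x, form).equals(Normalizer.normalize(y, form));
    }

    public static void main(String[] args) {
        String expected = "la panth\u00e8re"; // "la panthère"
        String garbled = garble(expected);     // "la panthÃ¨re"
        for (Normalizer.Form form : Normalizer.Form.values()) {
            // Prints false for NFC, NFD, NFKC and NFKD alike
            System.out.println(form + ": " + equalAfter(form, expected, garbled));
        }
    }
}
```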
However, there are simple idioms to convert a String to a different encoding.
The example below:
- simulates a mis-encoded String: UTF-8 bytes decoded as Windows-1252 (in a UTF-8 environment),
- then prints it (corrupted, since it's a Windows-1252 String in a UTF-8 print stream),
- finally, prints it re-converted to UTF-8.
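// requires: import java.nio.charset.Charset;
// UTF-8 bytes of "la panthère", wrongly decoded as Windows-1252
String a = new String(
        "la panthère".getBytes(Charset.forName("UTF-8")),
        Charset.forName("cp1252"));
System.out.println(a); // corrupted
// Re-encode with the wrong charset to get the original bytes back, then decode them as UTF-8
System.out.println(
        new String(a.getBytes(Charset.forName("cp1252")), Charset.forName("UTF-8")));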
Output
la panthÃ¨re
la panthère
Notes
The conversion idiom described above implies that you know beforehand how the original String was encoded.
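Because the idiom depends on guessing both encodings correctly, a more cautious variant is to make the round trip fail loudly instead of silently producing '?' or U+FFFD. The sketch below is one way to do that with CharsetEncoder/CharsetDecoder set to CodingErrorAction.REPORT; the class and method names are hypothetical, and Windows-1252 to UTF-8 is only the assumed pair of encodings.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

public class MojibakeRepair {
    // Hypothetical helper: undo one wrong decode by re-encoding with the charset that was
    // wrongly used, then decoding with the charset the bytes were really written in.
    // Both steps REPORT errors instead of silently substituting replacement characters.
    static Optional<String> repair(String garbled, Charset wronglyUsed, Charset actual) {
        CharsetEncoder encoder = wronglyUsed.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        CharsetDecoder decoder = actual.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            ByteBuffer bytes = encoder.encode(CharBuffer.wrap(garbled));
            return Optional.of(decoder.decode(bytes).toString());
        } catch (CharacterCodingException e) {
            return Optional.empty(); // the assumed encodings do not fit this text
        }
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // "la panthÃ¨re": the round trip succeeds and yields "la panthère"
        System.out.println(repair("la panth\u00c3\u00a8re", cp1252, StandardCharsets.UTF_8).isPresent());
        // "la panthère" was never mis-decoded: 'è' becomes byte E8, which is not valid UTF-8 here
        System.out.println(repair("la panth\u00e8re", cp1252, StandardCharsets.UTF_8).isPresent());
    }
}
```

A present result only shows the guess is consistent with the bytes, not that it is the right guess; an empty result at least tells you the text was not corrupted in the assumed way.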
Typical encoding issues occur when one of the following encodings is used to interpret text that was encoded in another:
- ISO Latin 1
- Windows-1252
- UTF-8
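Printing the bytes of "è" in each of these encodings shows where the confusion comes from: ISO Latin 1 and Windows-1252 both use one byte (E8), while UTF-8 uses two (C3 A8), and reading those two bytes with a single-byte charset gives the two characters "Ã¨". The class name below is just for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingBytes {
    // Formats each byte as two uppercase hex digits, separated by spaces
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String e = "\u00e8"; // "è"
        Charset[] charsets = {
            StandardCharsets.ISO_8859_1,
            Charset.forName("windows-1252"),
            StandardCharsets.UTF_8
        };
        for (Charset cs : charsets) {
            System.out.println(cs.name() + ": " + hex(e.getBytes(cs)));
        }
        // The two UTF-8 bytes read back with a single-byte charset become two characters
        String misread = new String(e.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        System.out.println(misread.equals("\u00c3\u00a8")); // "Ã¨"
    }
}
```

This prints "ISO-8859-1: E8", "windows-1252: E8", "UTF-8: C3 A8" and then "true".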
Here's a list of Java-supported encodings along with their canonical names.
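The same information is also available at runtime. This sketch lists what the running JVM supports via Charset.availableCharsets(), and resolves an alias such as "cp1252" (used in the example above) to its canonical name; the class and helper names are hypothetical.

```java
import java.nio.charset.Charset;
import java.util.Map;

public class ListCharsets {
    // Resolves a name or alias (e.g. "cp1252") to the charset's canonical name
    static String canonicalName(String nameOrAlias) {
        return Charset.forName(nameOrAlias).name();
    }

    public static void main(String[] args) {
        // availableCharsets() is sorted and keyed by canonical name; each Charset lists its aliases
        for (Map.Entry<String, Charset> entry : Charset.availableCharsets().entrySet()) {
            System.out.println(entry.getKey() + " " + entry.getValue().aliases());
        }
        System.out.println(canonicalName("cp1252"));
    }
}
```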
In a web context, you'd typically invoke JavaScript's encodeURIComponent function to encode values in the front-end, before sending them to the back-end.
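On the Java side, encodeURIComponent produces percent-encoded UTF-8 bytes, so the matching decode is URLDecoder with UTF-8. This is a sketch assuming you receive the raw, still percent-encoded value (many servlet containers decode request parameters for you, in which case this step is unnecessary) and Java 10 or later for the Charset overload; the class name is hypothetical.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class DecodeComponent {
    // Decodes a value the front-end percent-encoded as UTF-8 (as encodeURIComponent does).
    // Note: URLDecoder also turns '+' into a space; encodeURIComponent sends '+' as %2B.
    static String decode(String percentEncoded) {
        return URLDecoder.decode(percentEncoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // encodeURIComponent("la panthère") in a browser yields "la%20panth%C3%A8re"
        String fromClient = "la%20panth%C3%A8re";
        System.out.println(decode(fromClient).equals("la panth\u00e8re")); // true
    }
}
```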