Convert non-English string to normal String in Java
I am required to validate some text against baselines.
For example:
String a = "la panthÃ¨re"; String b = "la panth&egrave;re";
I know that String b contains HTML literals, so using Apache StringEscapeUtils gives me:
String b = "la panth&egrave;re"; b = StringEscapeUtils.unescapeHtml(b);
Output: la panthère
However, I do not know what is stored in String a. Somewhere I got to know that it might be accent literals, hence I tried the code below:
a = Normalizer.normalize(a, Normalizer.Form.NFKD);
Note: I tried all forms of Normalizer, but nothing worked.
Can anyone please help me make String a look the same as b?
As Jesper mentions, the Ã¨ pattern typically indicates mis-encoding.
At this point, you're out of luck.
Remedial actions such as replacing Ã¨ are not advisable, nor safe.
Escaping or normalizing the String is out of scope: the problem is at the source, and has nothing to do with HTML conversion or accent normalization.
However, there are simple idioms to convert your String to a different encoding.
The example below:
- Simulates a windows-1252 String (in a UTF-8 environment).
- Then prints it (corrupted, since it's a windows-1252 String in a UTF-8 print stream).
- Finally, prints it re-converted to UTF-8.
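// requires: import java.nio.charset.Charset;
// UTF-8 bytes wrongly decoded as windows-1252
String a = new String(
    "la panthère".getBytes(Charset.forName("UTF-8")),
    Charset.forName("Cp1252")
);
System.out.println(a);
// back to the original bytes, then decoded as UTF-8
System.out.println(
    new String(
        a.getBytes(Charset.forName("Cp1252")),
        Charset.forName("UTF-8")
    )
);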
Output
la panthÃ¨re
la panthère
Notes
The conversion idiom described above implies that you know beforehand how your original String was encoded.
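When you only suspect which pair of encodings was involved, one way to test the guess is to run the same round trip with strict encoders and decoders, so that a wrong guess is reported instead of silently producing replacement characters. The following is a minimal sketch, not a definitive fix: the class StrictRedecode and the method tryRedecode are hypothetical names, and the choice of windows-1252 and UTF-8 in main is only an assumption matching the example above.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

final class StrictRedecode {

    // Hypothetical helper: turns the text back into bytes using the charset it was
    // wrongly decoded with, then decodes those bytes with the charset assumed to be
    // the real one. Returns empty if either step hits invalid data.
    static Optional<String> tryRedecode(String text, Charset wronglyAssumed, Charset actual) {
        try {
            ByteBuffer bytes = wronglyAssumed.newEncoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .encode(CharBuffer.wrap(text));
            String repaired = actual.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(bytes)
                    .toString();
            return Optional.of(repaired);
        } catch (CharacterCodingException e) {
            // The guessed pair of charsets does not fit this text.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Charset windows1252 = Charset.forName("windows-1252");
        // "\u00c3\u00a8" is the corrupted two-character form of "\u00e8" (e with grave accent).
        String corrupted = "la panth\u00c3\u00a8re";
        System.out.println(tryRedecode(corrupted, windows1252, StandardCharsets.UTF_8));
        // A string that was never corrupted fails the strict UTF-8 decoding step.
        System.out.println(tryRedecode("la panth\u00e8re", windows1252, StandardCharsets.UTF_8));
    }
}
```

A successful result only shows that the guess is consistent with the text, not that it is right, so fixing the encoding at the source remains the safer option.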
Typical encoding issues take place when the following encodings are used to interpret text encoded in one another:
- ISO Latin 1
- windows-1252
- UTF-8
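To see how these three encodings clash, the sketch below decodes the same bytes with each of them. The class DecodeComparison and its decode helper are illustrative names only, and the sample text is taken from the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DecodeComparison {

    // Decodes raw bytes with the given charset. new String(byte[], Charset) replaces
    // malformed or unmappable input instead of reporting it.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "\u00e8" is e with grave accent; the escape keeps this source file
        // independent of the encoding it is saved in.
        String original = "la panth\u00e8re";
        Charset windows1252 = Charset.forName("windows-1252");

        // UTF-8 bytes read with a single-byte charset: the two bytes 0xC3 0xA8
        // show up as two separate characters.
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(utf8Bytes, StandardCharsets.ISO_8859_1));
        System.out.println(decode(utf8Bytes, windows1252));

        // ISO Latin 1 bytes read as UTF-8: the lone 0xE8 byte is not valid UTF-8
        // and is replaced with U+FFFD.
        byte[] latin1Bytes = original.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(decode(latin1Bytes, StandardCharsets.UTF_8));
    }
}
```

The first two cases are reversible because no byte was lost, while the last one is not: once a byte has been replaced with U+FFFD, the original character cannot be recovered from the String.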
Here's a list of Java-supported encodings along with their canonical names.
In a web context, you'd typically invoke JavaScript's encodeURIComponent function to encode your values in the front-end, before sending them to the back-end.
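On the Java side, a value encoded that way could then be decoded with java.net.URLDecoder. This is only a sketch under assumptions: the class UriComponentDecoding and the method decodeComponent are made-up names, and it assumes the front-end produced UTF-8 percent-encoded text, which is what encodeURIComponent emits.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class UriComponentDecoding {

    // Hypothetical helper: decodes a percent-encoded value, assuming it was
    // produced as UTF-8 (for instance by encodeURIComponent in the front-end).
    static String decodeComponent(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // encodeURIComponent("la panth\u00e8re") yields this value in the browser.
        System.out.println(decodeComponent("la%20panth%C3%A8re"));
    }
}
```

URLDecoder also turns a literal + into a space, which is harmless here because encodeURIComponent escapes + as %2B.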