Convert non-English string to normal String in Java


I have a requirement to validate text against baselines.

For example:

    String a = "la panthÃ¨re";
    String b = "la panth&egrave;re";

I know String b contains HTML entities, and using Apache StringEscapeUtils gives me:

    String b = "la panth&egrave;re";
    b = StringEscapeUtils.unescapeHtml(b);

Output: la panthère

However, I do not know what is stored in String a. Somewhere I learned that these might be accent literals, hence I tried the code below:

    a = Normalizer.normalize(a, Normalizer.Form.NFKD);

Note: I tried all forms of Normalizer, but nothing worked.

Can someone please help me make String a look the same as b?

As Jesper mentions, the Ã¨ pattern typically indicates a mis-encoding.

At that point, you're out of luck.

Remedial actions such as replacing Ã¨ with è are not advisable, nor safe.

Escaping or normalizing the string is out of scope: the problem is at the source, and has nothing to do with HTML conversion or accent normalization.
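To illustrate why normalization is the wrong tool here, the following minimal sketch runs a mis-decoded string through every `java.text.Normalizer` form and checks whether any of them yields the intended text. The class and method names are made up for illustration; only `Normalizer` itself comes from the question.

```java
import java.text.Normalizer;

// Illustrative class name; not part of the original question or answer.
final class NormalizerCheck {

    // Returns true if any Unicode normalization form turns the input into the expected text.
    static boolean anyFormYields(String input, String expected) {
        for (Normalizer.Form form : Normalizer.Form.values()) {
            if (Normalizer.normalize(input, form).equals(expected)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the example independent of the source file's encoding:
        // U+00C3 U+00A8 is the mis-decoded pair, U+00E8 is the intended e with grave accent.
        String corrupted = "la panth\u00C3\u00A8re";
        String intended = "la panth\u00E8re";
        System.out.println(anyFormYields(corrupted, intended)); // prints "false"
    }
}
```

Normalization only composes or decomposes characters that are already in the string. It cannot know that two unrelated characters were once the two bytes of a single UTF-8 sequence.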

However, there are simple idioms to convert a String to a different encoding.

The example below:

  • simulates a Windows-1252 String (in a UTF-8 environment).
  • then prints it (it looks corrupted, since it's a Windows-1252 String in a UTF-8 print stream).
  • finally, prints it re-converted to UTF-8.

    // requires: import java.nio.charset.Charset;
    // Encode as UTF-8, then wrongly decode the bytes as Windows-1252.
    String a = new String(
        "la panthère".getBytes(Charset.forName("UTF-8")),
        Charset.forName("Cp1252")
    );
    System.out.println(a);
    // Reverse the mistake: back to the original bytes, then decode them as UTF-8.
    System.out.println(
        new String(
            a.getBytes(Charset.forName("Cp1252")),
            Charset.forName("UTF-8")
        )
    );

Output

    la panthÃ¨re
    la panthère

Notes

The conversion idiom described above implies that you know beforehand how the original String was encoded.
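Since the idiom depends on guessing the right pair of encodings, one possible safeguard is to make the re-decoding strict, so that a wrong guess is reported instead of silently producing more garbage. This is only a sketch: the class and method names are hypothetical, and it assumes the JRE provides the windows-1252 charset (it is not one of the charsets every Java platform is required to support).

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper; not part of the original answer.
final class StrictRedecode {

    // Reverses "bytes in 'actual' were decoded as 'wronglyAssumed'".
    // Returns empty if the text cannot be the result of that mistake.
    static Optional<String> tryRedecode(String text, Charset wronglyAssumed, Charset actual) {
        // If the text has characters the assumed charset cannot encode, the guess is wrong.
        if (!wronglyAssumed.newEncoder().canEncode(text)) {
            return Optional.empty();
        }
        // REPORT makes the decoder throw instead of inserting replacement characters.
        CharsetDecoder decoder = actual.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            byte[] originalBytes = text.getBytes(wronglyAssumed);
            return Optional.of(decoder.decode(ByteBuffer.wrap(originalBytes)).toString());
        } catch (CharacterCodingException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // U+00C3 U+00A8 is what the UTF-8 bytes of U+00E8 look like when read as Windows-1252.
        System.out.println(tryRedecode("la panth\u00C3\u00A8re", cp1252, StandardCharsets.UTF_8).isPresent());
        // A string that was never corrupted is not valid UTF-8 once re-encoded, so it is rejected.
        System.out.println(tryRedecode("la panth\u00E8re", cp1252, StandardCharsets.UTF_8).isPresent());
    }
}
```

A successful strict decode does not prove the guess was right, it only rules out guesses that are clearly wrong, so this is still no substitute for fixing the encoding at the source.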

Typical encoding issues take place when the following encodings are used to interpret text encoded in one another:

  • ISO Latin 1
  • Windows-1252
  • UTF-8
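As a rough illustration of how these three encodings disagree, the sketch below decodes the same bytes under each of them. The class and method names are invented for the example, and it again assumes the JRE ships the windows-1252 charset.

```java
import java.nio.charset.Charset;

// Illustrative class name; not part of the original answer.
final class DecodeComparison {

    // Decodes the given bytes using the named charset.
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // 0xC3 0xA8 is the UTF-8 encoding of U+00E8 (e with grave accent).
        byte[] eGrave = {(byte) 0xC3, (byte) 0xA8};
        System.out.println(decode(eGrave, "UTF-8").length());        // one character
        System.out.println(decode(eGrave, "ISO-8859-1").length());   // two characters: U+00C3 U+00A8
        System.out.println(decode(eGrave, "windows-1252").length()); // two characters: U+00C3 U+00A8

        // ISO Latin 1 and Windows-1252 differ in the 0x80-0x9F range:
        // 0x80 is the euro sign in Windows-1252 but a control character in ISO Latin 1.
        byte[] b80 = {(byte) 0x80};
        System.out.println(decode(b80, "windows-1252").equals("\u20AC"));
        System.out.println(decode(b80, "ISO-8859-1").equals("\u0080"));
    }
}
```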

Java documents a list of supported encodings along with their canonical names.

In a web context, you'd typically invoke JavaScript's encodeURIComponent function to encode values in the front-end, before sending them to the back-end.
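On the Java side, a value encoded that way could be decoded along these lines. This is a sketch under assumptions: the class and method names are hypothetical, the `Charset` overload of `URLDecoder.decode` requires Java 10 or later, and in a servlet container request parameters are normally decoded for you.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of the original answer.
final class BackEndDecoding {

    // Decodes a percent-encoded value, interpreting the escaped bytes as UTF-8.
    static String decodeComponent(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // encodeURIComponent("la panthère") produces "la%20panth%C3%A8re" in the browser.
        String decoded = decodeComponent("la%20panth%C3%A8re");
        System.out.println(decoded.equals("la panth\u00E8re")); // prints "true"
    }
}
```

`URLDecoder` also turns a raw `+` into a space. That is harmless here, because encodeURIComponent escapes a literal plus sign as `%2B`.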

