java - Apache POI Anomalous Whitespace (Resolved: \u00A0 non-breaking space)


Edit: resolved. The answer is the 00A0 non-breaking space (\u00A0 in Java), not a C0A0 non-breaking space.

After using Apache POI to convert a docx to plaintext, then reading the plaintext into Java and trying to parse it, I've run into the following problem.

Output:

        " "
        first characterequals space or tab 
        false
    [B@5e481248
    [B@66d3c617
    arraytostring space: [32]
    arraytostring ?????: [-62, -96]

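For the code:

    // Print the first character and whether it equals a plain space or a tab.
    System.out.println("\t\"" + line.substring(0,1) + "\"\n\tfirst characterequals space or tab \n\t"
            + (line.substring(0,1).equals(" ") || line.substring(0,1).equals("\t")));
    // Printing a byte[] directly shows only its type and hash, e.g. [B@5e481248.
    System.out.println(line.substring(0,1).getBytes());
    System.out.println(" ".getBytes());
    // Arrays.toString (java.util.Arrays) shows the actual byte values.
    System.out.println("arraytostring space: " + Arrays.toString(" ".getBytes()));
    System.out.println("arraytostring ?????: " + Arrays.toString(line.substring(0,1).getBytes()));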

String.trim() does not get rid of it.
String.replaceAll("\\s", "") does not get rid of it either.
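A minimal sketch for reproducing the symptom without a docx file, assuming the offending character is U+00A0 as the edit note at the top states. The class name `NbspRepro`, the helper `startsWithSpaceOrTab`, and the sample line are made up for illustration and are not part of the original code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not taken from the original post.
final class NbspRepro {
    // Assumed input: a line whose first character is U+00A0 (non-breaking space).
    static final String LINE = "\u00A0materials";

    // Same check as in the question: is the first character a plain space or a tab?
    static boolean startsWithSpaceOrTab(String s) {
        return s.startsWith(" ") || s.startsWith("\t");
    }

    public static void main(String[] args) {
        // U+00A0 is neither ' ' (U+0020) nor '\t', so this prints false.
        System.out.println(startsWithSpaceOrTab(LINE));
        // trim() only strips characters up to U+0020, so the length is unchanged: true.
        System.out.println(LINE.trim().length() == LINE.length());
        // The default \s class covers only ASCII whitespace, so nothing is replaced: true.
        System.out.println(LINE.replaceAll("\\s", "").equals(LINE));
        // UTF-8 bytes of the first character: [-62, -96]
        System.out.println(Arrays.toString(
                LINE.substring(0, 1).getBytes(StandardCharsets.UTF_8)));
    }
}
```

Passing an explicit charset to getBytes keeps the byte values independent of the platform default encoding, which the no-argument getBytes() in the question relies on.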

I'm trying to parse an enormous materials document, and this is turning into a major hurdle. I have no idea what's going on or how to interface with it. Can anyone shed light on what's going on here?

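The bytes [-62, -96] translate to the hex codes C2 A0 when read as unsigned values, which according to this answer is the UTF-8 encoding of a non-breaking space (U+00A0). Note that it is not an ordinary space, so \s does not match it.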
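One possible way to confirm the diagnosis and work around it is sketched here. It assumes the only unusual character is U+00A0; a real document may contain other Unicode space characters that would need the same treatment. The class `NbspCleaner`, its method names, and the sample line are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; names are illustrative only.
final class NbspCleaner {
    // Formats bytes as unsigned hex, e.g. {-62, -96} becomes "C2 A0".
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // b & 0xFF converts Java's signed byte to its unsigned value.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Replaces every U+00A0 with an ordinary space so trim() and \s work again.
    static String normalizeNbsp(String s) {
        return s.replace('\u00A0', ' ');
    }

    // Removes ASCII whitespace and U+00A0 in one pass.
    static String stripAllWhitespace(String s) {
        return s.replaceAll("[\\s\\u00A0]+", "");
    }

    public static void main(String[] args) {
        String line = "\u00A0steel plate"; // made-up sample line
        // Prints: C2 A0
        System.out.println(toHex(line.substring(0, 1).getBytes(StandardCharsets.UTF_8)));
        // Prints: [steel plate]
        System.out.println("[" + normalizeNbsp(line).trim() + "]");
        // Prints: steelplate
        System.out.println(stripAllWhitespace(line));
    }
}
```

Normalizing to a plain space first is the gentler option when the rest of the parser already splits on ordinary whitespace; the character-class version is closer to the replaceAll call in the question.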

