python - Remove byte order mark from objects in a list


I am using Python (3.4, on Windows 7) to download a set of text files, and when I read (and write, after modifications) these files, they appear to have a few byte order marks (BOM) among the values that are retained, primarily the UTF-8 BOM. I ultimately use each text file as a list (or a string), and I cannot seem to remove these BOMs. So I ask whether it is possible to remove the BOM?

For more context, the text files were downloaded from a public FTP source where users upload their own documents, so the original encoding is highly variable and unknown to me. To allow the download to run without error, I specified the encoding as UTF-8 (using latin-1 would give errors). So it's not a mystery to me that I have the BOM, and I don't think an up-front encoding/decoding solution is the answer for me (Convert UTF-8 with BOM to UTF-8 with no BOM in Python): it appears to make the frequency of other BOMs increase.

When I modify the files after download, I use the following syntax:

    with open(t, "w", encoding='utf-8') as outfile:
        with open(f, "r", encoding='utf-8') as infile:
            text = infile.read()
            # arguments to make modifications follow
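For reference, here is a minimal sketch of how the read side behaves with Python's `utf-8-sig` codec compared with plain `utf-8`. The temporary file and its contents are hypothetical, made up only for illustration. It assumes a file that starts with a UTF-8 BOM and also contains one in the middle; `utf-8-sig` drops only the leading one, so it may not be enough for files like those described above.

```python
import os
import tempfile

# Hypothetical sample file: a UTF-8 BOM (bytes EF BB BF) at the start
# and a second one embedded in the middle of the text.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(b"\xef\xbb\xbffirst line\nsecond \xef\xbb\xbfline\n")
    path = tmp.name

# Plain utf-8 keeps every BOM as the character U+FEFF.
with open(path, "r", encoding="utf-8") as infile:
    plain = infile.read()

# utf-8-sig strips a BOM only if it is the very first thing in the file.
with open(path, "r", encoding="utf-8-sig") as infile:
    stripped = infile.read()

os.remove(path)

print(plain.count("\ufeff"))          # 2
print(stripped.startswith("\ufeff"))  # False
print(stripped.count("\ufeff"))       # 1 (the embedded one is still there)
```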

Later on, after the "outfiles" are read in as a list, I see that some words have the UTF-8 BOM, \ufeff. I try to remove the BOM using the following list comprehension:

    g = list_outfile    # outfiles stored as a list
    g = [i.replace(r'\ufeff', '') for i in g]

While this argument will run, unfortunately the BOMs remain when, for example, I print the list (I believe I would have a similar issue if I tried to remove the BOM from strings and not lists: How to remove special character?). If I put a normal word (non-BOM) in the list comprehension, that word is replaced.
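One thing that may explain this behaviour: in a raw string, `r'\ufeff'` is the six literal characters backslash, u, f, e, f, f, whereas the non-raw `'\ufeff'` is the single character U+FEFF. The sketch below uses a made-up sample list (the name `g` mirrors the snippet above, the contents are hypothetical) to compare the two replacements.

```python
raw_pattern = r'\ufeff'  # raw string: six literal characters
bom = '\ufeff'           # escape sequence: the single character U+FEFF

# Hypothetical sample data standing in for the list read from the outfiles.
g = ['\ufeffword', 'plain', 'mid\ufeffdle']

# The raw pattern never occurs in the data, so nothing is replaced.
unchanged = [i.replace(raw_pattern, '') for i in g]

# The non-raw escape matches the actual BOM character.
cleaned = [i.replace(bom, '') for i in g]

print(len(raw_pattern), len(bom))  # 6 1
print(unchanged == g)              # True
print(cleaned)                     # ['word', 'plain', 'middle']
```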

I do understand that if I print the list object by object, the BOM will not appear (Special national characters won't .split() in Python). And the BOM is not in the raw text files. But I worry that the BOMs will remain when running later arguments for text analysis, so that any object that appears in the list as \ufeffword rather than word will be analyzed as \ufeffword.

Again, is it possible to remove the BOM after the fact?
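The difference between printing the whole list and printing a single item can be illustrated with a small sketch. Printing a list shows the `repr()` of each element, and `repr()` escapes non-printable characters such as U+FEFF, while printing the string itself sends the character through unescaped, where it is normally invisible. The sample value `word` is hypothetical.

```python
word = '\ufeffword'  # hypothetical list entry carrying a leading BOM

print(len(word))            # 5: the BOM counts as a real character
print(word.isprintable())   # False: U+FEFF is a non-printable format character
print(repr(word))           # '\ufeffword' (escaped, as seen inside a printed list)
print([word])               # ['\ufeffword']

# The character is part of the string, so comparisons are affected.
print(word == 'word')                   # False
print(word.lstrip('\ufeff') == 'word')  # True
```

So the concern about later analysis seems justified: as long as the character is in the string, `'\ufeffword'` and `'word'` compare as different values.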

