encoding - Converting to Base64 in JavaScript without the deprecated 'escape' call
My name is Festus.
I need to convert strings to and from Base64 in a browser via JavaScript. The topic is covered quite well on this site and on Mozilla, and the suggested solution seems to be along these lines:
    function toBase64(str) {
        return window.btoa(unescape(encodeURIComponent(str)));
    }

    function fromBase64(str) {
        return decodeURIComponent(escape(window.atob(str)));
    }

I did a bit more research and found out that escape() and unescape() are deprecated and should no longer be used. With that in mind, I tried removing the calls to the deprecated functions, which yields:

    function toBase64(str) {
        return window.btoa(encodeURIComponent(str));
    }

    function fromBase64(str) {
        return decodeURIComponent(window.atob(str));
    }

This seems to work, but it raises the following questions:
(1) Why did the originally proposed solution include calls to escape() and unescape()? The solution was proposed prior to the deprecation, but presumably these functions added some kind of value at the time.
(2) Are there edge cases where the removal of these deprecated calls will cause my wrapper functions to fail?
Note: there are other, far more verbose and complex solutions on Stack Overflow to the problem of string => Base64 conversion. I'm sure they work fine, but my question is related to this particular popular solution.
Thanks,
Festus
TL;DR: In principle escape()/unescape() are not necessary, and your second version without the deprecated functions is safe, yet it generates a longer Base64-encoded output:
    console.log(decodeURIComponent(atob(btoa(encodeURIComponent("€uro")))))
    console.log(decodeURIComponent(escape(atob(btoa(unescape(encodeURIComponent("€uro")))))))
Both create the output "€uro", yet the version without escape()/unescape() has a longer Base64 representation:
    btoa(encodeURIComponent("€uro")).length            // = 16
    btoa(unescape(encodeURIComponent("€uro"))).length  // = 8
The escape()/unescape() step can become necessary if the counterpart (e.g. a PHP script that cannot be modified) expects the Base64 to be produced in that specific way.
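To illustrate why the counterpart matters, here is a minimal sketch. It assumes the other side decodes Base64 into raw UTF-8 bytes, which is an assumption for illustration and not something the question states. The helper names are hypothetical.

```javascript
// Hypothetical helper names; btoa()/atob() are the browser globals
// (they are also available as globals in recent Node.js versions).
function toBase64WithUnescape(str) {
  // Base64 of the UTF-8 bytes of str
  return btoa(unescape(encodeURIComponent(str)));
}

function toBase64PercentOnly(str) {
  // Base64 of the percent-encoded ASCII text of str
  return btoa(encodeURIComponent(str));
}

const compact = toBase64WithUnescape("€uro");
const verbose = toBase64PercentOnly("€uro");

console.log(compact);       // "4oKsdXJv"
console.log(verbose.length); // 16

// A decoder that only undoes Base64 gets percent-encoded text back from
// the second variant, not the UTF-8 bytes it may be expecting.
console.log(atob(verbose)); // "%E2%82%ACuro"
```

Both variants round-trip correctly as long as the same pair of functions is used on both ends; they just are not interchangeable with each other.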
Long version:
First, to better understand the differences between the two versions of toBase64() and fromBase64() that you suggest above, let us have a look at btoa(), which is at the core of the issue. The documentation says the naming of btoa is a mnemonic:
"b" can be considered to stand for "binary", and "a" for "ASCII".
This is misleading, as the documentation hastens to add:
In practice, though, for historical reasons, both the input and output of these functions are Unicode strings.
Even less perfect, btoa() in fact only accepts
characters in the range U+0000 to U+00FF.
Plainly speaking, essentially only English alphanumeric text (more precisely, Latin-1 text) works with btoa().
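The range limit is easy to probe. The following sketch uses a hypothetical helper, btoaAccepts(), that simply reports whether btoa() throws for a given input:

```javascript
// Hypothetical helper: returns true if btoa() accepts the string,
// false if it throws (browsers raise an InvalidCharacterError).
function btoaAccepts(str) {
  try {
    btoa(str);
    return true;
  } catch (err) {
    return false;
  }
}

console.log(btoaAccepts("abc123")); // true:  plain ASCII
console.log(btoaAccepts("ä"));      // true:  U+00E4 is still within U+0000..U+00FF
console.log(btoaAccepts("€"));      // false: U+20AC is out of range
```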
The purpose of encodeURIComponent(), which you have in both of your versions, is to help out with strings that contain characters outside the range U+0000 to U+00FF. Take for example the string "aä€", which has three characters:
    a (U+0061)
    ä (U+00E4)
    € (U+20AC)
Here only the first two characters are in range. The third character, the euro sign, is outside, and window.btoa("€") raises an out-of-range error. To avoid such an error, a solution is needed that represents "€" within the set U+0000 to U+00FF. This is what window.encodeURIComponent does:
    window.encodeURIComponent("aä€")
creates the following string:
    "a%C3%A4%E2%82%AC"
in which the characters have been encoded as follows:
    a = a (stayed the same)
    ä = %C3%A4 (changed to its UTF-8 representation)
    € = %E2%82%AC (changed to its UTF-8 representation)
The "changed to its UTF-8 representation" step works by using the character "%" and a two-digit hexadecimal number for each byte of the character's UTF-8 representation. "%" is U+0025 and hence allowed inside the btoa() range. The result of window.encodeURIComponent("aä€") can then be fed to btoa(), as it no longer contains any out-of-range characters:
    btoa("a%C3%A4%E2%82%AC") // = "YSVDMyVBNCVFMiU4MiVBQw=="
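That the %XX groups really are the UTF-8 bytes can be checked with TextEncoder. The helper below, percentEncodeBytes(), is a hypothetical name used only for this sketch; unlike encodeURIComponent() it escapes every byte, including plain ASCII letters, so the comparison is only made on non-ASCII characters.

```javascript
// Hypothetical helper: percent-encode every UTF-8 byte of a string.
function percentEncodeBytes(str) {
  const bytes = new TextEncoder().encode(str); // UTF-8 bytes as a Uint8Array
  return Array.from(
    bytes,
    (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

console.log(percentEncodeBytes("ä")); // "%C3%A4"
console.log(percentEncodeBytes("€")); // "%E2%82%AC"
console.log(percentEncodeBytes("€") === encodeURIComponent("€")); // true
```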
The crux of using unescape() between btoa() and encodeURIComponent() is that each byte of the UTF-8 representation uses three characters (%XX) to store the potential byte values 0x00 to 0xFF. Here unescape() can play an optional role, because unescape() takes all such %XX bytes and creates in their place a single Unicode character in the allowed range U+0000 to U+00FF.
To check:
    btoa(encodeURIComponent("aä€")).length            // = 24
    btoa(unescape(encodeURIComponent("aä€"))).length  // = 8
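The collapse that unescape() performs can also be reproduced without the deprecated function. The sketch below compares both routes; collapsePercentBytes() is a hypothetical helper, offered as one possible replacement rather than as the canonical one.

```javascript
const percentEncoded = encodeURIComponent("aä€"); // "a%C3%A4%E2%82%AC"

// Deprecated route: each %XX becomes one character in U+0000..U+00FF.
const viaUnescape = unescape(percentEncoded);

// Hypothetical helper doing the same collapse without unescape().
function collapsePercentBytes(s) {
  return s.replace(/%([0-9A-F]{2})/g, (match, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}
const viaReplace = collapsePercentBytes(percentEncoded);

console.log(percentEncoded.length);      // 16 characters before the collapse
console.log(viaUnescape.length);         // 6 characters, one per UTF-8 byte
console.log(viaUnescape === viaReplace); // true
console.log(btoa(viaReplace));           // "YcOk4oKs"
```

The regular expression only matches uppercase hex digits, which is sufficient here because encodeURIComponent() always emits uppercase escapes.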
The main difference is the shorter Base64 representation of the text, at the cost of additional processing by the optional escape()/unescape() calls; for text in the ASCII character set the difference is minimal anyway.
The main lesson is to understand that btoa() is misleadingly named and requires characters in the range U+0000 to U+00FF, which is what encodeURIComponent() generates. The deprecated escape()/unescape() only adds a space-saving feature, which may be desirable but is not necessary. The problem of Unicode symbols > U+00FF is addressed here: btoa/atob Unicode problem, which mentions ways to improve the "all UTF-8 Unicode" Base64 encoding that are possible in modern browsers.
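As one example of such a modern approach, here is a minimal sketch based on TextEncoder/TextDecoder. It is not necessarily the exact code from the linked page, and the function names utf8ToBase64() and base64ToUtf8() are hypothetical. It produces the same compact output as the unescape() variant without calling any deprecated function.

```javascript
// Hypothetical names; a sketch, not a drop-in library function.
function utf8ToBase64(str) {
  const bytes = new TextEncoder().encode(str); // UTF-8 bytes
  let binary = "";
  for (const b of bytes) {
    binary += String.fromCharCode(b); // one character per byte, all <= U+00FF
  }
  return btoa(binary);
}

function base64ToUtf8(b64) {
  const binary = atob(b64);
  // Turn each character back into its byte value.
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

console.log(utf8ToBase64("€uro"));               // "4oKsdXJv"
console.log(base64ToUtf8(utf8ToBase64("aä€")));  // "aä€"
```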