encoding - Converting to Base64 in JavaScript without Deprecated 'Escape' call -


My name is Festus.

I need to convert strings to and from Base64 in a browser via JavaScript. The topic is covered quite well on this site and on Mozilla, and the suggested solution seems to be along these lines:

function toBase64(str) {
    return window.btoa(unescape(encodeURIComponent(str)));
}

function fromBase64(str) {
    return decodeURIComponent(escape(window.atob(str)));
}

I did a bit more research and found out that escape() and unescape() are deprecated and should no longer be used. With that in mind, I tried removing the calls to the deprecated functions, which yields:

function toBase64(str) {
    return window.btoa(encodeURIComponent(str));
}

function fromBase64(str) {
    return decodeURIComponent(window.atob(str));
}

This seems to work, but it begs the following questions:

(1) Why did the originally proposed solution include calls to escape() and unescape()? The solution was proposed prior to the deprecation, so presumably these functions added some kind of value at the time.

(2) Are there edge cases where the removal of these deprecated calls will cause my wrapper functions to fail?

Note: There are other, far more verbose and complex solutions on StackOverflow to the problem of string=>Base64 conversion. I'm sure they work fine, but my question is related specifically to this particular popular solution.

Thanks,

Festus

TL;DR: In principle escape()/unescape() are not necessary, and your second version without the deprecated functions is safe, yet it generates a longer Base64-encoded output:

  • console.log(decodeURIComponent(atob(btoa(encodeURIComponent("€uro")))))
  • console.log(decodeURIComponent(escape(atob(btoa(unescape(encodeURIComponent("€uro")))))))

Both create the output "€uro", yet the version without escape()/unescape() has a longer Base64 representation:

  • btoa(encodeURIComponent("€uro")).length // = 16
  • btoa(unescape(encodeURIComponent("€uro"))).length // = 8

The escape()/unescape() step only becomes necessary if the counterpart (e.g. a PHP script that you cannot change) expects Base64 produced in one specific way.
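To illustrate the point about the counterpart, here is a minimal sketch that mixes the two variants. The names toBase64Old, fromBase64Old, toBase64New and fromBase64New are made up for this example and simply wrap the two versions from the question. Each pair round-trips on its own, but an encoder from one pair combined with the decoder from the other does not give back the original text:

```javascript
// Hypothetical names for the two variants from the question.
const toBase64Old = (s) => btoa(unescape(encodeURIComponent(s)));
const fromBase64Old = (s) => decodeURIComponent(escape(atob(s)));
const toBase64New = (s) => btoa(encodeURIComponent(s));
const fromBase64New = (s) => decodeURIComponent(atob(s));

// Each pair round-trips correctly on its own.
console.log(fromBase64Old(toBase64Old("€uro")) === "€uro"); // true
console.log(fromBase64New(toBase64New("€uro")) === "€uro"); // true

// Mixing the pairs does not restore the original string.
console.log(fromBase64Old(toBase64New("€uro")));            // "%E2%82%ACuro"
console.log(fromBase64New(toBase64Old("€uro")) === "€uro"); // false
```

So the choice mostly matters when the other side of the exchange is not under your control.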

Long version:

First, to better understand the differences between the two versions of toBase64() and fromBase64() you suggest above, let us have a look at btoa(), which is at the core of the issue. The documentation says the naming of btoa is a mnemonic:

"b" can be considered to stand for "binary", and "a" for "ASCII".

which is misleading, as the documentation hastens to add,

In practice, though, for historical reasons, both the input and output of these functions are Unicode strings.

Even less perfect, btoa() in fact only accepts

characters in the range U+0000 to U+00FF

Plainly spoken, only text made of characters in that range (essentially English alphanumeric text plus the Latin-1 characters) works with btoa().
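The range restriction is easy to probe. The following is a small sketch; tryBtoa is a helper name invented for this example, and it reports the name of whatever error btoa() throws instead of letting it propagate:

```javascript
// Hypothetical helper: returns the Base64 string, or the error name on failure.
function tryBtoa(str) {
  try {
    return btoa(str);
  } catch (err) {
    return "error: " + err.name;
  }
}

console.log(tryBtoa("a")); // "YQ=="  (U+0061, inside the range)
console.log(tryBtoa("ä")); // "5A=="  (U+00E4, still inside the range)
console.log(tryBtoa("€")); // an error string (U+20AC is outside the range)
```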

The purpose of encodeURIComponent(), which you have in both of your versions, is to help out with strings that contain characters outside the range U+0000 to U+00FF. An example is the string "aä€", which has 3 characters:

  • a (U+0061)
  • ä (U+00E4)
  • € (U+20AC)

Here only the first two characters are in the range. The third character, the euro sign, is outside, and window.btoa("€") raises an out-of-range error. To avoid such an error, a solution is needed to represent "€" within the set U+0000 to U+00FF. This is what window.encodeURIComponent does:

window.encodeURIComponent("aä€")
creates the following string:
"a%C3%A4%E2%82%AC", in which the characters have been encoded as follows:

  • a = a (stayed the same)
  • ä = %C3%A4 (changed to its UTF-8 representation)
  • € = %E2%82%AC (changed to its UTF-8 representation)

The "changed to its UTF-8 representation" part works by using the character "%" followed by a 2-digit hexadecimal number for each byte of the character's UTF-8 representation. The "%" is U+0025 and hence allowed inside the btoa() range. The result of window.encodeURIComponent("aä€") can then be fed to btoa(), as it no longer has any out-of-range characters:

btoa("a%C3%A4%E2%82%AC") // = "YSVDMyVBNCVFMiU4MiVBQw=="

The crux of using unescape() in between btoa() and encodeURIComponent() is that the bytes of the UTF-8 representation each use 3 characters (%XX) to store the potential values of a byte, 0x00 to 0xFF. Here unescape() can play an optional role, because unescape() takes each such %XX byte and creates in its place a single Unicode character in the allowed range U+0000 to U+00FF.

To check this:

  • btoa(encodeURIComponent("aä€")).length // = 24
  • btoa(unescape(encodeURIComponent("aä€"))).length // = 8
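The role of unescape() described above can be made visible by looking at the intermediate strings. This is only an illustrative sketch (the variable names are made up here): it shows that the percent-encoded string spends three characters per non-ASCII UTF-8 byte, while the unescaped string holds exactly one character per byte.

```javascript
// Percent-encoded form: every non-ASCII UTF-8 byte becomes "%XX".
const percentEncoded = encodeURIComponent("aä€");
console.log(percentEncoded);        // "a%C3%A4%E2%82%AC"
console.log(percentEncoded.length); // 16

// unescape() collapses each "%XX" into one character in U+0000..U+00FF.
const byteString = unescape(percentEncoded);
console.log(byteString.length);     // 6

// Each character's code is one byte of the UTF-8 encoding.
const codes = Array.from(byteString, (ch) => ch.charCodeAt(0).toString(16));
console.log(codes.join(" "));       // "61 c3 a4 e2 82 ac"
```

Since Base64 turns every 3 input characters into 4 output characters, 16 input characters give 24 and 6 input characters give 8, which matches the lengths above.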

The main difference is the length reduction of the Base64 representation of the text, at the cost of additional parsing via the optional escape()/unescape(), which in the case of ASCII character set text is minimal anyway.

The main lesson is to understand that btoa() is misleadingly named and requires Unicode characters in the range U+0000 to U+00FF, which is what encodeURIComponent() generates. The deprecated escape()/unescape() only adds a space-saving feature, which is maybe desirable but not necessary. The general problem of Unicode symbols > U+00FF is addressed in "btoa/atob Unicode problem", which also mentions ways to improve "all UTF-8 Unicode" Base64 encoding that are possible in modern browsers.
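As one possible example of such a modern approach (a sketch that is not taken from the linked discussion, and assumes an environment providing TextEncoder and TextDecoder), the UTF-8 bytes can be produced directly instead of going through encodeURIComponent() and unescape(). The function names toBase64Utf8 and fromBase64Utf8 are invented for this illustration:

```javascript
// Hypothetical helper: string -> UTF-8 bytes -> one character per byte -> Base64.
function toBase64Utf8(str) {
  const bytes = new TextEncoder().encode(str);
  let binary = "";
  for (const b of bytes) {
    binary += String.fromCharCode(b);
  }
  return btoa(binary);
}

// Hypothetical helper: Base64 -> one character per byte -> UTF-8 bytes -> string.
function fromBase64Utf8(b64) {
  const binary = atob(b64);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

console.log(toBase64Utf8("aä€"));                 // "YcOk4oKs"
console.log(fromBase64Utf8(toBase64Utf8("aä€"))); // "aä€"
```

For this input the output is the same short form that btoa(unescape(encodeURIComponent(str))) produces, without calling the deprecated functions.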

