Regex: strip all HTML except background style URLs


I have the following regex to find background style URLs in HTML. I'm trying to strip all the HTML except the background image URLs. The goal is to extract a list of background image URLs from an HTML page.

Expression: url\(\s*(['"]?)(.*?)\1\s*\)

Example HTML:

<a href="#"><img style="background-image: url(http://domain.com/2003-th.jpg)"></a> 

I'd like the NOT of the expression, i.e. to match everything except the background URLs.

I don't know the NetBeans IDE, so this is only a guess.

But beware: this searches for url(...) everywhere. It does not matter where the text occurs: in a CSS block, in HTML style attributes, in JavaScript, in plain text, or even in comments!
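To illustrate the warning, here is a small sketch in Python (an assumption; NetBeans uses its own regex engine, so details may differ) that runs the question's expression over a made-up HTML snippet. It picks up url(...) from a CSS comment and from plain text just as readily as from the style attribute.

```python
import re

# The expression from the question: url(...) with an optional, matching quote.
URL_ANY = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""")

# Hypothetical sample input: url(...) appears in a CSS comment, a CSS rule,
# a style attribute and in plain text.
html = (
    "<style>/* old: url(old.png) */ .x { background: url('x.png'); }</style>\n"
    '<a href="#"><img style="background-image: url(http://domain.com/2003-th.jpg)"></a>\n'
    "<p>Plain text mentioning url(not-an-image).</p>"
)

# Group 2 holds the URL; group 1 is only the (possibly empty) quote character.
urls = [m.group(2) for m in URL_ANY.finditer(html)]
print(urls)  # ['old.png', 'x.png', 'http://domain.com/2003-th.jpg', 'not-an-image']
```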

general modifications

If you want to include background images only, you should state that in the regex, too. It becomes

\bbackground-image\s*:\s*url\(\s*(['"]?)(.*?)\1\s*\) 
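A quick Python check (only a sketch; the sample markup is made up) shows the effect: the restricted pattern ignores url(...) outside background-image declarations. Note that it also skips the shorthand background: url(...) property, which may or may not be what you want.

```python
import re

# Only url(...) values that directly follow a background-image declaration.
BG_ONLY = re.compile(r"""\bbackground-image\s*:\s*url\(\s*(['"]?)(.*?)\1\s*\)""")

html = (
    '<div style="background: url(a.png)"></div>\n'
    '<img style="background-image: url(http://domain.com/2003-th.jpg)">\n'
    "<div style='background-image:url(\"b.jpg\")'></div>\n"
    "<p>url(not-an-image)</p>"
)

# Group 2 is the URL itself; the shorthand and plain-text url(...) are not matched.
urls = [m.group(2) for m in BG_ONLY.finditer(html)]
print(urls)  # ['http://domain.com/2003-th.jpg', 'b.jpg']
```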

To speed things up (at least in some implementations), try to avoid backreferences. In this case:

\bbackground-image\s*:\s*url\(\s*(?:'([^']+)'|"([^"]+)"|([^)]+))\s*\) 

It's a bit longer, but at least in Sublime Text it's worth it.
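In Python (assumed here for illustration) the backreference-free version puts each quoting style into its own group, so exactly one of groups 1 to 3 is set per match. The helper name bg_urls is hypothetical.

```python
import re

# Backreference-free variant: '...' -> group 1, "..." -> group 2, unquoted -> group 3.
BG_NOBACKREF = re.compile(
    r"""\bbackground-image\s*:\s*url\(\s*(?:'([^']+)'|"([^"]+)"|([^)]+))\s*\)"""
)


def bg_urls(html):
    """Return the background-image URLs found in html, in document order."""
    # Only one alternative participates in each match; the other groups are None.
    return [next(g for g in m.groups() if g is not None)
            for m in BG_NOBACKREF.finditer(html)]


html = (
    '<img style="background-image: url(http://domain.com/2003-th.jpg)">\n'
    "<div style=\"background-image: url('a.png')\"></div>\n"
    "<div style='background-image: url(\"b.png\")'></div>"
)
print(bg_urls(html))  # ['http://domain.com/2003-th.jpg', 'a.png', 'b.png']
```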

Use

To replace everything except the URLs of background images, use a single regex:

[\s\S]*?\bbackground-image\s*:\s*url\(\s*(?:'([^']+)'|"([^"]+)"|([^)]+))\s*\)|[\s\S]+ 

and replace with $1$2$3\n. There will (almost always) be two \n at the end, but I think that should be no problem.
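Here is the same single-regex replacement as a Python sketch. Python's re is a backtracking engine, so the first alternative wins whenever it can match, which is what this trick relies on. Python writes the replacement as \1\2\3\n instead of $1$2$3\n; since Python 3.5, groups that did not participate are replaced with an empty string. The sample markup is made up.

```python
import re

STRIP_ALL_BUT_BG = re.compile(
    r"""[\s\S]*?\bbackground-image\s*:\s*url\(\s*(?:'([^']+)'|"([^"]+)"|([^)]+))\s*\)"""
    r"""|[\s\S]+"""
)

html = (
    '<a href="#"><img style="background-image: url(http://domain.com/2003-th.jpg)"></a>\n'
    "<div style=\"background-image: url('b.png')\"></div>"
)

# Each URL match becomes "URL\n"; the leftover tail matches [\s\S]+ and becomes "\n".
result = STRIP_ALL_BUT_BG.sub(r"\1\2\3\n", html)
print(repr(result))  # 'http://domain.com/2003-th.jpg\nb.png\n\n'
```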

This won't work in regex engines where the length of the match, rather than the order of the alternatives, is decisive (e.g. POSIX leftmost-longest engines).

However, if that's a problem, you can try to use

[\s\S]*?\bbackground-image\s*:\s*url\(\s*(?:'([^']+)'|"([^"]+)"|([^)]+))\s*\)[\s\S]*?(?=\z|\bbackground-image\s*:\s*url\(\s*(?:'[^']+'|"[^"]+"|[^)]+)\s*\)) 

and replace with $1$2$3\n.
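A Python sketch of the lookahead version. Python's re spells the end-of-text anchor \Z rather than \z, so that is the one change to the pattern; it is assembled from two string pieces only for readability. With this variant the result ends in a single \n.

```python
import re

BG_URL = r"""\bbackground-image\s*:\s*url\(\s*(?:'([^']+)'|"([^"]+)"|([^)]+))\s*\)"""
# Same URL expression for the lookahead, but with non-capturing groups.
BG_URL_NOCAP = r"""\bbackground-image\s*:\s*url\(\s*(?:'[^']+'|"[^"]+"|[^)]+)\s*\)"""

# Consume everything up to the next background-image URL (or the end of the text).
STRIP_WITH_LOOKAHEAD = re.compile(
    r"[\s\S]*?" + BG_URL + r"[\s\S]*?(?=\Z|" + BG_URL_NOCAP + r")"
)

html = (
    '<a href="#"><img style="background-image: url(http://domain.com/2003-th.jpg)"></a>\n'
    "<div style=\"background-image: url('b.png')\"></div>"
)

result = STRIP_WITH_LOOKAHEAD.sub(r"\1\2\3\n", html)
print(repr(result))  # 'http://domain.com/2003-th.jpg\nb.png\n'
```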

  • [\s\S] means any character (including \n)
  • \b is a word boundary
  • (?= ... ) is a positive lookahead. It has to match, but is not part of the result
  • \z is the end of the text

(Maybe you have to tweak the regex a bit to fit NetBeans.)

Anyway, not every regex implementation supports lookaheads. If they are not supported in NetBeans, you have to use a multi-step approach:

First step

Replace

[\s\S]*?\bbackground-image\s*:\s*url\(\s*(?:'([^']+)'|"([^"]+)"|([^)]+))\s*\) 

with >-bg-url:$1$2$3\n.

>-bg-url: marks the values, to distinguish them from the rest.
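The first step as a Python sketch (Python and the sample markup are assumptions for illustration). Every match becomes a marker line; the text after the last URL is not matched and stays as it is, which is what the second step cleans up.

```python
import re

FIRST_STEP = re.compile(
    r"""[\s\S]*?\bbackground-image\s*:\s*url\(\s*(?:'([^']+)'|"([^"]+)"|([^)]+))\s*\)"""
)

html = (
    '<a href="#"><img style="background-image: url(http://domain.com/2003-th.jpg)"></a>\n'
    "<div style=\"background-image: url('b.png')\"></div>"
)

# Each match becomes ">-bg-url:URL\n"; the unmatched tail is left untouched.
step1 = FIRST_STEP.sub(r">-bg-url:\1\2\3\n", html)
print(repr(step1))
# '>-bg-url:http://domain.com/2003-th.jpg\n>-bg-url:b.png\n"></div>'
```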

Second step

Manually delete everything after the last match (then you won't need the >-bg-url: marker at all), or replace

^>-bg-url:(.*)|^[\s\S]+ 

with $1 (this assumes ^ matches at the start of every line, i.e. multiline mode).
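The second step as a Python sketch. The input string is a first-step result for a hypothetical two-URL page, written out literally so the snippet stands alone. In Python, multiline mode for ^ has to be switched on explicitly with re.MULTILINE; whether NetBeans does this by default is an assumption to check.

```python
import re

# A first-step result: marker lines followed by the untouched tail of the page.
step1 = '>-bg-url:http://domain.com/2003-th.jpg\n>-bg-url:b.png\n"></div>'

# re.MULTILINE lets ^ match at the start of every line, not just the string.
SECOND_STEP = re.compile(r"^>-bg-url:(.*)|^[\s\S]+", re.MULTILINE)

# Marker lines keep only the URL; the tail matches ^[\s\S]+ and is replaced by "".
urls_text = SECOND_STEP.sub(r"\1", step1)
print(repr(urls_text))  # 'http://domain.com/2003-th.jpg\nb.png\n'
```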

