regex - Extract & combine multiple substrings using multiple patterns from some but not all strings contained in list & return to list in R


I'd like to find an elegant, manipulable way to:

  1. extract multiple substrings from some, but not all, of the strings contained in the elements of a list (each list element consists of one long string);
  2. replace each respective original long string with these multiple substrings;
  3. collapse the substrings in each list element into one string;
  4. return a list of the same length, containing the replacement substrings and the untouched long strings as appropriate.
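The four steps above can be sketched in base R as one helper. This is a minimal sketch under stated assumptions, not the answer given below: `combine_extracts()` is a hypothetical name, it keeps only the *first* match of each pattern (in pattern order), and it uses `regexpr()`/`regmatches()` (POSIX extended regex) plus `grepl()` in place of stringr's ICU-based `str_extract()`/`str_detect()`, so complex patterns may behave slightly differently.

```r
# Hypothetical helper (not from the question): for the list elements flagged
# by `keep`, extract the first match of each pattern, collapse the matches
# into one string, and store that string in place of the original.
combine_extracts <- function(lst, keep, patterns, sep = "££") {
  lst[keep] <- lapply(lst[keep], function(s) {
    # Step 1: first match of each pattern, NA where the pattern is absent
    hits <- vapply(patterns, function(p) {
      m <- regexpr(p, s)
      if (m > 0) regmatches(s, m) else NA_character_
    }, character(1), USE.NAMES = FALSE)
    # Step 3: collapse the matches that were found into one string
    paste(hits[!is.na(hits)], collapse = sep)
  })
  # Steps 2 and 4: only lst[keep] was overwritten, so the untouched
  # elements, the length and the names of the list are preserved
  lst
}

mylist <- as.list(c("onetwothreefourfive", "mnopqrstuvwxyz",
                    "ghijklmnopqrs", "twentytwofortyfoursixty"))
names(mylist) <- c("ab1997r.txt", "bg2000s.txt", "mn1999r.txt", "dc1997s.txt")
is1997 <- grepl("1997", names(mylist), fixed = TRUE)

combine_extracts(mylist, is1997, c("two", "four"))
```

A selected element in which no pattern matches becomes `""` in this sketch; returning `s` instead when `all(is.na(hits))` would keep the original text.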

This question is a follow-on to (though different from) an earlier question about replacing the strings of list elements with a substring. Note that I don't want to run the regex patterns on all list elements, only on the elements the regex applies to.

I know the end result can be delivered with `str_replace` or `sub` by matching the entire strings to be changed and returning the text captured by the capturing groups, as follows:

```r
library(stringr)

mylist <- as.list(c("onetwothreefourfive", "mnopqrstuvwxyz",
                    "ghijklmnopqrs", "twentytwofortyfoursixty"))
filenames <- c("ab1997r.txt", "bg2000s.txt", "mn1999r.txt", "dc1997s.txt")
names(mylist) <- filenames
is1997 <- str_detect(names(mylist), "1997")

# Match the whole string and keep only the two capture groups
regexp <- ".*(two).*(four).*"
mylistnew2 <- mylist
mylistnew2[is1997] <- lapply(mylist[is1997], function(i) str_replace(i, regexp, "\\1££\\2"))

## returns what I want:
mylistnew2
# $ab1997r.txt
# [1] "two££four"
#
# $bg2000s.txt
# [1] "mnopqrstuvwxyz"
#
# $mn1999r.txt
# [1] "ghijklmnopqrs"
#
# $dc1997s.txt
# [1] "two££four"
```
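Since `sub` is mentioned as an alternative, here is a sketch of the same whole-string approach in base R, reusing the question's data and pattern (with `grepl()` standing in for `str_detect()`). Like `str_replace()`, `sub()` returns a string unchanged when the pattern does not match, so a selected element lacking "two" or "four" would keep its full text.

```r
mylist <- as.list(c("onetwothreefourfive", "mnopqrstuvwxyz",
                    "ghijklmnopqrs", "twentytwofortyfoursixty"))
names(mylist) <- c("ab1997r.txt", "bg2000s.txt", "mn1999r.txt", "dc1997s.txt")
is1997 <- grepl("1997", names(mylist), fixed = TRUE)

# Only the selected elements are passed to sub(); each whole string is
# replaced by capture group 1 and capture group 2 joined with "££".
# The list element is passed positionally as sub()'s `x` argument.
mylistnew2 <- mylist
mylistnew2[is1997] <- lapply(mylist[is1997], sub,
                             pattern = ".*(two).*(four).*",
                             replacement = "\\1££\\2")
mylistnew2
```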

But I would prefer to do this without having to match the entire original text (because, e.g., of the time required to match long texts, and of the complexity of multiple regex patterns and the difficulty of knitting them together to match entire strings successfully). I'd like to use separate regex patterns to extract the substrings and then replace the original string with these extracts. I came up with the following, which works. Surely there is an easier, better way! `llply`?

```r
patterna <- "two"
patternb <- "four"

x <- mylist[is1997]
x2 <- unlist(x)                          # named character vector
stringa <- str_extract(x2, patterna)     # first match of each pattern
stringb <- str_extract(x2, patternb)
x3 <- mapply(FUN = c, stringa, stringb, SIMPLIFY = FALSE)  # pair them up
x4 <- lapply(x3, function(i) paste(i, collapse = "££"))    # collapse pairs
x5 <- relist(x4, x2)                     # restore the original names
mylistnew1 <- replace(mylist, is1997, x5)
mylistnew1
# $ab1997r.txt
# [1] "two££four"
#
# $bg2000s.txt
# [1] "mnopqrstuvwxyz"
#
# $mn1999r.txt
# [1] "ghijklmnopqrs"
#
# $dc1997s.txt
# [1] "two££four"
```
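The `mapply()`/`relist()` round trip above is not strictly needed: `mylist[is1997] <- ...` accepts an ordinary list of the right length, and the extracts can be collapsed row by row. Below is a minimal base-R sketch; `first_match()` is a hypothetical stand-in for `str_extract()` (first match per element, `NA` where there is none), and missing extracts are dropped before collapsing, so an element that lacks one pattern keeps only the other.

```r
# Hypothetical stand-in for stringr::str_extract(): first match per element,
# NA where the pattern does not occur.
first_match <- function(x, pattern) {
  m <- regexpr(pattern, x)
  out <- rep(NA_character_, length(x))
  out[m > 0] <- regmatches(x, m)   # regmatches() returns only the hits
  out
}

mylist <- as.list(c("onetwothreefourfive", "mnopqrstuvwxyz",
                    "ghijklmnopqrs", "twentytwofortyfoursixty"))
names(mylist) <- c("ab1997r.txt", "bg2000s.txt", "mn1999r.txt", "dc1997s.txt")
is1997 <- grepl("1997", names(mylist), fixed = TRUE)

x <- unlist(mylist[is1997])
# One row per selected element, one column per pattern
extracts <- cbind(first_match(x, "two"), first_match(x, "four"))
# Drop NAs in each row, then collapse the row with "££"
combined <- apply(extracts, 1, function(r) paste(r[!is.na(r)], collapse = "££"))

mylistnew1 <- mylist
mylistnew1[is1997] <- as.list(combined)  # names of mylistnew1 are unchanged
mylistnew1
```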

Something like this, maybe? I've extended the patterns you're looking for to show how it could become adaptable:

```r
library(stringr)

patterns <- c("two", "four", "three")
hits <- lapply(mylist[is1997], function(x) {
  # First match of each pattern (NA if absent), named by pattern
  out <- sapply(patterns, str_extract, string = x)
  paste(out[!is.na(out)], collapse = "££")
})
mylist[is1997] <- hits
mylist
# $ab1997r.txt
# [1] "two££four££three"
#
# $bg2000s.txt
# [1] "mnopqrstuvwxyz"
#
# $mn1999r.txt
# [1] "ghijklmnopqrs"
#
# $dc1997s.txt
# [1] "two££four"
```
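Looping over `patterns` means the pieces come out in pattern order ("two££four££three"), not in the order they occur in the text, and only the first occurrence of each pattern is kept. If text order is wanted instead, one alternative (a sketch, not part of the answer above) is to join the patterns into a single alternation and use base R's `gregexpr()`, which returns every non-overlapping match from left to right; note that repeated occurrences then all appear. The non-capturing `(?:...)` wrapper assumes `perl = TRUE`.

```r
mylist <- as.list(c("onetwothreefourfive", "mnopqrstuvwxyz",
                    "ghijklmnopqrs", "twentytwofortyfoursixty"))
names(mylist) <- c("ab1997r.txt", "bg2000s.txt", "mn1999r.txt", "dc1997s.txt")
is1997 <- grepl("1997", names(mylist), fixed = TRUE)

patterns <- c("two", "four", "three")
# "(?:two)|(?:four)|(?:three)": one regex that matches any of the patterns
alternation <- paste0("(?:", patterns, ")", collapse = "|")

hits <- lapply(mylist[is1997], function(x) {
  # All non-overlapping matches, in order of appearance in x
  found <- regmatches(x, gregexpr(alternation, x, perl = TRUE))[[1]]
  paste(found, collapse = "££")
})
mylist[is1997] <- hits
mylist
```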
