java - Regular expression for ignoring white-space-only tokens


I am not an expert in regular expressions, and I am wondering if someone can help me here:

I want to split the following string:

04/16/2015 14:01:58.819   (27327) [err] [system call]  socket bind port=4664: address in use [tsocket:820] 

into the following 5 tokens:

04/16/2015 14:01:58.819
27327
err
system call
socket bind port=4664: address in use [tsocket:820]

The following Java code does this for me, but using the regular expression [()\\[\\]] is inefficient!

    import java.util.ArrayList;
    import java.util.List;

    List<String> splitLine(String line) {
        List<String> tokens = new ArrayList<>();
        int numToks = 0;
        line = line.trim();
        // Question 1: how to change the regular expression to remove white-space-only tokens?
        String[] rawToks = line.split("[()\\[\\]]");
        for (String t : rawToks) {
            String token = t.trim();
            if (!token.isEmpty()) {
                // Keep only the first 4 non-empty tokens, but count all of them.
                if (numToks < 4) {
                    tokens.add(token);
                }
                numToks++;
            }
        }
        // Question 2: can the regular expression be enhanced to eliminate this step?
        // In case the last required token contains () or [], there are more than 5 tokens,
        // so split on the 4th token (with [] around it) and use the 2nd token of the result.
        if (numToks > 4) {
            tokens.add(line.split("\\[" + tokens.get(3) + "\\]")[1].trim());
        }
        return tokens;
    }

Does anyone have answers to the 2 questions embedded in the code above?

Edit:

The following code answers both questions above, thanks to the accepted answer below!

    List<String> splitLine(String line) {
        return Arrays.asList(line.trim().split("[)\\]]?\\s+[(\\[]|]\\s+", 5));
    }

I suggest the following:

    return Arrays.asList(line.split("[)\\]]?\\s+[(\\[]|]\\s+"));

Explanation:

This regular expression matches one of two possibilities:

  1. An optional closing bracket/parenthesis, followed by spaces, followed by an opening bracket/parenthesis.
  2. A closing bracket followed by spaces.

The first option matches the following in your string:

    04/16/2015 14:01:58.819   (27327) [err] [system call]  socket bind port=4664: address in use
                           ^^^^     ^^^   ^^^

And the second option matches the part after "system call": the closing bracket and the spaces that follow it.

This means the line is split without empty tokens.
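
As a quick check, here is a minimal, self-contained sketch that applies this regular expression to the sample line from the question. The class name SplitDemo and the method name splitAll are made up for illustration; only the pattern and the input come from the posts above.

```java
import java.util.Arrays;
import java.util.List;

public class SplitDemo {
    // Hypothetical helper: splits on the suggested pattern with no limit.
    static List<String> splitAll(String line) {
        return Arrays.asList(line.split("[)\\]]?\\s+[(\\[]|]\\s+"));
    }

    public static void main(String[] args) {
        // Sample line from the question, including its trailing space.
        String line = "04/16/2015 14:01:58.819   (27327) [err] [system call]  "
                + "socket bind port=4664: address in use [tsocket:820] ";
        // Quote each token so that stray whitespace would be visible.
        for (String token : splitAll(line)) {
            System.out.println("'" + token + "'");
        }
    }
}
```

On this input no token is empty or padded with whitespace, but the bracketed text at the end of the message is split off as a sixth token ('tsocket:820'), because " [" also matches the first option.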

Edit:

To avoid brackets/parentheses being matched in the last field, when you know you are only interested in separating 5 fields, change the above to:

    return Arrays.asList(line.split("[)\\]]?\\s+[(\\[]|]\\s+", 5));

String.split(String regex, int limit) is a version of String.split() that will not split beyond limit tokens. That is, if the last token includes a potential match, it is not tested, and the whole remaining string ends up in the fifth token.
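
The effect of the limit can be sketched as follows. The class name SplitLimitDemo and the method name splitWithLimit are hypothetical, and the trim() call is borrowed from the asker's final version so that the last token carries no trailing space.

```java
import java.util.Arrays;
import java.util.List;

public class SplitLimitDemo {
    static final String REGEX = "[)\\]]?\\s+[(\\[]|]\\s+";

    // A limit of 0 behaves like the one-argument split();
    // a positive limit caps the number of tokens returned.
    static List<String> splitWithLimit(String line, int limit) {
        return Arrays.asList(line.trim().split(REGEX, limit));
    }

    public static void main(String[] args) {
        String line = "04/16/2015 14:01:58.819   (27327) [err] [system call]  "
                + "socket bind port=4664: address in use [tsocket:820] ";
        System.out.println(splitWithLimit(line, 0).size()); // 6
        System.out.println(splitWithLimit(line, 5).size()); // 5
        // Fifth token keeps the bracketed text intact:
        // socket bind port=4664: address in use [tsocket:820]
        System.out.println(splitWithLimit(line, 5).get(4));
    }
}
```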

