Java - Regular expression for ignoring white-space-only tokens
I am not an expert in regular expressions, and I am wondering if anyone can help me here.
I want to split the following string:
04/16/2015 14:01:58.819 (27327) [err] [system call] socket bind port=4664: address in use [tsocket:820]
into the following 5 tokens:
04/16/2015 14:01:58.819
27327
err
system call
socket bind port=4664: address in use [tsocket:820]
The following Java code works for me, using the regular expression [()\\[\\]], but it is inefficient!
List<String> splitLine(String line) {
    List<String> tokens = new ArrayList<>();
    int numToks = 0;
    line = line.trim();
    // Question 1: how can I change the regular expression to remove white-space-only tokens?
    String[] rawToks = line.split("[()\\[\\]]");
    for (String t : rawToks) {
        String token = t.trim();
        if (!token.isEmpty()) {
            if (numToks < 4) {
                tokens.add(token);
            }
            numToks++;
        }
    }
    // Question 2: can the regular expression be enhanced to eliminate this step?
    // In case the last required token contains () or [], there are more than 5 tokens,
    // so split on the 4th token (with [] around it) and use the 2nd token of the result.
    if (numToks > 4) {
        tokens.add(line.split("\\[" + tokens.get(3) + "\\]")[1].trim());
    }
    return tokens;
}
Does anyone have answers to the 2 questions embedded in the code above?
Edit:
The following code answers both questions above, thanks to the accepted answer below!
List<String> splitLine(String line) {
    return Arrays.asList(line.trim().split("[)\\]]?\\s+[(\\[]|]\\s+", 5));
}
I suggest the following:
return Arrays.asList(line.split("[)\\]]?\\s+[(\\[]|]\\s+"));
Explanation:
This regular expression matches one of two possibilities:
- an optional closing bracket/parenthesis, followed by spaces, followed by an opening bracket/parenthesis.
- a closing bracket followed by spaces.
The first option matches the following in your string:
04/16/2015 14:01:58.819 (27327) [err] [system call] socket bind port=4664: address in use
                       ^^     ^^^   ^^^
and the second option matches the part after "system call" (the closing bracket and the space that follows it).
This means the line is split without producing empty tokens.
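To check the explanation above, here is a minimal sketch that lists every substring the pattern treats as a delimiter. The class name DelimiterMatches and the helper findDelimiters are hypothetical names introduced only for this illustration; the regular expression and the sample line are the ones from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DelimiterMatches {
    // The delimiter pattern suggested in the answer.
    static final Pattern DELIMITER = Pattern.compile("[)\\]]?\\s+[(\\[]|]\\s+");

    // Hypothetical helper: collects every substring the pattern matches.
    static List<String> findDelimiters(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = DELIMITER.matcher(line);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String line = "04/16/2015 14:01:58.819 (27327) [err] [system call] "
                + "socket bind port=4664: address in use";
        // Quote each match so the whitespace inside it is visible.
        for (String d : findDelimiters(line)) {
            System.out.println("'" + d + "'");
        }
    }
}
```

For this line the first alternative should account for three of the matches (before "27327", "err" and "system call") and the second alternative for the one after "system call". The space between the date and the time is not matched, because it is neither preceded by a closing bracket nor followed by an opening one.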
Edit:
To avoid brackets/parentheses being matched in the last field, when you know you are only interested in separating 5 fields, change the above to:
return Arrays.asList(line.split("[)\\]]?\\s+[(\\[]|]\\s+", 5));
The String.split(String regex, int limit) version of String.split() does not split beyond limit tokens. That is, if the last token includes a potential match, it is not tested, and you get the whole remaining string in the fifth token.
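The effect of the limit can be seen by running both variants on the line from the question. This is only a sketch: the class name SplitLimitDemo and the helpers splitAll and splitFive are hypothetical names for illustration, and the sample line is copied from the question, including its trailing [tsocket:820].

```java
import java.util.Arrays;
import java.util.List;

class SplitLimitDemo {
    // The delimiter pattern suggested in the answer.
    static final String REGEX = "[)\\]]?\\s+[(\\[]|]\\s+";

    // Splits on every delimiter match, including those in the message text.
    static List<String> splitAll(String line) {
        return Arrays.asList(line.trim().split(REGEX));
    }

    // Stops after 5 tokens, so the message text is left untouched.
    static List<String> splitFive(String line) {
        return Arrays.asList(line.trim().split(REGEX, 5));
    }

    public static void main(String[] args) {
        String line = "04/16/2015 14:01:58.819 (27327) [err] [system call] "
                + "socket bind port=4664: address in use [tsocket:820]";
        // One token per output line for each variant.
        splitAll(line).forEach(System.out::println);
        System.out.println("---");
        splitFive(line).forEach(System.out::println);
    }
}
```

Without the limit, the space and opening bracket before "tsocket:820]" count as one more delimiter, so the message is cut in two. With a limit of 5, the pattern is applied at most four times and the fifth token keeps the rest of the line as is.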