java - Extract attributes from a string
I have to deal with a problem here that is caused by a dirty design. I get a list of strings and want to parse attributes out of them. Unfortunately, I can't change the source where these strings are created.
Example:

    String s = "type=info, languagecode=en-gb, url=http://www.stackoverflow.com, ref=1, info=text, may contain kind of chars., deactivated=false";

Now I want to extract the attributes type, languagecode, url, ref, info, and deactivated.
The problem here is the field info: its text is not delimited by quotation marks. Commas may occur in this field, so I can't use the comma at the end of a value to find out where it ends.
Additionally, the strings do not always contain all attributes. type, info, and deactivated are always present; the rest are optional.
Any suggestions on how I can solve this problem?
Assuming that the order of the elements is fixed, you could write a solution using a regex like this one:

    // Needs java.util.regex.Matcher and java.util.regex.Pattern.
    String s = "type=info, languagecode=en-gb, url=http://www.stackoverflow.com, ref=1, info=text, may contain kind of chars., deactivated=false";

    String regex = // type, info and deactivated are always present
              "type=(?<type>.*?)"
            + "(?:, languagecode=(?<languagecode>.*?))?" // optional group
            + "(?:, url=(?<url>.*?))?"                   // optional group
            + "(?:, ref=(?<rel>.*?))?"                   // optional group (captured under the name "rel")
            + ", info=(?<info>.*?)"
            + ", deactivated=(?<deactivated>.*?)";

    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(s);
    // matches() requires the whole string to fit the pattern,
    // which forces the reluctant groups to expand as far as needed.
    if (m.matches()) {
        System.out.println("type -> " + m.group("type"));
        System.out.println("languagecode -> " + m.group("languagecode"));
        System.out.println("url -> " + m.group("url"));
        System.out.println("rel -> " + m.group("rel"));
        System.out.println("info -> " + m.group("info"));
        System.out.println("deactivated -> " + m.group("deactivated"));
    }

Output:

    type -> info
    languagecode -> en-gb
    url -> http://www.stackoverflow.com
    rel -> 1
    info -> text, may contain kind of chars.
    deactivated -> false

Edit: Version 2 is a regex that searches for oneofpossiblekeys=value, where the value ends with:

- , oneofpossiblekeys=
- or has the end of the string after it (represented by $).
Code:

    // Needs java.util.regex.Matcher and java.util.regex.Pattern.
    String s = "type=info, languagecode=en-gb, url=http://www.stackoverflow.com, ref=1, info=text, may contain kind of chars., deactivated=false";

    String[] possiblekeys = {"type", "languagecode", "url", "ref", "info", "deactivated"};
    String keysstrregex = String.join("|", possiblekeys);
    // the above will contain type|languagecode|url|ref|info|deactivated

    String regex = "(?<key>\\b(?:" + keysstrregex + ")\\b)=(?<value>.*?(?=, (?:" + keysstrregex + ")=|$))";
    // (?<key>\b(?:type|languagecode|url|ref|info|deactivated)\b)
    // =
    // (?<value>.*?(?=, (?:type|languagecode|url|ref|info|deactivated)=|$))
    // System.out.println(regex); // uncomment to inspect the generated pattern

    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(s);
    // find() walks through the string one key=value pair at a time.
    while (m.find()) {
        System.out.println(m.group("key") + " -> " + m.group("value"));
    }

Output:

    type -> info
    languagecode -> en-gb
    url -> http://www.stackoverflow.com
    ref -> 1
    info -> text, may contain kind of chars.
    deactivated -> false
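
If you would rather look attributes up by name than print them, the second approach could be wrapped in a small helper that collects the pairs into a map. This is only a sketch, not part of the original answer: the class name AttributeParser and the method parse are made up for illustration, and as a small variation it only accepts a key at the start of the input or directly after ", " instead of relying on \b. Optional attributes that are missing simply do not appear in the map.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is an assumption for this example.
public class AttributeParser {
    // Keys taken from the question; extend this if the source emits more.
    private static final String KEYS = "type|languagecode|url|ref|info|deactivated";

    // A key is recognised only at the start of the input or right after ", ".
    // The value runs (reluctantly) until the next ", key=" or the end of the input.
    private static final Pattern PAIR = Pattern.compile(
            "(?:^|, )(?<key>" + KEYS + ")=(?<value>.*?)(?=, (?:" + KEYS + ")=|$)");

    public static Map<String, String> parse(String s) {
        // LinkedHashMap keeps the attributes in the order they appear.
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(s);
        while (m.find()) {
            result.put(m.group("key"), m.group("value"));
        }
        return result;
    }

    public static void main(String[] args) {
        // Only the mandatory attributes are present in this input.
        String s = "type=info, info=text, may contain kind of chars., deactivated=false";
        Map<String, String> attrs = parse(s);
        System.out.println(attrs.get("info")); // text, may contain kind of chars.
        System.out.println(attrs.get("url"));  // null, because url is absent
    }
}
```

Keep in mind that this shares the limitation of the regex it is based on: if the free text of info itself contains something like ", url=", that position is treated as the start of a new attribute.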