C# - Regex gets stuck for some records
Sometimes my regex gets stuck on certain values, although it returns results for most documents.
I am talking about the scenario in which it gets stuck.
1- collection = Regex.Matches(document, pattern, RegexOptions.Compiled);
2- if (collection.Count > 0) // this line
   {
I debugged the solution and wanted to see the values of collection in the Watch window. I saw the following result for most of its properties:
"Function evaluation disabled because a previous function evaluation timed out. You must continue execution to reenable function evaluation."
Later it got stuck on the 2nd line.
I can see that there is a problem with the regex and that it went into a loop.
Question: I don't get any exception. Is there a way I can get an exception after a timeout, so that my tool can carry on with the rest of its work?
Regex:
@"""price"">(.|\r|\n)*?pound;(?<data>.*?)</span>"

Part of the document:
</span><span>1</span></a></li>\n\t\t\t\t<li>\n\t\t\t\t\t<span class=\"icon icon_floorplan touchsearch-icon touchsearch-icon-floorplan none\">floorplans: </span><span>0</span></li>\n\t\t\t\t</ul>\n\t\t</div>\n </div>\n\t</div>\n<div class=\"details clearfix\">\n\t\t<div class=\"price-new touchsearch-summary-list-item-price\">\r\n\t<a href=\"/commercial-property-for-sale/property-47109002.html\">poa</a></div>\r\n<p class=\"price\">\r\n\t\t\t<span>poa</span>\r\n\t\t\t\t</p>\r\n\t<h2 class=\"address bedrooms\">\r\n\t<a id=\"standardpropertysummary47109002\"
How do I get an exception when a regex search takes unreasonably long?
Please read below about setting a timeout on regex searches.
MSDN: Regex.MatchTimeout Property
The MatchTimeout property defines the approximate maximum time interval for a Regex instance to execute a single matching operation before the operation times out. The regular expression engine throws a RegexMatchTimeoutException exception during its next timing check after the time-out interval has elapsed. This prevents the regular expression engine from processing input strings that require excessive backtracking. For more information, see Backtracking in Regular Expressions and Best Practices for Regular Expressions in the .NET Framework.
    using System;
    using System.Text.RegularExpressions;

    public class Example
    {
        public static void Main()
        {
            AppDomain domain = AppDomain.CurrentDomain;
            // Set a timeout interval of 2 seconds.
            domain.SetData("REGEX_DEFAULT_MATCH_TIMEOUT", TimeSpan.FromSeconds(2));
            object timeout = domain.GetData("REGEX_DEFAULT_MATCH_TIMEOUT");
            Console.WriteLine("Default regex match timeout: {0}",
                              timeout == null ? "<null>" : timeout);

            Regex rgx = new Regex("[aeiouy]");
            Console.WriteLine("Regular expression pattern: {0}", rgx.ToString());
            Console.WriteLine("Timeout interval for this regex: {0} seconds",
                              rgx.MatchTimeout.TotalSeconds);
        }
    }
    // The example displays the following output:
    //       Default regex match timeout: 00:00:02
    //       Regular expression pattern: [aeiouy]
    //       Timeout interval for this regex: 2 seconds
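As a rough sketch of how this could apply to the code in the question: since .NET Framework 4.5 the static Regex.Matches method also has an overload that accepts a TimeSpan, so the timeout can be set per call instead of per application domain. Because a MatchCollection is populated lazily, the exception tends to surface when Count is read (the line where the question got stuck), so the try block needs to cover that access as well. The names RegexTimeoutDemo and TryCountMatches and the sample input are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexTimeoutDemo
{
    // Hypothetical helper: returns false instead of hanging when matching exceeds the timeout.
    public static bool TryCountMatches(string document, string pattern, TimeSpan timeout, out int count)
    {
        count = 0;
        try
        {
            MatchCollection collection =
                Regex.Matches(document, pattern, RegexOptions.Compiled, timeout);
            // Matching is lazy; reading Count forces the search to run,
            // so a timeout is thrown here rather than on the line above.
            count = collection.Count;
            return true;
        }
        catch (RegexMatchTimeoutException)
        {
            return false;
        }
    }

    public static void Main()
    {
        int count;
        if (TryCountMatches("a1b22c333", @"\d+", TimeSpan.FromSeconds(2), out count))
            Console.WriteLine("Matches found: {0}", count);
        else
            Console.WriteLine("Regex timed out; skipping this document.");
    }
}
```

With a helper like this, a document that triggers excessive backtracking is reported as a timeout and the tool can move on to the next record.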
Why is my regex getting stuck?
First of all, try to optimize your regex and minimize backtracking where you can. stribizhev commented with an improvement, so kudos to him.
Another thing: your regex is equivalent to "price">[\s\S]*?pound;(?<data>.*?)</span> (C# declaration: @"""price"">[\s\S]*?pound;(?<data>.*?)</span>"). It is better since there is less backtracking. – stribizhev Jun 4 at 9:23
Secondly, if you're having problems with specific values, the first thing you can do to track them down is to run your logic per iteration (match) instead of grabbing all matches with a one-liner.
    public static void Main()
    {
        string pattern = "a*";
        string input = "abaabb";

        // Walk the matches one at a time instead of collecting them all at once.
        Match m = Regex.Match(input, pattern);
        while (m.Success)
        {
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
            m = m.NextMatch();
        }
    }
To improve benchmark performance without working on the pattern, it is common to put Regex objects in a static class and instantiate them only once, and to add RegexOptions.Compiled to the Regex when instantiating it (which you've done). (source)
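A minimal sketch of that layout, assuming the [\s\S]*? variant of the question's pattern and an arbitrary 2-second timeout. The class name PropertyPatterns and the sample input are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PropertyPatterns
{
    // Built once and reused; the timeout value is an arbitrary choice.
    public static readonly Regex Price = new Regex(
        @"""price"">[\s\S]*?pound;(?<data>.*?)</span>",
        RegexOptions.Compiled,
        TimeSpan.FromSeconds(2));
}

public static class Program
{
    public static void Main()
    {
        // Invented sample input, not taken from the question's document.
        string html = "<p class=\"price\"><span>&pound;250,000</span></p>";
        Match m = PropertyPatterns.Price.Match(html);
        if (m.Success)
            Console.WriteLine(m.Groups["data"].Value);
    }
}
```

Passing the timeout to the constructor means every call through PropertyPatterns.Price is bounded, without repeating the TimeSpan at each call site.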
PS. It is handy to be able to deliberately cause a timeout with something reproducible, aka an infinite loop. I'll share one below.
    string pattern = @"/[a-zA-Z0-9]+(\[([^]]*(]"")?)+])?$";
    string input = "/aaa/bbb/ccc[@x='1' and @y=\"/aaa[name='z'] \"]";
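One way to use this pair as a reproducible test case is to run it with a short timeout and measure how long the engine works before giving up. This is only a sketch: the 1-second limit and the names TimeoutRepro and RunWithTimeout are assumptions, and whether the timeout actually fires may depend on the runtime's regex engine.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class TimeoutRepro
{
    // Hypothetical helper: returns true if the match attempt finished, false if it timed out.
    public static bool RunWithTimeout(string input, string pattern, TimeSpan timeout)
    {
        Stopwatch sw = Stopwatch.StartNew();
        try
        {
            bool matched = Regex.IsMatch(input, pattern, RegexOptions.None, timeout);
            Console.WriteLine("Finished in {0} ms, matched: {1}", sw.ElapsedMilliseconds, matched);
            return true;
        }
        catch (RegexMatchTimeoutException ex)
        {
            Console.WriteLine("Timed out after {0} ms (limit {1})", sw.ElapsedMilliseconds, ex.MatchTimeout);
            return false;
        }
    }

    public static void Main()
    {
        string pattern = @"/[a-zA-Z0-9]+(\[([^]]*(]"")?)+])?$";
        string input = "/aaa/bbb/ccc[@x='1' and @y=\"/aaa[name='z'] \"]";
        RunWithTimeout(input, pattern, TimeSpan.FromSeconds(1));
    }
}
```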