regex - Error with comment detection using Regular Expression in Java
I have created some simple code for comment detection.
import javax.swing.*;
import java.util.regex.*;
import java.awt.*;
import java.awt.event.*;
import java.io.File;

public class regex extends JFrame implements ActionListener {
    JPanel center = new JPanel();
    JPanel title = new JPanel();
    JTextArea text = new JTextArea();
    JTextArea result = new JTextArea();
    JScrollPane sctext = new JScrollPane(text);
    JScrollPane scresult = new JScrollPane(result);
    JButton proc = new JButton("proccess");

    regex() {
        setSize(600, 600);
        setLayout(new BorderLayout());
        add(title, BorderLayout.NORTH);
        title.setLayout(new GridLayout(1, 2));
        title.add(new JLabel("code"));
        title.add(new JLabel("regex"));
        add(center, BorderLayout.CENTER);
        center.setLayout(new GridLayout(1, 2));
        center.add(sctext);
        center.add(scresult);
        add(proc, BorderLayout.SOUTH);
        proc.addActionListener(this);
        setVisible(true);
    }

    public void actionPerformed(ActionEvent e) {
        if (e.getSource() == proc) {
            try {
                result.setText("");
                // Matches /* ... */ block comments or // line comments.
                Matcher m = Pattern.compile("(/\\*(.|[\\n]|(\\*+([^*/]|[\\r\\n])))*\\*+/)|(//.*)").matcher(text.getText());
                while (m.find()) {
                    result.append(m.group() + "\n");
                }
            } catch (Exception x) {
                // Write the stack trace to error.txt.
                try {
                    File err = new File("error.txt");
                    java.io.PrintStream ps = new java.io.PrintStream(err);
                    x.printStackTrace(ps);
                    ps.close();
                } catch (Exception exx) {}
            }
        }
    }

    public static void main(String[] agrs) {
        new regex();
    }
}
I don't know why the code can't detect long comments.
Here is a sample of text that contains a long comment.
/*
 * apache software license, version 1.1
 *
 * copyright (c) 1999-2003 apache software foundation. rights
 * reserved.
 *
 * redistribution , use in source , binary forms, or without
 * modification, permitted provided following conditions
 * met:
 *
 * 1. redistributions of source code must retain above copyright
 * notice, list of conditions , following disclaimer.
 *
 * 2. redistributions in binary form must reproduce above copyright
 * notice, list of conditions , following disclaimer in
 * documentation and/or other materials provided
 * distribution.
 *
 * 3. end-user documentation included redistribution, if
 * any, must include following acknowlegement:
 * "this product includes software developed
 * apache software foundation (http://www.apache.org/)."
 * alternately, acknowlegement may appear in software itself,
 * if , wherever such third-party acknowlegements appear.
 *
 * 4. names "the jakarta project", "tomcat", , "apache software
 * foundation" must not used endorse or promote products derived
 * software without prior written permission. written
 * permission, please contact apache@apache.org.
 *
 * 5. products derived software may not called "apache"
 * nor may "apache" appear in names without prior written
 * permission of apache group.
 *
 * software provided ``as is'' , expressed or implied
 * warranties, including, not limited to, implied warranties
 * of merchantability , fitness particular purpose
 * disclaimed. in no event shall apache software foundation or
 * contributors liable direct, indirect, incidental,
 * special, exemplary, or consequential damages (including, not
 * limited to, procurement of substitute goods or services; loss of
 * use, data, or profits; or business interruption) caused ,
 * on theory of liability, whether in contract, strict liability,
 * or tort (including negligence or otherwise) arising in way out
 * of use of software, if advised of possibility of
 * such damage.
 * ====================================================================
 *
 * software consists of voluntary contributions made many
 * individuals on behalf of apache software foundation. more
 * information on apache software foundation, please see
 * <http://www.apache.org/>.
 *
 */
But it works when detecting short comments.
With the long comment, the program catches a lot of errors.
It seems the Java regular expression engine is recursion-based. This means the regular expression has to be optimized to produce less backtracking. Yet I cannot see which backtracking produces the call stack.
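To make the recursion visible, here is a minimal sketch, assuming the errors in question are StackOverflowErrors (the question does not say so). It runs the pattern from the question against a short and a long generated comment. The class name RecursionDepthDemo and the helpers buildComment and tryFind are hypothetical, and whether the long input actually overflows depends on the JVM's thread stack size.

```java
import java.util.regex.Pattern;

class RecursionDepthDemo {
    // The pattern from the question, unchanged.
    static final Pattern ORIGINAL = Pattern.compile(
        "(/\\*(.|[\\n]|(\\*+([^*/]|[\\r\\n])))*\\*+/)|(//.*)");

    // Hypothetical helper: builds a block comment with the given number of body lines.
    static String buildComment(int lines) {
        StringBuilder sb = new StringBuilder("/*\n");
        for (int i = 0; i < lines; i++) {
            sb.append(" * line ").append(i).append(" of a long license header\n");
        }
        return sb.append(" */").toString();
    }

    // Hypothetical helper: reports the outcome as text instead of letting the Error escape.
    static String tryFind(Pattern p, String input) {
        try {
            return p.matcher(input).find() ? "matched" : "no match";
        } catch (StackOverflowError e) {
            // StackOverflowError is an Error, not an Exception.
            return "StackOverflowError";
        }
    }

    public static void main(String[] args) {
        System.out.println("3 lines: " + tryFind(ORIGINAL, buildComment(3)));
        System.out.println("50000 lines: " + tryFind(ORIGINAL, buildComment(50_000)));
    }
}
```

If the failure is indeed a StackOverflowError, a catch (Exception x) block like the one in the question would not intercept it, because Error is not a subclass of Exception.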
The following proposals work for larger comments:
- Pattern.compile("(/\\*.*?\\*/)", Pattern.DOTALL)
  (matches /* .. */ only)
- Pattern.compile("(/\\*([^\\*]|(\\*(?!/))+)*+\\*+/)|(//.*)")
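As a hedged illustration, the two proposed patterns can be tried side by side on a small snippet. The class name CommentPatterns, the constants LAZY and POSSESSIVE, and the helper findAll are names made up for this sketch; only the two pattern strings come from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentPatterns {
    // Proposal 1: lazy match across line breaks; finds block comments only.
    static final Pattern LAZY = Pattern.compile("(/\\*.*?\\*/)", Pattern.DOTALL);

    // Proposal 2: possessive content loop; finds block and line comments.
    static final Pattern POSSESSIVE =
        Pattern.compile("(/\\*([^\\*]|(\\*(?!/))+)*+\\*+/)|(//.*)");

    // Hypothetical helper: collects every match of p in source, in order.
    static List<String> findAll(Pattern p, String source) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(source);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String source = "int a; /* first\n * block */ int b; // trailing\n/** doc **/";
        // LAZY reports the two block comments; POSSESSIVE also reports the // comment.
        System.out.println(findAll(LAZY, source));
        System.out.println(findAll(POSSESSIVE, source));
    }
}
```

The first pattern is shorter, but it ignores // comments; the second keeps the shape of the original expression from the question.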
Explanation:
The loop (.| ...)* produces a lot of backtracking because the dot matches (almost) every character, so the other alternatives overlap with what .* already matches.
- The first step is to eliminate the dot; in this case, replace it with [^\\*].
- [^\\*]|[\\n] is equivalent to [^\\*], so remove [\\n].
- [^*/]|[\\r\\n] is equivalent to [^*/], so remove [\\r\\n].
- To prevent backtracking, use *+ after the content loop (a possessive quantifier). This requires that the final * of the comment is not consumed by the content loop, so insert a negative lookahead for / after the matched *, i.e. (\\*(?!/))+.
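The role of the lookahead can be checked with a small contrast, sketched here under assumptions: WITHOUT_LOOKAHEAD is a hypothetical variant written only for comparison (it does not appear in the answer), WITH_LOOKAHEAD is the block-comment part of the proposed pattern, and the class and method names are made up for this sketch.

```java
import java.util.regex.Pattern;

class PossessiveLookaheadDemo {
    // Hypothetical variant: possessive loop that may consume any '*'.
    // The loop swallows the closing "*/" and, being possessive, never gives it back.
    static final Pattern WITHOUT_LOOKAHEAD =
        Pattern.compile("/\\*([^\\*]|\\*)*+\\*+/");

    // Block-comment part of the proposal: a '*' is consumed by the loop
    // only when it is not followed by '/'.
    static final Pattern WITH_LOOKAHEAD =
        Pattern.compile("/\\*([^\\*]|(\\*(?!/))+)*+\\*+/");

    // Hypothetical helper: true if the whole string is one block comment for p.
    static boolean matchesWhole(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        String comment = "/* a * b **/";
        System.out.println("without lookahead: " + matchesWhole(WITHOUT_LOOKAHEAD, comment));
        System.out.println("with lookahead:    " + matchesWhole(WITH_LOOKAHEAD, comment));
    }
}
```

Without the lookahead, the possessive loop leaves nothing for the trailing \\*+/ to match, which is why the two changes have to be made together.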