Python: regex findall for subcategories?

August 15, 2014

Following this question, I am thinking of including one more level of hierarchy in the string. Example string:

    sometext
    somemore    text here   other text

                    course: course1

    details
    testname: test1
    other details
    id              name                marks
    ____________________________________________________
    1               student1            65
    2               student2            75
    3               myname              69
    4               student4            43

    details
    testname: test3
    other details
    id              name                marks
    ____________________________________________________
    1               student1            23
    3               myname              63
    4               student4            64

                    course: course2

    details
    testname: test2
    other details
    id              name                marks
    ____________________________________________________
    1               student1            84
    2               student3            73

    details
    testname: test5
    other details
    id              name                marks
    ____________________________________________________
    1               myname              84
    2               student2            73

                    course: course4

    details
    testname: test1
    other details
    id              name                marks
    ____________________________________________________
    1               student1            58
    2               student3            89

    details
    testname: test2
    other details
    id              name                marks
    ____________________________________________________
    1               student1            97
    3               myname              60
    8               student6            82

I want the details for myname. The output should be (course1,test1,69),(course1,test3,63),(course2,test5,84),(course4,test2,60) or something similar.

I am unable to do this in a single step, hence I came up with this:

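    import re

    # Split the text into one chunk per course; chunk 0 is the preamble before the first course.
    eachcourse = re.split(r'course: \w+', string1)
    courselist = re.findall(r'course: (\w+)', string1)

    li = []
    for i, course in enumerate(courselist):
        # Pair each testname with myname's mark without crossing into the next testname.
        match = re.findall(r".*?testname: (\w+)(?:(?!\btestname\b).)*myname\s+(\d+).*?",
                           eachcourse[i + 1], re.DOTALL)
        li.append((course, match))
    print(li)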

which gives me

[('course1', [('test1', '69'), ('test3', '63')]), ('course2', [('test5', '84')]), ('course4', [('test2', '60')])]

Is there a better, cleaner way?

Thanks.

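    import re

    # x starts out as the input string and is rebound to a list of (course, section text) pairs.
    x = re.findall(r"\bcourse: (\w+)(.*?)(?=(?:\bcourse:|$))", x, flags=re.DOTALL)
    print([[i[0]] + re.findall(r"testname: (\w+)(?:(?!\btestname\b).)*myname\s*(\d+)", i[1], flags=re.DOTALL) for i in x])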

You can try this. Although the output format is not the same, it is usable.
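If the flat (course, test, mark) triples from the question are wanted, the same two-step idea can be wrapped in a small function. This is only a sketch under some assumptions: `marks_for` and `sample` are made-up names, `sample` is a shortened stand-in for the real string, and the input is assumed to keep the same "course:" / "testname:" layout. It uses Python 3 syntax.

```python
import re

# Shortened stand-in for the example string (assumption: the real input
# follows the same "course:" / "testname:" / "id name marks" layout).
sample = """
                course: course1

details
testname: test1
other details
id              name                marks
____________________________________________________
1               student1            65
3               myname              69

details
testname: test3
other details
id              name                marks
____________________________________________________
3               myname              63

                course: course2

details
testname: test2
other details
id              name                marks
____________________________________________________
1               student1            84
"""


def marks_for(name, text):
    """Return a list of (course, test, mark) triples for one student."""
    results = []
    # Step 1: one (course, body) pair per "course:" section.
    sections = re.findall(r"\bcourse: (\w+)(.*?)(?=\bcourse:|\Z)",
                          text, flags=re.DOTALL)
    # Step 2: inside a section, pair each test name with the student's mark.
    # The negative lookahead stops a match from running into the next "testname".
    pattern = re.compile(
        r"testname: (\w+)(?:(?!\btestname\b).)*?\b%s\s+(\d+)" % re.escape(name),
        flags=re.DOTALL,
    )
    for course, body in sections:
        for test, mark in pattern.findall(body):
            results.append((course, test, int(mark)))
    return results


print(marks_for("myname", sample))
# [('course1', 'test1', 69), ('course1', 'test3', 63)]
```

Tests in which the student does not appear (test2 of course2 here) are simply skipped, because the lookahead prevents a test name from being paired with a mark from a later table.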
