Python: regex findall for subcategories?


Following this question, I am thinking of including one more level of hierarchy in the string. Example string:

sometext somemore    text here   other text

course: course1

details testname: test1 other details
id              name                marks
____________________________________________________
1               student1            65
2               student2            75
3               myname              69
4               student4            43

details testname: test3 other details
id              name                marks
____________________________________________________
1               student1            23
3               myname              63
4               student4            64

course: course2

details testname: test2 other details
id              name                marks
____________________________________________________
1               student1            84
2               student3            73

details testname: test5 other details
id              name                marks
____________________________________________________
1               myname              84
2               student2            73

course: course4

details testname: test1 other details
id              name                marks
____________________________________________________
1               student1            58
2               student3            89

details testname: test2 other details
id              name                marks
____________________________________________________
1               student1            97
3               myname              60
8               student6            82

I want the details for myname. Expected output: (course1,test1,69),(course1,test3,63),(course2,test5,84),(course4,test2,60), or something similar.

I was unable to do it in a single step, so I came up with this:

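import re

# Split on the course headers; element 0 is the text before the first course.
eachcourse = re.split(r'course: \w+', string1)
courselist = re.findall(r'course: (\w+)', string1)
li = []
for i, course in enumerate(courselist):
    # Capture each test name and myname's marks without crossing into the next "testname".
    match = re.findall(r".*?testname: (\w+)(?:(?!\btestname\b).)*myname\s+(\d+).*?", eachcourse[i+1], re.DOTALL)
    li.append((course, match))
print(li)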

which gives me

[('course1', [('test1', '69'), ('test3', '63')]), ('course2', [('test5', '84')]), ('course4', [('test2', '60')])] 

Is there a better, cleaner way?

Thanks.

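import re
# x initially holds the input string; it is then rebound to a list of (course, body) pairs.
x = re.findall(r"\bcourse: (\w+)(.*?)(?=(?:\bcourse:|$))", x, flags=re.DOTALL)
print([[i[0]] + re.findall(r"testname: (\w+)(?:(?!\btestname\b).)*myname\s*(\d+)", i[1], flags=re.DOTALL) for i in x])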

You can try this. Though the output format is not the same, it is usable.
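If the flat (course, test, marks) tuples from the question are what you need, the same two patterns can feed a nested comprehension. This is only a sketch: `sample` is a shortened, made-up stand-in for the question's string, and `COURSE_RE`, `TEST_RE` and `marks_for_myname` are names chosen here for illustration.

```python
import re

# Shortened, hypothetical stand-in for the string in the question.
sample = (
    "intro text course: course1 "
    "details testname: test1 other details id name marks ____ "
    "1 student1 65 3 myname 69 "
    "details testname: test3 other details id name marks ____ "
    "1 student1 23 3 myname 63 "
    "course: course2 "
    "details testname: test2 other details id name marks ____ "
    "1 student1 84 "
    "details testname: test5 other details id name marks ____ "
    "1 myname 84"
)

# A course name plus everything up to the next "course:" header or end of string.
COURSE_RE = re.compile(r"\bcourse: (\w+)(.*?)(?=\bcourse:|$)", re.DOTALL)
# A test name plus myname's marks, never scanning past the next "testname".
TEST_RE = re.compile(r"testname: (\w+)(?:(?!\btestname\b).)*myname\s+(\d+)", re.DOTALL)


def marks_for_myname(text):
    """Return a flat list of (course, test, marks) tuples for myname."""
    return [
        (course, test, int(marks))
        for course, body in COURSE_RE.findall(text)
        for test, marks in TEST_RE.findall(body)
    ]


result = marks_for_myname(sample)
print(result)
```

Tests in which myname does not appear (test2 in this sample) are skipped, because the negative lookahead stops the scan at the next "testname".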

