Python: regex findall for subcategories?
Following on from this question, I am thinking of including one more level of hierarchy in the string. Example string:
    sometext somemore text here other text
    course: course1
    details
    testname: test1
    other details
    id name marks
    ____________________________________________________
    1 student1 65
    2 student2 75
    3 myname 69
    4 student4 43
    details
    testname: test3
    other details
    id name marks
    ____________________________________________________
    1 student1 23
    3 myname 63
    4 student4 64
    course: course2
    details
    testname: test2
    other details
    id name marks
    ____________________________________________________
    1 student1 84
    2 student3 73
    details
    testname: test5
    other details
    id name marks
    ____________________________________________________
    1 myname 84
    2 student2 73
    course: course4
    details
    testname: test1
    other details
    id name marks
    ____________________________________________________
    1 student1 58
    2 student3 89
    details
    testname: test2
    other details
    id name marks
    ____________________________________________________
    1 student1 97
    3 myname 60
    8 student6 82
I want the details for myname. The expected output is
(course1,test1,69),(course1,test3,63),(course2,test5,84),(course4,test2,60)
or something similar.
I was unable to do this in a single step, so I came up with this:
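    import re

    # Split the text into one chunk per course; chunk 0 is the text before the first course.
    eachcourse = re.split(r'course: \w+', string1)
    courselist = re.findall(r'course: (\w+)', string1)
    li = []
    for i, course in enumerate(courselist):
        # Within one course, pair each testname with myname's marks,
        # never letting the match run across the next "testname".
        match = re.findall(r".*?testname: (\w+)(?:(?!\btestname\b).)*myname\s+(\d+).*?", eachcourse[i+1], re.DOTALL)
        li.append((course, match))
    print(li)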
which gives me
[('course1', [('test1', '69'), ('test3', '63')]), ('course2', [('test5', '84')]), ('course4', [('test2', '60')])]
Is there a better, cleaner way?
Thanks.
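    import re

    # x initially holds the input string; capture each course name and the text up to the next course.
    x = re.findall(r"\bcourse: (\w+)(.*?)(?=(?:\bcourse:|$))", x, flags=re.DOTALL)
    # For each course, collect (testname, marks) pairs for myname.
    print([[i[0]] + re.findall(r"testname: (\w+)(?:(?!\btestname\b).)*myname\s*(\d+)", i[1], flags=re.DOTALL) for i in x])

You can try this. Though the output format is not the same, it is usable.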
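For reference, the same two-pass idea (capture each course block, then search inside it) can be sketched so that it yields the flat (course, test, marks) tuples asked for in the question. This is only a sketch under assumptions: it targets Python 3, the sample text is an abbreviated version of the question's data with assumed line breaks, and the names `COURSE_RE`, `TEST_RE` and `find_marks` are made up for illustration.

```python
import re

# Abbreviated sample in the same shape as the question's data (line breaks assumed).
string1 = """\
sometext
course: course1
testname: test1
id name marks
1 student1 65
3 myname 69
testname: test3
id name marks
1 student1 23
3 myname 63
course: course2
testname: test2
id name marks
1 student1 84
2 student3 73
testname: test5
id name marks
1 myname 84
2 student2 73
course: course4
testname: test1
id name marks
1 student1 58
testname: test2
id name marks
3 myname 60
8 student6 82
"""

# Course name plus everything up to the next "course:" or the end of the text.
COURSE_RE = re.compile(r"\bcourse: (\w+)(.*?)(?=\bcourse:|\Z)", re.DOTALL)

# Test name plus the student's marks; the tempered dot refuses to step over
# another "testname", so marks are never attributed to the wrong test.
TEST_RE = re.compile(
    r"testname: (\w+)(?:(?!\btestname\b).)*?\bmyname\s+(\d+)", re.DOTALL
)


def find_marks(text):
    """Return a flat list of (course, test, marks) tuples for 'myname'."""
    results = []
    for course, body in COURSE_RE.findall(text):
        for test, marks in TEST_RE.findall(body):
            results.append((course, test, int(marks)))
    return results


print(find_marks(string1))
# [('course1', 'test1', 69), ('course1', 'test3', 63), ('course2', 'test5', 84), ('course4', 'test2', 60)]
```

Compiling the two patterns once and flattening inside the nested loop keeps the regexes identical in spirit to the ones above while avoiding the nested list in the output.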