Python - Add depth to nodes while iterating over lxml tree
I want to add a depth to each node, so I came up with the following recursive function:
```python
import lxml.html

def add_depth(node, depth=0):
    node.depth = depth
    print(node.tag, node.depth)
    for n in node.iterchildren():
        add_depth(n, depth + 1)

html = """<html>
  <body>
    <div>
      <a></a>
      <h1></h1>
    </div>
  </body>
</html>"""

tree = lxml.html.fromstring(html)
add_depth(tree)

for x in tree.iter():
    print(x)
    if not hasattr(x, 'depth'):
        print('this should not happen', x)
```

I thought this would be one of the cheapest ways to add depth: doing it once gives every element a depth, and I only need to visit each element once.
The problem is that it somehow doesn't seem to stick: the depth does not stay on the element. Is the lxml tree somehow generated on the spot while iterating over it, so that the added depth is lost?

What's going on here, and what's the cheapest way to give all elements a depth?
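What seems to be happening: the objects lxml hands you are Python proxies for the underlying libxml2 nodes, and lxml's documentation on custom element classes warns that these proxies are created and garbage collected as needed. A Python attribute such as `depth` lives on the proxy itself, so it survives only as long as that proxy does. The sketch below assumes CPython, where reference counting frees an unreferenced proxy immediately; `lost` and `kept` are just illustrative names.

```python
import lxml.html

root = lxml.html.fromstring("<html><body><div></div></body></html>")

# Attribute set on a temporary proxy: nothing keeps the proxy alive, so
# CPython frees it right after this statement and the attribute goes with it.
root[0].depth = 1
lost = not hasattr(root[0], "depth")  # root[0] builds a fresh proxy here

# Attribute set while a reference is held: lxml returns the same live
# proxy for the same node, so the attribute is still there.
body = root[0]
body.depth = 1
kept = root[0].depth == 1

print(lost, kept)
```

In the question's `add_depth`, nothing holds on to the child proxies once the recursion returns, so `tree.iter()` builds fresh ones. Only the root keeps its `depth`, because the `tree` variable still references it.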
Breakthrough

Using the following:

```python
def add_depth(node, depth=0, maxd=None):
    node.depth = depth
    if maxd is None:
        maxd = []
    maxd.append((node, node.depth))
    for n in node.iterchildren():
        add_depth(n, depth + 1, maxd)
    return maxd
```

it suddenly works. The code creates a huge list of the elements with their depth next to them (so I can sort it). While iterating over the original tree, this time the elements do have a depth. It's not efficient at all, though, and I don't understand why it works.
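A likely explanation for the breakthrough: the returned list holds a reference to every proxy, so while the list is alive `tree.iter()` hands back those same objects, `depth` included. The sketch below reuses the function above, assuming the caller keeps the returned list (here in a hypothetical `keep_alive` variable), and shows the attributes vanishing again once the list is released. It again assumes CPython's reference counting.

```python
import lxml.html

def add_depth(node, depth=0, maxd=None):
    # Same as the "breakthrough" version: each (proxy, depth) tuple in the
    # list is a reference that keeps that proxy, and its .depth, alive.
    node.depth = depth
    if maxd is None:
        maxd = []
    maxd.append((node, node.depth))
    for n in node.iterchildren():
        add_depth(n, depth + 1, maxd)
    return maxd

tree = lxml.html.fromstring(
    "<html><body><div><a></a><h1></h1></div></body></html>")

keep_alive = add_depth(tree)  # hypothetical name: hold on to the list
with_depth = all(hasattr(x, "depth") for x in tree.iter())

del keep_alive  # child proxies are freed; the root survives via `tree`
after_release = [hasattr(x, "depth") for x in tree.iter()]

print(with_depth, after_release)
```

So the depth was never really stored "in the tree"; it was stored on objects that the list happened to keep alive.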
@maximoo

```python
tree.depth = 0
for x in tree.iter():
    if x.getparent() is not None:
        x.depth = x.getparent().depth + 1
```

fails with:

```
AttributeError: 'HtmlElement' object has no attribute 'depth'
```
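The same proxy lifetime would explain this error: when the loop moves on, nothing references the previous `x` any more, so `x.getparent()` builds a new parent proxy that has no `.depth`. One possible workaround, not from the original thread, is to keep the depths in a dict keyed by element (`depths` is a hypothetical name). The dict holds each proxy alive, so `getparent()` returns the very object that was stored earlier.

```python
import lxml.html

tree = lxml.html.fromstring(
    "<html><body><div><a></a><h1></h1></div></body></html>")

# Keyed by element: the dict keeps every proxy alive, so looking up
# x.getparent() finds the parent that was inserted on an earlier pass.
depths = {tree: 0}
for x in tree.iter():
    parent = x.getparent()
    if parent is not None:
        depths[x] = depths[parent] + 1

print([(el.tag, d) for el, d in depths.items()])
```

`tree.iter()` walks the tree depth-first, so every parent is already in the dict by the time its children are reached.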
There are a couple of issues here.
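The first is that you're trying to make a recursive function update the original tree as a side effect by setting Python attributes on the elements. I don't think that's possible in any reliable way: lxml's element objects are proxies that are created when you access a node and discarded once nothing references them, so an attribute stored on a proxy disappears along with it.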
The second is that you don't want to use Python attributes; you need to use XML attributes, which you access via `x.attrib`.
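XML attributes, by contrast, are stored in the tree itself, so it does not matter which proxy you read them through. A minimal check, assuming the same `lxml.html` setup as the question: write through a throwaway proxy and read back through a fresh one.

```python
import lxml.html

root = lxml.html.fromstring("<html><body><div></div></body></html>")

# Write through a temporary proxy that is discarded right away...
root[0].attrib["depth"] = "1"

# ...and read through a brand-new proxy: the value lives in the tree,
# and it also shows up when the document is serialized.
value = root[0].attrib["depth"]
print(value, lxml.html.tostring(root).decode())
```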
Here is a working piece of code (it's a bit awkward, since it keeps converting the depth between string and int; XML attribute values can't be integers). It doesn't use recursion, which I think is overkill anyway:
```python
tree.attrib['depth'] = '0'
for x in tree.iter():
    # iter() is depth-first, so every parent is labelled before its children
    if 'depth' not in x.attrib:
        x.attrib['depth'] = str(int(x.getparent().attrib['depth']) + 1)
print(lxml.html.tostring(tree).decode())
```

which prints:

```html
<html depth="0">
  <body depth="1">
    <div depth="2">
      <a depth="3"></a>
      <h1 depth="3"></h1>
    </div>
  </body>
</html>
```
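If the string/int round trip bothers you, one option (an illustration, not from the original answer) is to carry the depth as an int on an explicit stack of `(element, depth)` pairs and only convert it to a string when writing the attribute. `annotate_depth` is a hypothetical name.

```python
import lxml.html

def annotate_depth(root):
    # Hypothetical helper: the depth travels as an int alongside each
    # element, so no parent attribute ever has to be parsed back.
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        node.set("depth", str(depth))  # stored in the tree, not on the proxy
        for child in node.iterchildren():
            stack.append((child, depth + 1))
    return root

tree = annotate_depth(lxml.html.fromstring(
    "<html><body><div><a></a><h1></h1></div></body></html>"))
print(lxml.html.tostring(tree).decode())
```

The visiting order differs from `tree.iter()` (siblings are popped in reverse), but every element is still visited exactly once and no attribute value is converted back to an int.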