python - Add depth to nodes while iterating over an lxml tree


I want to add a depth to each node, and came up with the following recursive function:

    import lxml.html

    def add_depth(node, depth=0):
        node.depth = depth
        print(node.tag, node.depth)
        for n in node.iterchildren():
            add_depth(n, depth + 1)

    html = """<html>
                <body>
                  <div>
                    <a></a>
                    <h1></h1>
                  </div>
                </body>
              </html>"""

    tree = lxml.html.fromstring(html)

    add_depth(tree)

    for x in tree.iter():
        print(x)
        if not hasattr(x, 'depth'):
            print('this should not happen', x)

I thought this was one of the cheapest ways to add a depth: doing it once gives all elements a depth, and it only needs to visit each element once.

The problem is that it somehow does not seem to stick: the depth does not stay on the element. Is the lxml tree somehow generated on the spot while I iterate over it, so that the added depth gets lost?

What's going on here, and what is the cheapest way to give all elements a depth?

Breakthrough

Using the following:

    def add_depth(node, depth=0, maxd=None):
        node.depth = depth
        if maxd is None:
            maxd = []
        maxd.append((node, node.depth))
        for n in node.iterchildren():
            add_depth(n, depth + 1, maxd)
        return maxd

it suddenly works. The code builds a huge list of elements, each with its depth next to it (so I can sort it), and this time, when iterating over the original tree, the elements do have a depth. It's not efficient at all, though, and I don't understand why it works.
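The list is most likely what makes the second version work: lxml hands out Python proxy objects for its underlying C nodes, and while a proxy is still referenced somewhere (here, in maxd), lxml returns that same object again instead of building a new one. The sketch below relies on this on purpose but keeps the depths in a dict keyed by element instead of storing them as attributes on the elements. The helper name depth_map is hypothetical, not part of lxml:

```python
import lxml.html


def depth_map(root):
    """Hypothetical helper: map root and every node below it to its depth.

    The dict holds a reference to each element proxy, so lxml keeps handing
    back these same objects and the identity-based dict lookups keep working.
    """
    depths = {root: 0}
    # iterdescendants() walks in document order, so a parent is always
    # recorded before any of its children looks it up.
    for el in root.iterdescendants():
        depths[el] = depths[el.getparent()] + 1
    return depths


html = """<html><body><div><a></a><h1></h1></div></body></html>"""
tree = lxml.html.fromstring(html)
depths = depth_map(tree)

for el in tree.iter():
    print(el.tag, depths[el])
```

Because nothing is written onto the elements, this does not depend on HtmlElement accepting extra Python attributes, so the same helper should also work for plain lxml.etree trees.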

@maximoo: trying your suggestion gives:

    tree.depth = 0
    for x in tree.iter():
        if x.getparent() is not None:
            x.depth = x.getparent().depth + 1

    AttributeError: 'HtmlElement' object has no attribute 'depth'
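The AttributeError plausibly has the same cause: once the loop moves on, the proxy that received x.depth is no longer referenced and gets freed, so when a later getparent() call needs that element again, lxml builds a new proxy without the attribute. A minimal sketch, assuming the goal is only to make this loop work, is to materialise the iteration into a list first so that every proxy stays alive for the whole loop. This only works for lxml.html elements (HtmlElement accepts arbitrary Python attributes, while plain lxml.etree elements reject them), and the depth attributes vanish again once the list is dropped:

```python
import lxml.html

html = """<html><body><div><a></a><h1></h1></div></body></html>"""
tree = lxml.html.fromstring(html)

# Keep a strong reference to every element proxy for the whole loop, so that
# getparent() returns the same objects that already carry .depth.
elements = list(tree.iter())

tree.depth = 0
for x in elements:
    parent = x.getparent()
    if parent is not None:  # only the root has no parent
        x.depth = parent.depth + 1

for x in elements:
    print(x.tag, x.depth)
```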

There are a couple of issues here.

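  • The first is that you're storing the depth as a Python attribute on the element objects. lxml elements are lightweight Python proxies for the underlying C nodes: they are created on demand and discarded as soon as nothing references them, so an attribute set on a proxy disappears along with it. The recursion itself is not the problem; your second version only works because the maxd list keeps every proxy alive.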

  • The second is that you don't want to use Python attributes; you need to use XML attributes, which you access through x.attrib.
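A small experiment illustrates the first point. It assumes CPython, where an object is freed as soon as its last reference goes away; on other interpreters the proxy may survive longer, so the last result could differ there:

```python
import lxml.html

tree = lxml.html.fromstring("<html><body><div></div></body></html>")

div = tree.find(".//div")   # a Python proxy for the underlying C node
div.depth = 2               # lives on the proxy object, not in the XML tree

# While 'div' is still referenced, lxml hands back that same proxy.
same_proxy_kept = tree.find(".//div") is div
depth_while_alive = hasattr(tree.find(".//div"), "depth")

del div                     # last reference gone: CPython frees the proxy here

# A brand-new proxy is built for the same node, and it has no 'depth'.
depth_after_del = hasattr(tree.find(".//div"), "depth")

print(same_proxy_kept, depth_while_alive, depth_after_del)
```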

A working piece of code is the following (it's a bit awkward since it continually converts the depth between string and int, because XML attribute values can't be integers). It doesn't use recursion, which I think is overkill anyway:

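    tree.attrib['depth'] = '0'
    # tree.iter() visits elements in document order, so every parent
    # already has its depth attribute by the time its children are reached
    for x in tree.iter():
        if 'depth' not in x.attrib:
            x.attrib['depth'] = str(int(x.getparent().attrib['depth']) + 1)

    print(lxml.html.tostring(tree).decode())

which prints:

    <html depth="0">
                <body depth="1">
                  <div depth="2">
                    <a depth="3"></a>
                    <h1 depth="3"></h1>
                  </div>
                </body>
              </html>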
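If the depths are only needed while walking the tree, and depth attributes in the serialized output are unwanted, a single pass with a running counter avoids storing anything on the elements at all. A sketch using lxml.etree.iterwalk, which reports a "start" event when it enters an element and an "end" event when it leaves it; iter_with_depth is a hypothetical helper name:

```python
import lxml.html
from lxml import etree


def iter_with_depth(root):
    """Hypothetical helper: yield (element, depth) in document order, root = 0."""
    depth = -1
    for event, el in etree.iterwalk(root, events=("start", "end")):
        if event == "start":
            depth += 1      # entering an element: one level deeper
            yield el, depth
        else:
            depth -= 1      # leaving an element: back up one level


html = """<html><body><div><a></a><h1></h1></div></body></html>"""
tree = lxml.html.fromstring(html)

for el, depth in iter_with_depth(tree):
    print(el.tag, depth)
```

Each element is still visited only once, but the depth only exists while the loop runs, so it has to be consumed (or collected) inside that loop.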
