python - Why is this returning a NoneType?


I'm trying to scrape info off of Wikipedia using the function below, but I'm running into an AttributeError because a function call is returning None. Can someone please try to explain why it is returning None?

    import wikipedia as wp
    import string

    def add_section_info(search):
        html = wp.page(search).html().encode("utf-8")  # gets the HTML source from Wikipedia

        with open("temp.xml", 'w') as t:  # write the HTML in XML format
            t.write(html)

        table_of_contents = []
        dict_of_section_info = {}

        # this extracts the info in the table of contents
        with open("temp.xml", 'r') as r:
            for line in r:
                if "toclevel" in line:
                    new_string = line.partition("#")[2]
                    content_title = new_string.partition("\"")[0]
                    tbl = string.maketrans("_", " ")
                    content_title = content_title.translate(tbl)
                    table_of_contents.append(content_title)

        print wp.page(search).section("aortic rupture")  # this is None, but it shouldn't be

        for item in table_of_contents:
            section = wp.page(search).section(item).encode("utf-8")
            print section
            if section == "":
                continue
            else:
                dict_of_section_info[item] = section

        with open("section_info.txt", 'a') as sect:
            sect.write(search)
            sect.write("------------------------------------------\n")
            for item in dict_of_section_info:
                sect.write(item)
                sect.write("\n\n")
                sect.write(dict_of_section_info[item])
            sect.write("####################################\n\n")

    add_section_info("abdominal aortic aneurysm")

What I don't understand is that if I run add_section_info("hiv"), for example, it works perfectly.

The source code for the imported wikipedia library is here.

My output for the above code is this:

    abdominal aortic aneurysm

    signs and symptoms
    Traceback (most recent call last):
      File "/home/pharoslabsllc/documents/wikitest.py", line 79, in <module>
        add_section_info(line)
      File "/home/pharoslabsllc/documents/wikitest.py", line 30, in add_section_info
        section = wp.page(search).section(item).encode("utf-8")
    AttributeError: 'NoneType' object has no attribute 'encode'

The page method never returns None (you can check that in the source code), but the section method does return None if the title cannot be found. See the documentation:

section(section_title)

Get the plain text content of a section from self.sections. Returns None if section_title isn't found, otherwise returns a whitespace stripped string.

So the answer is that the Wikipedia page you are referring to has no section titled aortic rupture, as far as the library is concerned.

Looking at Wikipedia, it seems that the page abdominal aortic aneurysm does have such a section.

Note that if you try to check the value of wp.page(search).sections you get: []. I.e. it seems that the library isn't parsing the sections properly.


From the source code of the library, found here, we can see this test:

    section = u"== {} ==".format(section_title)
    try:
      index = self.content.index(section) + len(section)
    except ValueError:
      return None

however:

    In [14]: p.content.find('aortic')
    Out[14]: 3223

    In [15]: p.content[3220:3220+50]
    Out[15]: '== aortic ruptureedit ===\n\nthe signs and symptoms '

    In [16]: p.section('aortic ruptureedit')
    Out[16]: "the signs and symptoms of ruptured aaa may include severe pain in lower back, flank, abdomen or groin. mass pulses heart beat may felt. bleeding can lead to hypovolemic shock low blood pressure and fast heart rate. may lead brief passing out.\nthe mortality of aaa rupture 90%. 65–75% of patients die before arrive at hospital and 90% die before reach operating room. bleeding can retroperitoneal or abdominal cavity. rupture can create connection between aorta and intestine or inferior vena cava. flank ecchymosis (appearance of bruise) sign of retroperitoneal bleeding, and called grey turner's sign.\naortic aneurysm rupture may mistaken pain of kidney stones, muscle related pain."

Note the edit right before the closing ==. In other words, the library has a bug: it doesn't take the edit link next to the heading into account.
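To make the mismatch concrete, here is a minimal sketch that imitates the quoted test on a plain string. It is a simplified stand-in, not the library's actual code: the function name section_text and the sample text are made up for illustration, and the real method also stops at the next heading.

```python
def section_text(content, section_title):
    """Simplified imitation of the lookup quoted above (hypothetical helper)."""
    marker = u"== {} ==".format(section_title)
    try:
        start = content.index(marker) + len(marker)
    except ValueError:
        # The exact "== title ==" marker does not occur in the text.
        return None
    # Drop any leftover "=" from a deeper heading level, then trim whitespace.
    return content[start:].lstrip("=").strip()


# Made-up page text in which the heading has picked up a trailing "edit".
sample_content = u"intro text\n\n=== aortic ruptureedit ===\n\nsome section text"

missing = section_text(sample_content, u"aortic rupture")
found = section_text(sample_content, u"aortic ruptureedit")
```

With the plain title, the marker "== aortic rupture ==" never occurs in the text, so the lookup gives None. Only the title with the trailing edit matches.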

The same code works for the HIV page because in that page the headings don't have an edit link right next to them. I have no idea why that is so, but anyway it looks like either a bug or a shortcoming of the library, so you should open a ticket on its issue tracker.

In the meanwhile you can use a simple fix like:

    def find_section(page, title):
        res = page.section(title)
        if res is None:
            res = page.section(title + 'edit')
        return res

And use this function instead of calling the .section method directly. This can only be a temporary fix, though.
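As a rough sketch of how the loop from the question could use this helper, the snippet below also skips titles that still come back as None, so nothing is ever called on a None value. The names collect_sections and FakePage are hypothetical. FakePage only stands in for a real page object so the example runs without network access, and the encode step from the question is left out.

```python
def find_section(page, title):
    # Same temporary workaround as above: retry with the stray "edit" suffix.
    res = page.section(title)
    if res is None:
        res = page.section(title + 'edit')
    return res


def collect_sections(page, titles):
    """Return {title: text} for each title whose section is found and non-empty."""
    found = {}
    for title in titles:
        text = find_section(page, title)
        if not text:  # None (section not found) or "" (empty section)
            continue
        found[title] = text
    return found


class FakePage(object):
    """Hypothetical stand-in for a wikipedia page object, for illustration only."""

    def __init__(self, sections):
        self._sections = sections

    def section(self, title):
        # Mirrors the documented behaviour: None when the title is not found.
        return self._sections.get(title)


page = FakePage({"aortic ruptureedit": "rupture text", "causes": ""})
result = collect_sections(page, ["aortic rupture", "causes", "no such section"])
```

Fetching the page once and passing it in, rather than calling wp.page(search) on every iteration, also avoids repeating the same request for each section.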

