python - Mechanize br.submit() limitations?
My intention is to submit a search query to a website using mechanize and analyse the results using BeautifulSoup. This will only be used on the same website, so the form names etc. can be hardcoded. I am having issues with the initial query, as shown below:

    import mechanize
    import urllib2
    #from bs4 import BeautifulSoup

    def inspect_page(url):
        br = mechanize.Browser(factory=mechanize.RobustFactory())
        br.set_handle_robots(False)
        br.addheaders = [('User-agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6')]
        br.set_handle_redirect(mechanize.HTTPRedirectHandler)

        try:
            br.open(url)
        except mechanize.HTTPError as e:
            print "HTTP Error", e.code,
        except urllib2.URLError as e:
            print "URL Error", e.reason,
            return

        # List every form on the page to check names and fields
        for form in br.forms():
            print form

        br.select_form(name="dataform")
        br.form['pcode'] = 'wv14 8ew'
        br.form['premise'] = '66'

        response = br.submit()
        print response.read()
        #soup = BeautifulSoup(response.read())

    inspect_page('http://www.fensa.co.uk/asp/certificate.asp')

This did not redirect to the results page, and print response.read() displayed the HTML of the page I submitted the query on, so I assumed I had made an error in my code. However, when I tested another site (inspect_page('https://publicaccess.glasgow.gov.uk/online-applications/search.do?action=simple')) and changed the form fields to match that site:

    br.select_form(name="searchcriteriaform")
    br.form['searchcriteria.simplesearchstring'] = 'queen elizabeth gardens'
    response = br.submit()
    print response.read()

I was redirected as expected. Is there anything that would stop the page being redirected when br.submit() is called? I've checked that the site is not gzipped.
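One limitation of mechanize is that it doesn't know JavaScript. Submitting the search form on the site in your script triggers a JavaScript function that validates the input and changes the action attribute of the <form> before submitting the form values. mechanize never runs that function, so br.submit() sends the values to the form's original action, which would explain why the same page comes back.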
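A rough way to check whether a page depends on script for submission is to scan its markup for onclick handlers and for inputs of type "button". The sketch below uses only the Python 3 standard library; the class name ScriptedSubmitFinder, the helper find_script_hints and the sample markup are made up for illustration (the sample is a stand-in modelled on a link-wrapped search button), and the heuristic will miss handlers attached from separate script files.

```python
from html.parser import HTMLParser


class ScriptedSubmitFinder(HTMLParser):
    """Collect tags that hint a form is submitted by JavaScript (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hints = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # An onclick handler may rewrite the form before submitting it.
        if "onclick" in attrs:
            self.hints.append((tag, "onclick", attrs["onclick"]))
        # type="button" inputs do nothing without script; a plain
        # HTML submit control would use type="submit".
        if tag == "input" and (attrs.get("type") or "").lower() == "button":
            self.hints.append((tag, "type", "button"))


def find_script_hints(html):
    """Return a list of (tag, attribute, value) hints found in the markup."""
    finder = ScriptedSubmitFinder()
    finder.feed(html)
    return finder.hints


# Stand-in markup for illustration only.
sample = (
    '<form name="dataform">'
    '<a onclick="return validate_required()" name="submit" href="#">'
    '<input class="button" type="button" value="search" name="submit2">'
    '</a></form>'
)

for hint in find_script_hints(sample):
    print(hint)
```

An empty result does not prove the form is script-free, but any hit is a good reason to read the page's JavaScript before suspecting mechanize.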
Here is the HTML part of the form:

    <a onclick="return validate_required()" name="submit" href="#">
        <input class="button" type="button" value="search" name="submit2">
    </a>

And the validate_required() function is defined near the beginning of the HTML document:

    function validate_required() {
        error = "";
        if (document.getElementById("pcode").value == '') {
            error = error + "postcode\n";
        }
        if (document.getElementById("premise").value == '') {
            error = error + "premise\n";
        }
        if (error != '') {
            alert("please enter:\n\n" + error);
            return false;
        } else {
            document.dataform.action = "certificate_results.asp";
            document.dataform.submit();
        }
    }

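Since the script only checks two fields and swaps the action, its effect can be reproduced by hand. The following is a minimal sketch in Python 3 using only the standard library, not a confirmed fix: the function name build_results_request is hypothetical, and it assumes the form is sent as a POST, which the snippets above do not show. It builds the request without sending it.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request


def build_results_request(page_url, pcode, premise):
    """Mimic validate_required(): check the fields, then target the results page."""
    missing = [name for name, value in (("pcode", pcode), ("premise", premise))
               if value == ""]
    if missing:
        # The script shows an alert and cancels the submit in this case.
        raise ValueError("please enter: " + ", ".join(missing))

    # The script sets the form action to "certificate_results.asp";
    # a relative action resolves against the URL of the page holding the form.
    action = urljoin(page_url, "certificate_results.asp")

    # Assumption: the form is sent as a POST; the method is not shown above.
    body = urlencode({"pcode": pcode, "premise": premise}).encode("ascii")
    return Request(action, data=body)


request = build_results_request(
    "http://www.fensa.co.uk/asp/certificate.asp", "wv14 8ew", "66"
)
print(request.full_url)
print(request.data)
```

With mechanize itself, the equivalent would presumably be to assign the resolved URL to br.form.action after select_form() and before br.submit(). Whether the server accepts the request may depend on checks that are not visible in the page.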