Ruby on Rails - HTML data parsing issues in Nokogiri
I have a file containing plain HTML.
I am using Ruby 1.8.7 and need to extract the PO number and the tracking number from each invoice block. Where a tracking number is missing, I need to put 'nil' in its place.
So far I have not been able to get this working properly.
<html> <head> </head> <body> <div>***note*** <br> items<br><br> invoice number : [982157] po number : [7894562] <br>shipped to:<br>hohne<br> troxler rd<br><br>india<br> invoice number : [982157] po number : [7894562] <br>shipped to:<br>hohne<br><br><br> <br> invoice number : [982157] po number : [7894562] <br>shipped to:<br>hohne<br>troxler rd<br><br>india<br><br>shipped via : ups track : <a href= ab.com> 1z2559690357791340</a><br><font face="courier" size="2" color="black"><br> <br> invoice number : [982157] po number : [7894562] <br>shipped to:<br>hohne<br>troxler rd<br><br>india<br> <br> invoice number : [982157] po number : [7894562] <br>shipped to:<br>hohne<br> troxler rd<br><br>india<br><br>shipped via : ups track : <a href= ab.com> 1z2559690357791340</a><br><font face="courier" size="2" color="black"><br> </body> </html>
Here is my code:
require 'rubygems'
require 'nokogiri'
require 'open-uri'

page_url = "a.html"
page = Nokogiri::HTML(open(page_url))
data = page.css("body").text
po_numbers = data.scan(/invoice number : \[\d+\] po number : \[(\d+)\]/).flatten
tracking_numbers = page.css("a").text.split
[["po number", "tracking number"]].concat(po_numbers.zip(tracking_numbers))
puts po_numbers
puts tracking_numbers

Output:
=> po_numbers = ["7894562", "7894562", "7894562", "7894562", "7894562"]
=> tracking_numbers = ["1z2559690357791340", "1z2559690357791340"]
=> po_numbers.zip(tracking_numbers)
=> [["7894562", "1z2559691257791340"], ["7894562", "1z2559690357791340"], ["7894562", "1z2559690357791340"], ["7894562", "nil"], ["7894562", "nil"]]

What I want:
=> [["7894562", "1z2559691257791340"], ["7894562", "nil"], ["7894562", "1z2559690357791340"], ["7894562", "nil"], ["7894562", "1z2559690357791340"]]
I suggest using a hash to store po_numbers and tracking_numbers, so that you can associate each PO number with its tracking number.
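One way to make that association concrete is to stop collecting the two lists separately and instead parse one invoice block at a time. The following is only a minimal sketch under some assumptions: `extract_shipments` is a hypothetical helper name, the sample `text` is a shortened stand-in for what `page.css("body").text` might return, and each block is assumed to start with the literal text "invoice number". Because every PO number in the sample is identical, a single hash keyed by PO number would collapse the entries, so the sketch builds one small hash per block instead.

```ruby
# Shortened, hypothetical stand-in for page.css("body").text
text = "note items " +
       "invoice number : [982157] po number : [7894562] shipped to: hohne " +
       "invoice number : [982157] po number : [7894562] shipped to: hohne " +
       "shipped via : ups track :  1z2559690357791340 " +
       "invoice number : [982157] po number : [7894562] shipped to: hohne "

# Hypothetical helper: returns one { "po" => ..., "tracking" => ... } hash
# per invoice block, using the string "nil" when no tracking number is found.
def extract_shipments(text)
  chunks = text.split("invoice number")
  chunks.shift # discard whatever precedes the first invoice block
  chunks.map do |chunk|
    po       = chunk[/po number : \[(\d+)\]/, 1]
    tracking = chunk[/track :\s*(\w+)/, 1]
    { "po" => po, "tracking" => tracking || "nil" }
  end
end

shipments = extract_shipments(text)

# Flatten back into [po, tracking] pairs, the shape asked for in the question.
pairs = shipments.map { |s| [s["po"], s["tracking"]] }
pairs.each { |po, tracking| puts "#{po} #{tracking}" }
```

Since the PO number and the tracking number are read from the same chunk, a block without a "track :" entry gets "nil" in its own position rather than shifting later tracking numbers forward, which is what `zip` on two independent lists does.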