java - Parse all PDF pages at once with iText
I am trying to parse a PDF file with iText. What I am trying to achieve is to parse all pages at once.
    try {
        PdfReader reader = new PdfReader("d:\\hl_sv\\l04mf.pdf");
        int pages = reader.getNumberOfPages();
        String content = "";
        for (int i = 0; i <= pages; i++) {
            System.out.println("============page number " + i + "=============");
            content = content + " " + PdfTextExtractor.getTextFromPage(reader, i);
        }
        System.out.println(content);
    } catch (IOException e) {
        e.printStackTrace();
    }

But I am getting this error:
    Exception in thread "main" java.lang.NullPointerException
        at com.itextpdf.text.pdf.parser.PdfReaderContentParser.processContent(PdfReaderContentParser.java:77)
        at com.itextpdf.text.pdf.parser.PdfTextExtractor.getTextFromPage(PdfTextExtractor.java:74)
        at com.itextpdf.text.pdf.parser.PdfTextExtractor.getTextFromPage(PdfTextExtractor.java:89)
        at com.pdf.pdf.main(pdf.java:18)

The other problem I am facing is that hyphens are being parsed as a question mark (?). How can I fix that?
I would appreciate any help.
EDIT: This works for me, but I still can't solve the hyphen bug:
    try {
        PdfReader reader = new PdfReader("d:\\hl_sv\\l04mf.pdf");
        int pages = reader.getNumberOfPages();
        for (int i = 1; i <= pages; i++) {
            System.out.println("============page number " + i + "=============");
            String line = PdfTextExtractor.getTextFromPage(reader, i);
            System.out.println(line);
        }
        reader.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
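The thread never resolves the hyphen problem. One cause worth ruling out (an assumption, not confirmed here) is that the PDF contains a Unicode hyphen or dash such as U+2010 instead of the ASCII '-', and the console's character encoding cannot represent it, so Java substitutes '?' when the text is encoded for output. The sketch below uses only the JDK; the class name HyphenCheck, the sample string and the normalizeHyphens mapping are hypothetical. It lists the non-ASCII code points in a string, shows that encoding U+2010 to US-ASCII yields '?', and optionally maps a few dash characters to '-'.

```java
import java.nio.charset.StandardCharsets;

public class HyphenCheck {

    // Lists each distinct non-ASCII character in the text as U+XXXX, so you can
    // see which "hyphen" character the extracted text really contains.
    static String describeNonAscii(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints()
            .filter(cp -> cp > 0x7F)
            .distinct()
            .forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    // Hypothetical workaround: map some common Unicode hyphen/dash characters
    // to ASCII '-'. Whether en dash and minus sign should be mapped is a choice.
    static String normalizeHyphens(String text) {
        return text
            .replace('\u2010', '-')  // HYPHEN
            .replace('\u2011', '-')  // NON-BREAKING HYPHEN
            .replace('\u2012', '-')  // FIGURE DASH
            .replace('\u2013', '-')  // EN DASH
            .replace('\u2212', '-'); // MINUS SIGN
    }

    public static void main(String[] args) {
        // Stand-in for text returned by the extractor (hypothetical sample).
        String sample = "state\u2010of\u2010the\u2010art";
        System.out.println(describeNonAscii(sample)); // U+2010

        // Encoding to a charset that lacks U+2010 replaces it with '?'.
        byte[] ascii = sample.getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // state?of?the?art

        System.out.println(normalizeHyphens(sample)); // state-of-the-art
    }
}
```

If describeNonAscii reports such a code point for the string returned by PdfTextExtractor.getTextFromPage, the extraction itself is probably fine and the fix belongs on the output side (console or writer encoding). If the extracted string already contains a literal '?', the cause is more likely in how the PDF maps its glyphs to Unicode.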
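Page numbers in iText start at 1, so calling getTextFromPage with page 0 throws the NullPointerException. Loop from 1 up to and including getNumberOfPages():

    import com.itextpdf.text.pdf.PdfReader;
    import com.itextpdf.text.pdf.parser.PdfTextExtractor;
    import java.io.IOException;

    public static String extractPdfText() throws IOException {
        PdfReader pdfReader = new PdfReader("/path/to/file/myfile.pdf");
        try {
            int pages = pdfReader.getNumberOfPages();
            StringBuilder pdfText = new StringBuilder();
            // page numbers start at 1; page 0 throws an NPE
            for (int ctr = 1; ctr <= pages; ctr++) {
                pdfText.append(PdfTextExtractor.getTextFromPage(pdfReader, ctr));
            }
            return pdfText.toString();
        } finally {
            pdfReader.close();
        }
    }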
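To make the 1-based, inclusive page range explicit without needing iText on the classpath, here is a minimal JDK-only sketch. The helper extractAll, its IntFunction parameter and the fake extractor are hypothetical: in real code the function would be something like page -> PdfTextExtractor.getTextFromPage(reader, page), wrapped in a try/catch, because that method declares IOException and an IntFunction lambda cannot throw a checked exception.

```java
import java.util.function.IntFunction;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class PageLoop {

    // Hypothetical helper: pageText stands in for a per-page extractor.
    static String extractAll(int pageCount, IntFunction<String> pageText) {
        if (pageCount < 0) {
            throw new IllegalArgumentException("pageCount must be >= 0: " + pageCount);
        }
        // Pages are numbered 1..pageCount inclusive; page 0 is never requested.
        return IntStream.rangeClosed(1, pageCount)
                .mapToObj(pageText)
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        // Fake extractor that rejects page 0, mimicking the failure in the thread.
        IntFunction<String> fake = page -> {
            if (page < 1) {
                throw new IllegalArgumentException("no page " + page);
            }
            return "text of page " + page;
        };
        System.out.println(extractAll(3, fake));
    }
}
```

IntStream.rangeClosed(1, pageCount) covers exactly pages 1 through pageCount, which avoids both the i = 0 start and a < versus <= slip at the upper end. Joining with a newline keeps page boundaries visible, unlike plain concatenation.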