Not sure where to post this question

Posted: Sat Nov 05, 2005 3:30 pm
by JasonTC
Here's what I need to do: take a huge amount of data from a PDF file and somehow get it into an Excel spreadsheet. I could write a script that does this, but first I would have to learn how to parse PDF files and, more challengingly, figure out how to get that data into a format that Excel can understand. Any ideas on what to do, or maybe just a suggestion for a better place to post this question?

Thanks,
Jason

Posted: Sat Nov 05, 2005 3:35 pm
by feyd
How often does this need to be repeated? There are *nix utilities that can more or less convert a PDF to text, but you can often just load the PDF into a reader and copy the text out (if it's stored as text) into something else. Now, if it's not stored as text, you're pretty screwed unless your version of Acrobat can convert the outline information back to text.

Posted: Sat Nov 05, 2005 8:02 pm
by JasonTC
Yeah, I don't know why I didn't even think of just copying the text out. I guess that solves that part of the problem, but then how do I get it into an Excel spreadsheet? It's a mountain of data that would be a nightmare to try to organize manually.

Posted: Sat Nov 05, 2005 9:23 pm
by feyd
Pasting that copied text into a text file and then using Excel's import tools to break apart the input may help. You may need to do some processing first, such as adding tabs between fields so it's easier for Excel to read. So just fiddle around with the import tools and with how you save the text into a file.
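
The "adding tabs between fields" step could be scripted roughly like this. It's only a sketch: the function name `text_to_tsv` is made up for illustration, and it assumes the copied text keeps its columns apart with two or more spaces (or a tab), which may not hold for every PDF.

```python
import csv
import io
import re


def text_to_tsv(raw_text: str) -> str:
    """Turn space-aligned text copied from a PDF into tab-separated text."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between rows
        # Assumption: a run of 2+ whitespace characters, or a single tab,
        # marks a column boundary; single spaces stay inside a field.
        fields = re.split(r"\s{2,}|\t", line)
        writer.writerow(fields)
    return out.getvalue()


if __name__ == "__main__":
    sample = "Name        Qty   Price\nWidget A    3     4.50\n"
    print(text_to_tsv(sample), end="")
```

Saving the result to a text file and opening it through Excel's text import with tab chosen as the delimiter should put each field in its own column. If the real data uses a different separator, the regular expression is the part to adjust.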