Making a PDF Viewer with ajax or flash that can "search"?
Posted: Fri Sep 11, 2009 10:57 am
It is possible to view a PDF document, complete with search functionality, in a Flash player.
Last time I checked, Flash's loadMovie() family of commands loaded .swf, .flv, .jpg and maybe .png files, so how are they doing this? How do they know the word "paper" appears on page 7, at position 200px by 50px or whatever? Check it out ( search is in the lower right of the document ) http://www.scribd.com/doc/17350937/The- ... hl-Excerpt
It works just like the real PDF viewers.
I have OpenOffice up and converting .doc and .docx to .pdf. I am using a Python bridge called unoconv by Dag Wieërs and invoking it via PHP's exec() function.
I am using ImageMagick with a PostScript extension to convert the .pdf to page1.jpg, page2.jpg, page3.jpg ( http://blog.robfelty.com/2008/03/11/con ... agemagick/ )
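For reference, here is a minimal Python sketch of the two conversion steps described so far. It is not my actual PHP code; the helper names, the 150 DPI density and the page%d.jpg output pattern are assumptions for illustration. It assumes the unoconv and ImageMagick convert commands are on the PATH, and that unoconv writes the .pdf next to the input file.

```python
import subprocess
from pathlib import Path


def unoconv_command(doc_path):
    """Build the unoconv call that converts a .doc/.docx file to PDF."""
    return ["unoconv", "-f", "pdf", str(doc_path)]


def imagemagick_command(pdf_path, out_dir, density=150):
    """Build the ImageMagick call that renders each PDF page to a JPEG.

    ImageMagick fills the %d in the output pattern with a page index.
    """
    pattern = Path(out_dir) / "page%d.jpg"
    return ["convert", "-density", str(density), str(pdf_path), str(pattern)]


def convert_document(doc_path, out_dir):
    """Run both steps; raises CalledProcessError if either tool fails."""
    doc_path = Path(doc_path)
    subprocess.run(unoconv_command(doc_path), check=True)
    # Assumption: unoconv places the PDF beside the source document.
    pdf_path = doc_path.with_suffix(".pdf")
    subprocess.run(imagemagick_command(pdf_path, out_dir), check=True)
    return pdf_path
```

Passing the arguments as a list avoids the shell quoting problems that come with building one command string for exec().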
I read that xpdf is an open-source tool that happens to run on Unix (!) and converts PDFs to text... but on Scribd they aren't just jumping to the page where the text is found; they are highlighting the exact location of the found text within the page ( which is just a JPEG, I believe ), and I'm not sure how they do that.
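My best guess at how the highlighting could work, as a sketch only: if the text extraction step can also report a bounding box for each word in PDF points (72 per inch), then scaling those boxes by the DPI used to render the page images gives pixel rectangles to draw over the image. I don't know that Scribd does this. The WordBox structure and function names are hypothetical, and I am assuming the boxes are measured from the top-left corner of the page.

```python
from dataclasses import dataclass

POINTS_PER_INCH = 72.0  # PDF user space unit


@dataclass(frozen=True)
class WordBox:
    """One extracted word and its box in PDF points (hypothetical format)."""
    page: int
    text: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def to_pixels(box, dpi):
    """Scale a box from PDF points to pixels on an image rendered at dpi."""
    scale = dpi / POINTS_PER_INCH
    return (
        round(box.x_min * scale),
        round(box.y_min * scale),
        round(box.x_max * scale),
        round(box.y_max * scale),
    )


def find_word(index, query, dpi):
    """Return (page, pixel_rect) for every case-insensitive exact word match."""
    needle = query.lower()
    return [
        (box.page, to_pixels(box, dpi))
        for box in index
        if box.text.lower() == needle
    ]
```

With an index built once per document on the server, the viewer would only need to fetch the matching rectangles and draw them on top of the page image it is already showing.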
Google Cache has a document viewer that does the same thing, and Firebug shows me they use a .png file for each page. ( Try this out by googling filetype:pdf, then don't click on the actual result link but on the viewer link near the result: http://docs.google.com/gview?a=v&q=cach ... l=us&pli=1 )
They have a free tool I can embed ( http://googlesystem.blogspot.com/2009/0 ... iewer.html ), but I would still be interested in how the search feature works.
I am very much interested in any ideas about the specifics. I am about to get out Ethereal and try to see what's happening, because Firebug can't monitor Flash's network activity.