Viewing PDF in php
Moderator: General Moderators
Is there a way to show the contents of a PDF (as HTML) in PHP?
I can't seem to find one. I see PDFlib, but it looks geared more toward creating them.
I want to be able to call up a PDF for viewing without needing Adobe installed.
volka 
You can look at http://pdftohtml.sourceforge.net/
I haven't tested it, but it looks very promising.
- RobertGonzalez
- Site Administrator
- Posts: 14293
- Joined: Tue Sep 09, 2003 6:04 pm
- Location: Fremont, CA, USA
You are looking for something that can crack PDFs open and read them? There are tools out there, as long as the PDF is not enormously large. I cannot remember the apps I found when I was looking a few months ago, but if I can locate them, I will post links.
In the meantime, you can Google PHP DRM solutions. That might get you started on the right path.
- neel_basu
- Forum Contributor
- Posts: 454
- Joined: Wed Dec 06, 2006 9:33 am
- Location: Picnic Garden, Kolkata, India
Maybe this can help: PDF Solutions in PHP
My ultimate goal was to be able to have PDFs displayed in a browser without using Adobe.
This way the ones I do generate or make could be used.
The PDF would still be on the server, but I don't really want to just convert the PDF to HTML and upload that. I would prefer on the fly, so to speak.
I did get pdftohtml to kind of work, but I'm still figuring it all out.
Hoping there is something else out there as well that can be used.
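For what it's worth, an on-the-fly call to pdftohtml could look roughly like the sketch below. It assumes the `pdftohtml` binary is on the server's PATH and that your build supports the `-stdout`, `-noframes`, `-i` and `-q` options; the function names `buildPdfToHtmlCommand` and `pdfToHtmlOnTheFly` are made up for this example.

```php
<?php
// Hypothetical helper: builds the shell command for a single PDF.
// Assumes a pdftohtml build that supports -q, -i, -noframes and -stdout.
function buildPdfToHtmlCommand(string $pdfPath): string
{
    // escapeshellarg() keeps odd file names from being interpreted by the shell
    return 'pdftohtml -q -i -noframes -stdout ' . escapeshellarg($pdfPath);
}

// Hypothetical helper: returns the converted HTML as a string, or null on failure.
function pdfToHtmlOnTheFly(string $pdfPath): ?string
{
    if (!is_file($pdfPath) || !is_readable($pdfPath)) {
        return null;
    }
    $html = shell_exec(buildPdfToHtmlCommand($pdfPath));
    return (is_string($html) && $html !== '') ? $html : null;
}
```

With images ignored and the output sent to stdout, the conversion should not need a web-writable directory, at the cost of losing any pictures in the PDF.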
- OasisGames
- Forum Commoner
- Posts: 26
- Joined: Mon Apr 23, 2007 3:24 pm
- Location: Ohio
This might help you, from php.net:
Only does plain text, but might help you get started...
<?php
// Function : pdf2txt()
// Arguments : $filename - Filename of the PDF you want to extract
// Description : Reads a pdf file, extracts data streams, and manages
// their translation to plain text - returning the plain
// text at the end
// Authors : Jonathan Beckett, 2005-05-02
// : Sven Schuberth, 2007-03-29
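function pdf2txt($filename){
    $data = getFileData($filename);
    // the file starts with a header like "%PDF-1.x"; take the text after the first "%"
    $s = strpos($data, "%") + 1;
    $version = substr($data, $s, strpos($data, "%", $s) - 1);
    // PDF 1.2 files go through the stream decoder, everything else through the string scan
    if (substr_count($version, "PDF-1.2") == 0) {
        return handleV3($data);
    }
    return handleV2($data);
}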
// handles version 1.2
function handleV2($data){
    $a_chunks = array();
    $result_data = "";
    $j = 0;
    // grab objects and then grab their contents (chunks)
    $a_obj = getDataArray($data,"obj","endobj");
    if (!is_array($a_obj)){
        return $result_data;
    }
    foreach($a_obj as $obj){
        $a_filter = getDataArray($obj,"<<",">>");
        if (is_array($a_filter)){
            $j++;
            $a_chunks[$j]["filter"] = $a_filter[0];
            $a_chunks[$j]["data"] = "";
            $a_data = getDataArray($obj,"stream\r\n","endstream");
            if (is_array($a_data)){
                // strip the "stream" and "endstream" keywords around the payload
                $a_chunks[$j]["data"] = substr($a_data[0],
                    strlen("stream\r\n"),
                    strlen($a_data[0])-strlen("stream\r\n")-strlen("endstream"));
            }
        }
    }
    // decode the chunks
    foreach($a_chunks as $chunk){
        if ($chunk["data"]!=""){
            // only FlateDecode (zlib) streams are decoded; other filters are skipped
            if (strpos($chunk["filter"],"FlateDecode")!==false){
                $data = @gzuncompress($chunk["data"]);
                if ($data!==false && trim($data)!=""){
                    $result_data .= ps2txt($data);
                }
            }
        }
    }
    return $result_data;
}
// handles every version other than 1.2
function handleV3($data){
    $result_data = "";
    // grab objects and then grab their contents (chunks)
    $a_obj = getDataArray($data,"obj","endobj");
    if (!is_array($a_obj)){
        return $result_data;
    }
    foreach($a_obj as $obj){
        // only objects that mention /GS1 are scanned for text
        if (substr_count($obj,"/GS1")>0){
            // the strings are between ( and )
            preg_match_all("|\((.*?)\)|",$obj,$field,PREG_SET_ORDER);
            foreach($field as $match){
                $result_data .= $match[1];
            }
        }
    }
    return $result_data;
}
function ps2txt($ps_data){
    $result = "";
    $a_data = getDataArray($ps_data,"[","]");
    if (is_array($a_data)){
        foreach ($a_data as $ps_text){
            $a_text = getDataArray($ps_text,"(",")");
            if (is_array($a_text)){
                foreach ($a_text as $text){
                    // drop the surrounding parentheses
                    $result .= substr($text,1,strlen($text)-2);
                }
            }
        }
    } else {
        // the data may just be in raw format (outside of [] tags)
        $a_text = getDataArray($ps_data,"(",")");
        if (is_array($a_text)){
            foreach ($a_text as $text){
                $result .= substr($text,1,strlen($text)-2);
            }
        }
    }
    return $result;
}
function getFileData($filename){
    $handle = fopen($filename,"rb");
    $data = fread($handle, filesize($filename));
    fclose($handle);
    return $data;
}
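function getDataArray($data,$start_word,$end_word){
    $start = 0;
    $end = 0;
    // stays null when nothing matches, so callers can test the result with is_array()
    $a_result = null;
    while ($start!==false && $end!==false){
        $start = strpos($data,$start_word,$end);
        if ($start!==false){
            $end = strpos($data,$end_word,$start);
            if ($end!==false){
                // data is between start and end (delimiters included)
                $a_result[] = substr($data,$start,$end-$start+strlen($end_word));
                // continue after the end word, so "obj" inside "endobj" is not matched again
                $end += strlen($end_word);
            }
        }
    }
    return $a_result;
}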
?>
The problem you're going to run into is that 1 PDF != 1 HTML file, which means you either need a web-accessible directory that Apache can write to, which is not very secure, or some other ridiculous solution. Every one I could come up with is a gigantic security risk to do on the fly. That'll be the issue regardless of the solution you use.
The only solution I came up with that might be feasible uses readfile and some preg_replace magic, but because of a potential security hole in the code, it's been removed.
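One way to narrow that kind of hole when handing a file to readfile is to resolve the requested name against a fixed directory and refuse anything that escapes it. This is a minimal sketch, not the removed code: `resolvePdfPath`, `sendPdf` and the `$baseDir` layout are assumptions for illustration.

```php
<?php
// Hypothetical helper: maps a user-supplied name to a real file inside $baseDir,
// or returns null if the name is not a .pdf or points outside that directory.
function resolvePdfPath(string $baseDir, string $requested): ?string
{
    $base = realpath($baseDir);
    if ($base === false || !preg_match('/\.pdf$/i', $requested)) {
        return null;
    }
    // realpath() collapses "../" and symlinks, and returns false for missing files
    $path = realpath($base . DIRECTORY_SEPARATOR . $requested);
    if ($path === false || strpos($path, $base . DIRECTORY_SEPARATOR) !== 0) {
        return null;
    }
    return is_file($path) ? $path : null;
}

// Hypothetical helper: streams the PDF to the browser, or answers with a 404.
function sendPdf(string $baseDir, string $requested): void
{
    $path = resolvePdfPath($baseDir, $requested);
    if ($path === null) {
        http_response_code(404);
        return;
    }
    header('Content-Type: application/pdf');
    header('Content-Length: ' . filesize($path));
    readfile($path);
}
```

This still sends the raw PDF, so the browser needs some PDF viewer of its own; the sketch only covers the path check.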