Viewing PDF in php

Posted: Wed May 02, 2007 1:32 pm
by oldtimer
Is there a way to show the contents of a pdf (html version) in php?

I can't seem to find one. I see PDFlib, but it looks more geared to making them.

I want to be able to call up a PDF to be viewed without having to have Adobe.

Posted: Wed May 02, 2007 1:47 pm
by volka
php runs server-side. There's nothing "shown by php".

Posted: Wed May 02, 2007 1:54 pm
by arturm
volka :)


you can look at http://pdftohtml.sourceforge.net/
I haven't tested it, but it looks very promising.
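
For what it's worth, here is a rough, untested sketch of how pdftohtml could be called from a PHP page. The function names are made up for illustration, it assumes the pdftohtml binary is on the server's PATH, and the flags are the ones I remember from its help output, so check `pdftohtml -h` on your install before relying on them.

```php
<?php
// Hypothetical helper: builds the shell command for one pdftohtml run.
// Assumption: the pdftohtml binary is installed and on the PATH.
function buildPdfToHtmlCommand($pdfPath)
{
    // -noframes : one HTML document instead of a frameset
    // -i        : ignore images, so no extra image files are written
    // -q        : do not print messages or errors
    // -stdout   : write the HTML to standard output
    return 'pdftohtml -noframes -i -q -stdout ' . escapeshellarg($pdfPath);
}

// Hypothetical helper: returns the converted HTML, or false on failure.
function pdfToHtml($pdfPath)
{
    if (!is_file($pdfPath)) {
        return false;
    }
    $output = array();
    $status = 1;
    exec(buildPdfToHtmlCommand($pdfPath), $output, $status);
    return ($status === 0) ? implode("\n", $output) : false;
}
```

escapeshellarg() matters here because the file name ends up on a command line. With -i the images are dropped, so this only gives you the text layout.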

Posted: Wed May 02, 2007 2:01 pm
by RobertGonzalez
You are looking for something that can crack PDFs open and read them? There are tools out there, as long as the PDF is not enormously large. I cannot remember the apps I found a few months ago when I was looking, but if I can locate them, I will post links.

In the meantime, you can Google PHP DRM solutions. That might get you started on the right path.

Posted: Wed May 02, 2007 2:01 pm
by oldtimer
volka wrote: php runs server-side. There's nothing "shown by php".
I say PHP as all my pages are .php.

I know from Google it's possible to view a PDF as HTML.

Posted: Wed May 02, 2007 2:36 pm
by oldtimer
I did install pdftohtml. Still trying to figure it out though. :(

Posted: Sat May 05, 2007 6:10 am
by neel_basu
Maybe this can help: PDF Solutions in PHP

Posted: Sat May 05, 2007 6:13 am
by neel_basu
You could use bcompiler to compile this and run it on your PC,
or you can make a PHP CLI application to do this job.
But if you want a PDF file on your PC to be read by a PHP web page and converted there, then you need to upload it first and then run the code.
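
Whichever way the file arrives (CLI argument or upload), it is probably worth checking that it really is a PDF before running any conversion code on it. A minimal sketch, assuming a check of the "%PDF-" marker at the start of the file is good enough for your purposes; looksLikePdf() is a made-up name, not a built-in.

```php
<?php
// Hypothetical helper: true if the file starts with the "%PDF-" marker.
function looksLikePdf($path)
{
    if (!is_file($path) || !is_readable($path)) {
        return false;
    }
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return false;
    }
    // A PDF file begins with a header such as "%PDF-1.4".
    $header = fread($handle, 5);
    fclose($handle);
    return $header === '%PDF-';
}

// CLI usage sketch:    php check.php some.pdf  ->  looksLikePdf($argv[1])
// Upload usage sketch: looksLikePdf($_FILES['pdf']['tmp_name'])
// (the script name and the "pdf" form field name are assumptions)
```

This only looks at the first five bytes, so it filters out obvious mistakes and says nothing about whether the rest of the file is well formed.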

Posted: Sat May 05, 2007 3:46 pm
by oldtimer
My ultimate goal was to be able to have PDFs displayed in a browser without using Adobe.

This way the ones I do generate or make could be used.

The PDF would still be on the server, but I don't really want to just convert the PDF to HTML and upload that. I would prefer on the fly, so to speak.

I did get pdftohtml to kind of work, but I'm still figuring it all out.

Hoping there is something else out there as well that can be used.
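
One way "on the fly" could work without converting on every single request is to convert only when the PDF is newer than the last HTML copy. A small sketch under that assumption; needsConversion() and both path parameters are hypothetical, and where the HTML copy lives is up to you.

```php
<?php
// Hypothetical helper: true if the cached HTML must be (re)generated.
function needsConversion($pdfPath, $cachePath)
{
    // PHP caches stat() results, so clear them before comparing timestamps.
    clearstatcache();
    if (!is_file($cachePath)) {
        return true; // never converted yet
    }
    // Reconvert only when the PDF changed after the cached copy was written.
    return filemtime($pdfPath) > filemtime($cachePath);
}
```

The page would call this first and only run the converter when it returns true, then send the cached HTML either way.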

Posted: Sat May 05, 2007 4:30 pm
by OasisGames
This might help you, from php.net:
It only does plain text, but might help you get started...

Code:

<?php
// Function    : pdf2txt()
// Arguments   : $filename - Filename of the PDF you want to extract
// Description : Reads a pdf file, extracts data streams, and manages
//               their translation to plain text - returning the plain
//               text at the end
// Authors     : Jonathan Beckett, 2005-05-02
//             : Sven Schuberth, 2007-03-29

function pdf2txt($filename){

    $data = getFileData($filename);

    // the file header looks like "%PDF-1.4"; only version 1.2 files go
    // through the stream decoder, everything else is handled by handleV3()
    $version = substr($data,0,8);
    if(substr_count($version,"PDF-1.2")==0)
        return handleV3($data);
    else
        return handleV2($data);
}
// handles version 1.2
function handleV2($data){

    // grab objects and then grab their contents (chunks)
    $a_obj = getDataArray($data,"obj","endobj");
    if (!is_array($a_obj)){
        return "";
    }

    $a_chunks = array();
    $j = 0;
    foreach($a_obj as $obj){

        $a_filter = getDataArray($obj,"<<",">>");

        if (is_array($a_filter)){
            $j++;
            $a_chunks[$j]["filter"] = $a_filter[0];

            // note: only matches streams whose keyword is followed by CRLF
            $a_data = getDataArray($obj,"stream\r\n","endstream");
            if (is_array($a_data)){
                $a_chunks[$j]["data"] = substr($a_data[0],
                    strlen("stream\r\n"),
                    strlen($a_data[0])-strlen("stream\r\n")-strlen("endstream"));
            }
        }
    }

    // decode the chunks
    $result_data = "";
    foreach($a_chunks as $chunk){

        // look at each chunk and decide how to decode it - by looking at the contents of the filter
        if (isset($chunk["data"]) && $chunk["data"]!=""){
            // look at the filter to find out which encoding has been used
            // (strpos, not substr: we are searching for the filter name)
            if (strpos($chunk["filter"],"FlateDecode")!==false){
                $data = @gzuncompress($chunk["data"]);
                if ($data!==false && trim($data)!=""){
                    $result_data .= ps2txt($data);
                }
            }
        }
    }

    return $result_data;
}

//handles every version other than 1.2
function handleV3($data){
    // grab objects and then grab their contents (chunks)
    $a_obj = getDataArray($data,"obj","endobj");
    $result_data="";
    if (!is_array($a_obj)){
        return $result_data;
    }
    foreach($a_obj as $obj){
        //check if it is a string
        if(substr_count($obj,"/GS1")>0){
            //the strings are between ( and )
            preg_match_all("|\((.*?)\)|",$obj,$field,PREG_SET_ORDER);
            if(is_array($field))
                foreach($field as $match)
                    $result_data.=$match[1];
        }
    }
    return $result_data;
}

function ps2txt($ps_data){
    $result = "";
    $a_data = getDataArray($ps_data,"[","]");
    if (is_array($a_data)){
        foreach ($a_data as $ps_text){
            $a_text = getDataArray($ps_text,"(",")");
            if (is_array($a_text)){
                foreach ($a_text as $text){
                    $result .= substr($text,1,strlen($text)-2);
                }
            }
        }
    } else {
        // the data may just be in raw format (outside of [] tags)
        $a_text = getDataArray($ps_data,"(",")");
        if (is_array($a_text)){
            foreach ($a_text as $text){
                $result .= substr($text,1,strlen($text)-2);
            }
        }
    }
    return $result;
}

function getFileData($filename){
    // file_get_contents() is binary-safe and returns false on failure;
    // an unreadable or missing file yields an empty string
    $data = @file_get_contents($filename);
    return ($data === false) ? "" : $data;
}

function getDataArray($data,$start_word,$end_word){

    $start = 0;
    $end = 0;
    // stays null when nothing matches; callers test the result with is_array()
    $a_result = null;

    while ($start!==false && $end!==false){
        $start = strpos($data,$start_word,$end);
        if ($start!==false){
            $end = strpos($data,$end_word,$start);
            if ($end!==false){
                // data is between start and end (both delimiters included)
                $a_result[] = substr($data,$start,$end-$start+strlen($end_word));
            }
        }
    }
    return $a_result;
}
?>
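
To see what the FlateDecode branch in handleV2() is doing, here is a tiny stand-alone sketch. It does not read a real PDF: it compresses a made-up content stream with gzcompress() (which produces the zlib format that Flate streams use), then inflates it and pulls out the text between parentheses. The sample stream and variable names are invented for illustration, and it needs the zlib extension.

```php
<?php
// Stand-in for a PDF content stream: text operands sit in parentheses.
$content = "BT /F1 12 Tf (Hello) Tj ( world) Tj ET";

// gzcompress() writes zlib data, the same container FlateDecode streams use.
$compressed = gzcompress($content);

// Decoding step: inflate, then collect every (...) string in order.
$inflated = gzuncompress($compressed);
preg_match_all('/\((.*?)\)/', $inflated, $matches);
$text = implode('', $matches[1]);

echo $text, "\n"; // prints "Hello world"
```

Real files are messier: escaped parentheses, hex strings and fonts with custom encodings all break this simple approach, which is part of why the output is plain text only.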

Posted: Sat May 05, 2007 6:17 pm
by oldtimer
Hi, thanks,

I did try that, but I wanted to keep the graphics, columns etc.

But it's getting closer.

Posted: Sat May 05, 2007 6:55 pm
by nickvd
You've been given the answer already... pdftohtml is precisely what you need.
oldtimer wrote: My ultimate goal was to be able to have PDFs displayed in a browser without using adobe.
... PDFTOHTML ... :D

Posted: Sat May 05, 2007 7:40 pm
by DrTom
The problem you're going to run into is that 1 PDF != 1 HTML file, which means you either need a web-accessible directory that Apache can write to, which is not very secure, or some other ridiculous solution. Every one I could come up with is a gigantic security risk to do on the fly. That'll be the issue regardless of the solution you use.

The only solution I came up with that might be feasible is using readfile and some preg_replace magic, but because of a potential security hole in the code it's been removed.
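
This is not the removed code, just a minimal sketch of one piece of the readfile idea: keep the files in a directory outside the web root and only hand out names that pass a strict whitelist. resolvePdfPath(), the directory and the "file" parameter are all made up for illustration, and this alone does not make on-the-fly conversion safe.

```php
<?php
// Hypothetical helper: maps a requested name to a real file inside $baseDir,
// or returns false if the name is unsafe or the file does not exist.
function resolvePdfPath($baseDir, $requested)
{
    // Allow only plain names such as "report-2007.pdf": no slashes, no "..".
    if (!preg_match('/^[A-Za-z0-9_-]+\.pdf$/', $requested)) {
        return false;
    }
    $base = realpath($baseDir);
    $path = realpath($baseDir . DIRECTORY_SEPARATOR . $requested);
    if ($base === false || $path === false) {
        return false;
    }
    // Second check: the resolved path must sit directly in $base,
    // which also rejects symlinks that point somewhere else.
    return (dirname($path) === $base) ? $path : false;
}

// Usage sketch (commented out; directory and parameter name are assumptions):
// $path = resolvePdfPath('/var/pdf-store', isset($_GET['file']) ? $_GET['file'] : '');
// if ($path === false) { header('HTTP/1.0 404 Not Found'); exit; }
// header('Content-Type: application/pdf');
// readfile($path);
```

Because the script streams the file itself, Apache never needs a writable, web-accessible directory for it.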