Viewing PDF in php

PHP programming forum. Ask questions or help people concerning PHP code. Don't understand a function? Need help implementing a class? Don't understand a class? Here is where to ask. Remember to do your homework!

Moderator: General Moderators

Post Reply
oldtimer
Forum Contributor
Posts: 204
Joined: Sun Nov 03, 2002 8:21 pm
Location: Washington State

Viewing PDF in php

Post by oldtimer »

Is there a way to show the contents of a pdf (html version) in php?

I can't seem to find one. I see PDFlib but looks more to making them.

I want to be able to call a PDF to be viewed without having to have adobe.
User avatar
volka
DevNet Evangelist
Posts: 8391
Joined: Tue May 07, 2002 9:48 am
Location: Berlin, ger

Post by volka »

php runs server-side. There's nothing "shown by php".
User avatar
arturm
Forum Commoner
Posts: 86
Joined: Fri Apr 13, 2007 8:29 am
Location: NY
Contact:

Post by arturm »

volka :)


you can look at http://pdftohtml.sourceforge.net/
I haven't tested it but look very promising.
User avatar
RobertGonzalez
Site Administrator
Posts: 14293
Joined: Tue Sep 09, 2003 6:04 pm
Location: Fremont, CA, USA

Post by RobertGonzalez »

You are looking for something that can crack PDF's open and read them? There are tools out there, as long as the PDF is not enormously large. I cannot remember the apps I found a few months when looking, but if I can locate them, I will post links.

In the meantime, you can Google PHP DRM solutions. That might get you started on the right path.
oldtimer
Forum Contributor
Posts: 204
Joined: Sun Nov 03, 2002 8:21 pm
Location: Washington State

Post by oldtimer »

volka wrote:php runs server-side. There's nothing "shown by php".
I say PHP as all my pages are .php.

I know from google its possible to view a pdf as html.
oldtimer
Forum Contributor
Posts: 204
Joined: Sun Nov 03, 2002 8:21 pm
Location: Washington State

Post by oldtimer »

I did install the pdftohtml. Still trying to figure it out though. :(
User avatar
neel_basu
Forum Contributor
Posts: 454
Joined: Wed Dec 06, 2006 9:33 am
Location: Picnic Garden, Kolkata, India

Post by neel_basu »

May be it can help PDF Solutions in PHP
User avatar
neel_basu
Forum Contributor
Posts: 454
Joined: Wed Dec 06, 2006 9:33 am
Location: Picnic Garden, Kolkata, India

Post by neel_basu »

You should use bccomplie to compile and read this in your PC.
or you can make a php CLI Application to do this Job.
But if you want pdf file in your PC will get read by an php Webpage and then converted it through. then you need to upload it first and then run codes.
oldtimer
Forum Contributor
Posts: 204
Joined: Sun Nov 03, 2002 8:21 pm
Location: Washington State

Post by oldtimer »

My ultimate goal was to be able to have PDFs displayed in a browser without using adobe.

This way the ones I do generate or make could be used.

The PDF would still be on the server but I dont really just want to convert the pdf to html and upload that. I would prefer on the fly so to speak.

I did get the pdftohtml to kind of work but still figuring it all out.

Hoping there is something else out there as well that can be used.
User avatar
OasisGames
Forum Commoner
Posts: 26
Joined: Mon Apr 23, 2007 3:24 pm
Location: Ohio

Post by OasisGames »

This might help you, from php.net;
Only does plain text, but might help you get started...

Code: Select all

<?php
// Function    : pdf2txt()
// Arguments   : $filename - Filename of the PDF you want to extract
// Description : Reads a pdf file, extracts data streams, and manages
//               their translation to plain text - returning the plain
//               text at the end
// Authors      : Jonathan Beckett, 2005-05-02
//                            : Sven Schuberth, 2007-03-29

function pdf2txt($filename){

    $data = getFileData($filename);
   
    $s=strpos($data,"%")+1;
   
    $version=substr($data,$s,strpos($data,"%",$s)-1);
    if(substr_count($version,"PDF-1.2")==0)
        return handleV3($data);
    else
        return handleV2($data);

   
}
// handles the verson 1.2
function handleV2($data){
       
    // grab objects and then grab their contents (chunks)
    $a_obj = getDataArray($data,"obj","endobj");
   
    foreach($a_obj as $obj){
       
        $a_filter = getDataArray($obj,"<<",">>");
   
        if (is_array($a_filter)){
            $j++;
            $a_chunks[$j]["filter"] = $a_filter[0];

            $a_data = getDataArray($obj,"stream\r\n","endstream");
            if (is_array($a_data)){
                $a_chunks[$j]["data"] = substr($a_data[0],
strlen("stream\r\n"),
strlen($a_data[0])-strlen("stream\r\n")-strlen("endstream"));
            }
        }
    }

    // decode the chunks
    foreach($a_chunks as $chunk){

        // look at each chunk and decide how to decode it - by looking at the contents of the filter
        $a_filter = split("/",$chunk["filter"]);
       
        if ($chunk["data"]!=""){
            // look at the filter to find out which encoding has been used           
            if (substr($chunk["filter"],"FlateDecode")!==false){
                $data =@ gzuncompress($chunk["data"]);
                if (trim($data)!=""){
                    $result_data .= ps2txt($data);
                } else {
               
                    //$result_data .= "x";
                }
            }
        }
    }
   
    return $result_data;
}

//handles versions >1.2
function handleV3($data){
    // grab objects and then grab their contents (chunks)
    $a_obj = getDataArray($data,"obj","endobj");
    $result_data="";
    foreach($a_obj as $obj){
        //check if it a string
        if(substr_count($obj,"/GS1")>0){
            //the strings are between ( and )
            preg_match_all("|\((.*?)\)|",$obj,$field,PREG_SET_ORDER);
            if(is_array($field))
                foreach($field as $data)
                    $result_data.=$data[1];
        }
    }
    return $result_data;
}

function ps2txt($ps_data){
    $result = "";
    $a_data = getDataArray($ps_data,"[","]");
    if (is_array($a_data)){
        foreach ($a_data as $ps_text){
            $a_text = getDataArray($ps_text,"(",")");
            if (is_array($a_text)){
                foreach ($a_text as $text){
                    $result .= substr($text,1,strlen($text)-2);
                }
            }
        }
    } else {
        // the data may just be in raw format (outside of [] tags)
        $a_text = getDataArray($ps_data,"(",")");
        if (is_array($a_text)){
            foreach ($a_text as $text){
                $result .= substr($text,1,strlen($text)-2);
            }
        }
    }
    return $result;
}

function getFileData($filename){
    $handle = fopen($filename,"rb");
    $data = fread($handle, filesize($filename));
    fclose($handle);
    return $data;
}

function getDataArray($data,$start_word,$end_word){

    $start = 0;
    $end = 0;
    unset($a_result);
   
    while ($start!==false && $end!==false){
        $start = strpos($data,$start_word,$end);
        if ($start!==false){
            $end = strpos($data,$end_word,$start);
            if ($end!==false){
                // data is between start and end
                $a_result[] = substr($data,$start,$end-$start+strlen($end_word));
            }
        }
    }
    return $a_result;
}
?>
oldtimer
Forum Contributor
Posts: 204
Joined: Sun Nov 03, 2002 8:21 pm
Location: Washington State

Post by oldtimer »

HI Thank,

I did try that but i wanted to keep the graphics, columns etc.

But its getting closer.
nickvd
DevNet Resident
Posts: 1027
Joined: Thu Mar 10, 2005 5:27 pm
Location: Southern Ontario
Contact:

Post by nickvd »

You've been given the answer already... pdf2html is precisely what you need.
My ultimate goal was to be able to have PDFs displayed in a browser without using adobe.
... PDF2HTML ... :D
DrTom
Forum Commoner
Posts: 60
Joined: Wed Aug 02, 2006 8:40 am
Location: Las Vegas

Post by DrTom »

The problem you're going ot run itno is that 1 pdf != 1 html file which means you either have to have a web accessible directory that apache can write too, which is not very secure or osme other ridiculous solution. Everyone I could come up with is a gigantic security risk to do this on the fly. That'll be the issue regardless of the solution you use.

The only solution I came up with that might be feasable is using readfile and some preg_replace magic, but because of potential security hole in the code it's been removed.
Post Reply