Page 1 of 1

Reading through a PDF file

Posted: Fri Feb 03, 2006 11:19 am
by meemerz00
I am trying to implement this code from one of the PHP classes, and I can't get it to work. I need it to search through a PDF file for a specific string and return true if it finds it. Does anyone have any ideas?

There are two files - pdfsearch.php, and pdf_example.php.

pdfsearch.php

Code: Select all

<?php
class pdf_search { 
      
	    // Just one private variable. It holds the document.
        var $_buffer;

        // Constructor. Takes the pdf document as only parameter
        function pdf_search($buffer) {
                $this->_buffer = $buffer;
        }

        // This function returns the next line from the document. If a stream follows, it is deflated into readable text.
        function nextline() {
				$pos = strpos($this->_buffer, "\r");
                if ($pos === false) {
                        return false;
                }
                $line = substr($this->_buffer, 0, $pos);
                $this->_buffer = substr($this->_buffer, $pos + 1);
				
                if ($line == "stream") {
                        $endpos = strpos($this->_buffer, "endstream");
                        $stream = substr($this->_buffer, 1, $endpos - 1);
                        $stream = @gzuncompress($stream);
                        $this->_buffer = $stream . substr($this->_buffer, $endpos + 9);
                }
                return $line;
       }

        // This function returns the next line in the document that is printable text. We need it so we can search in just that portion.
        function textline() {
                $line = $this->nextline();
                if ($line === false) {
                        return false;
                }
                if (preg_match("/[^\\\\]\\((.+)[^\\\\]\\)/", $line, $match)) {
                        $line = preg_replace("/\\\\(\d+)/e", "chr(0\\1);", $match[1]);
                        return stripslashes($line);
                }
                return $this->textline();
        }

        // This function returns true or false, indicating whether the document contains the text that is passed in $str.
        function textfound($str) {
                while (($line = $this->textline()) !== false) {
                        if (preg_match("/$str/i", $line) != 0) {
                                return true;
                        }
                }
                return false;
        }
}
?>
pdf_example.php

Code: Select all

<?php
require("pdfsearch.php");

// The following determines the document to search in.
$theDocument = "file.pdf";

// The text to search for. Usually we get this as a result of a form submit.
$searchText = "test";

// First we read the document into memory space. Also, pdf documents can be read from a database or otherwise.
$fp = fopen($theDocument, "r");
$content = fread($fp, filesize($theDocument));
fclose($fp);

// Allocate class instance
$pdf = new pdf_search($content);

// And do the search
if ($pdf->textfound($searchText)) {
    echo "We found $searchText.";
}
else {
    echo "$searchText was not found.";
}
?>
Thanks!!

Posted: Thu Nov 29, 2007 10:27 am
by kendall
did you ever get this to work?