gzuncompress, zlib question

wlq
Forum Newbie
Posts: 2
Joined: Sun Oct 25, 2009 9:28 am

gzuncompress, zlib question

Post by wlq »

Hi!
I was trying to use some of the pdf2text classes, for example those
posted on php.net. However, none of them can read any compressed
data. I think the problem might be with gzuncompress. It always
shows:

Code:

Warning: gzuncompress() [function.gzuncompress]: data error
I looked inside the PDF files. The words "Hello world" are
stored in a PDF 1.2 file as:

x?3?3T0 A(??ËU?U¨` ƒQÉ?
N!\úA¦
F
!i\ ?†
?F
? @‘\.
?Ô??|…?ü?? Í ,.× …@®@. €r „

but after I compress "Hello world" in PHP I get:

xϗHꃃW(?/?I ? =
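
For comparison, here is a minimal sketch of what PHP's three zlib-family functions produce for the same input. Both blobs above begin with "x" (0x78), which is the usual first byte of a zlib stream, so this only illustrates the framing differences and is not a diagnosis. The variable names are made up for the example, and the output is printed as hex because raw binary does not survive being pasted into a post.

```php
<?php
// Sketch: the same input encoded with PHP's three zlib-family functions.
$text = "Hello world";

$zlib    = gzcompress($text); // zlib format (RFC 1950): 2-byte header, deflate data, Adler-32 checksum
$deflate = gzdeflate($text);  // raw deflate (RFC 1951): no header, no checksum
$gzip    = gzencode($text);   // gzip format (RFC 1952): 1f 8b magic, deflate data, CRC-32 and length

// Print as hex so the binary data survives copy and paste.
echo "zlib:    ", bin2hex($zlib), "\n";
echo "deflate: ", bin2hex($deflate), "\n";
echo "gzip:    ", bin2hex($gzip), "\n";

// Each decoder expects its own framing.
var_dump(gzuncompress($zlib) === $text);
var_dump(gzinflate($deflate) === $text);
var_dump(gzdecode($gzip) === $text);

// Feeding raw deflate data to gzuncompress() fails, because the zlib header is missing.
var_dump(@gzuncompress($deflate));
```

If a stream decodes with gzinflate() but not with gzuncompress(), it is probably raw deflate data, or the header bytes were lost when the stream was cut out of the file.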

Do you know how to get this problem fixed? I just would like to read
data from PDF using PHP. I tried running external programs from PHP,
but this is not what I need.
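
One way to narrow down a "data error" is to look at the bytes actually handed to gzuncompress() before blaming the function. The sketch below is only an assumption about how one might do that. looksLikeZlib() and inspectStream() are hypothetical helper names, not part of PHP or of the pdf2text classes.

```php
<?php
// Hypothetical helper: does $bytes start with a plausible zlib (RFC 1950) header?
function looksLikeZlib(string $bytes): bool
{
    if (strlen($bytes) < 2) {
        return false;
    }
    $cmf = ord($bytes[0]); // compression method and window size
    $flg = ord($bytes[1]); // flags, including the header check bits
    // Method must be 8 (deflate) and the 16-bit header must be a multiple of 31.
    return ($cmf & 0x0F) === 8 && (($cmf << 8) | $flg) % 31 === 0;
}

// Hypothetical helper: summarise a candidate stream without emitting warnings.
function inspectStream(string $bytes): array
{
    $decoded = @gzuncompress($bytes);
    return array(
        'length'   => strlen($bytes),
        'head_hex' => bin2hex(substr($bytes, 0, 8)),
        'zlib'     => looksLikeZlib($bytes),
        'decoded'  => $decoded, // false when gzuncompress() reports an error
    );
}

print_r(inspectStream(gzcompress("Hello world")));
print_r(inspectStream("\r\n" . gzcompress("Hello world"))); // stray EOL bytes in front
```

If 'zlib' comes back false for a stream marked FlateDecode, the extraction step has most likely cut the stream at the wrong offset, for example by leaving end-of-line bytes in front of the data.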
markusn00b
Forum Contributor
Posts: 298
Joined: Sat Oct 20, 2007 2:16 pm
Location: York, England

Re: gzuncompress, zlib question

Post by markusn00b »

Can you go through the logical steps of your application please? I'm not following. You're trying to use the PDF lib on compressed data... why?
wlq
Forum Newbie
Posts: 2
Joined: Sun Oct 25, 2009 9:28 am

Re: gzuncompress, zlib question

Post by wlq »

OK,
I downloaded the functions for reading PDF files from php.net (one of them is shown below):

Code:

 
<?php
function handleV2($data){

    // detect which end-of-line sequence follows the "stream" keyword (\n, \r or \r\n)
    $tmp = strpos($data, "stream");
    $end_stream_delimiter = substr($data, $tmp+6, 2);

    if($end_stream_delimiter != "\r\n") {
       $end_stream_delimiter = substr($end_stream_delimiter, 0, 1);
    }
    //echo bin2hex($end_stream_delimiter); // - debug information

    // initialise the accumulators so they are never read while undefined
    $j = 0;
    $a_chunks = array();
    $result_data = "";

    // grab objects and then grab their contents (chunks)
    $a_obj = getDataArray($data,"obj","endobj");

    foreach($a_obj as $obj){

        $a_filter = getDataArray($obj,"<<",">>");

        if (is_array($a_filter)){
            $j++;
            $a_chunks[$j]["filter"] = $a_filter[0];

            $a_data = getDataArray($obj,"stream".$end_stream_delimiter,"endstream");
            if (is_array($a_data)){
                // strip the leading "stream<EOL>" and the trailing "endstream"
                $prefix_len = strlen("stream".$end_stream_delimiter);
                $a_chunks[$j]["data"] = substr($a_data[0], $prefix_len,
                    strlen($a_data[0]) - $prefix_len - strlen("endstream"));
            }
        }
    }

    // decode the chunks
    foreach($a_chunks as $chunk){

        // skip objects that have a dictionary but no stream data
        if (isset($chunk["data"]) && $chunk["data"]!=""){
            // look at the filter to find out which encoding has been used;
            // strpos() is needed here: substr() with a string offset never returned false
            if (strpos($chunk["filter"],"FlateDecode")!==false){
                $decoded = @gzuncompress($chunk["data"]);
                // gzuncompress() returns false on a data error
                if ($decoded !== false && trim($decoded)!=""){
                    // CHANGED HERE, before: $result_data .= ps2txt($data);
                    $result_data .= FilterNonText(PS2Text_New($decoded));
                }
            }
        }
    }
    return $result_data;
}
 
function FilterNonText($data) {
  for($i=1;$i<9;$i++) {
      if(strpos($data, chr($i)) !== false) {
         return ""; // not text, something strange
      }
  }
  return $data;
}
?>
 
It wasn't working, so I tried to find out where the problem is. It turned out that the function gzuncompress returns the data error. I read about that function and about the whole library. Take the string "Hello world" as an example. It is represented differently in the PDF and in the output of PHP's gzcompress. Why does this happen? Is it because some headers or footers are not included in the string?
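
One possible explanation for the difference, sketched under assumptions: in a PDF, page text normally sits inside a content stream together with text operators (BT, Tf, Td, Tj, ET), so the compressed input is not the bare string. The operator sequence below is illustrative only, since the actual content stream of the file in question is not known, and the regular expression is a naive extraction that ignores escaped parentheses and TJ arrays.

```php
<?php
// ASSUMPTION: an illustrative content stream; the operators in the real PDF are unknown.
$content = "BT /F1 24 Tf 100 700 Td (Hello world) Tj ET";

$streamBytes = gzcompress($content);       // roughly what a FlateDecode stream might hold
$plainBytes  = gzcompress("Hello world");  // what the bare string compresses to

// Different input gives different compressed bytes.
var_dump($streamBytes === $plainBytes);

// Decompressing gives back the operators, not just the words.
$decoded = gzuncompress($streamBytes);
echo $decoded, "\n";

// Naive extraction of literal strings shown with the Tj operator.
preg_match_all('/\((.*?)\)\s*Tj/', $decoded, $matches);
print_r($matches[1]);
```

So the two blobs differing is expected and does not by itself explain the data error. The error is more likely related to which bytes are passed to gzuncompress().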