executing binaries
Posted: Mon Oct 30, 2006 4:50 am
by dibyendrah
Hello everybody,
I have written a function to index PDFs using pdftotext. While indexing, pdftotext sometimes keeps running and never exits while trying to convert a PDF to text. I would like to know whether we can check if the program has been running for more than 3 seconds. If it has, the function should return false and continue indexing the other PDFs.
Thank you all !
With Best Regards,
Dibyendra Hyoju
Posted: Mon Oct 30, 2006 6:01 am
by itsmani1
Did you try:
void set_time_limit ( int seconds )
Not sure whether it will help you or not.
I think it's about page execution time.
Posted: Mon Oct 30, 2006 7:01 am
by aaronhall
There's no way to set a maximum execution time for a function call. I'd research why it's taking so long for pdftotext to parse the pdf. If you can post some details about what your application is trying to do, someone may be able to make some suggestions.
Posted: Tue Oct 31, 2006 1:16 am
by dibyendrah
It's not about the time limit. I've set the time limit to 0. It's about the pdftotext command-line binary. There are some PDFs that it cannot handle properly; I don't know why, but it keeps on running and never exits.
I've made the following function to index the pdf.
Code:
function index_pdf($pdf_name, $pdf_url, $real_path = "")
{
    global $path_to_pdftotext;
    global $path_to_output;

    // create a unique temporary text file name for the output
    $output_file = $path_to_output . md5(microtime()) . ".txt";

    // run the pdftotext binary on $real_path if given, otherwise on $pdf_name;
    // escapeshellarg() keeps spaces and shell metacharacters in file names harmless
    $source = ($real_path == "") ? $pdf_name : $real_path;
    $cmd = $path_to_pdftotext . " " . escapeshellarg($source) . " " . escapeshellarg($output_file);
    shell_exec($cmd);

    // read the temp text file; give up if pdftotext produced no output
    if (!is_file($output_file)) {
        return false;
    }
    $pdf_text = file_get_contents($output_file);

    // delete the temp file
    // check condition here according to the config file
    @unlink($output_file);
    if ($pdf_text === false) {
        return false;
    }

    // split the url and escape every value that goes into the queries
    $parsed_url = parse_url($pdf_url);
    $scheme   = mysql_real_escape_string($parsed_url["scheme"]);
    $domain   = mysql_real_escape_string($parsed_url["host"]);
    $pdf_url  = mysql_real_escape_string($parsed_url["path"]);
    $pdf_text = mysql_real_escape_string($pdf_text);
    $page_id  = (int) $_POST['x_page_id'];

    // insert the full pdf text in database (NULL lets pdf_id auto-increment,
    // H gives a 24-hour timestamp)
    $sql_insert = "INSERT INTO `tbl_pdf_fulltext` (`pdf_id`, `page_id`, `url_scheme`, `domain`, `pdf_full_path`, `pdf_full_text`, `pdf_index_datetime`) ";
    $sql_insert .= "VALUES (NULL, '$page_id', '$scheme', '$domain', '$pdf_url', '$pdf_text', '" . date('Y-m-d H:i:s') . "')";
    $insert_success = mysql_query($sql_insert) or die(mysql_error());

    // mark the page as indexed
    $sql_update = "UPDATE `tbl_pages` SET `indexed` = '1' WHERE `page_id` = " . $page_id;
    $update_success = mysql_query($sql_update);

    return ($insert_success && $update_success) ? true : false;
}
I hope someone can modify this function so that it returns false if the program takes more than 3 seconds to convert the PDF to text. But I'm wondering how we can stop the running program. On localhost we can get the PID of pdftotext and use the kill utility to kill that process. A hosting company, however, may allow certain binaries such as pdftotext and wget while never allowing the kill utility or grep. It would be okay if I could just do the work on my local machine for now.
Thank you all.
With Best Regards,
Dibyendra
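One possible way to get the 3-second limit without calling the kill utility is to start the binary with proc_open() and poll it from PHP. The following is only a minimal sketch under some assumptions: PHP 5 or later, proc_open() not disabled by the host, and a Unix-like shell for the "exec" prefix. The helper name run_with_timeout() is hypothetical and not part of any library.

```php
<?php
// Hypothetical helper (not from the original post): run a single simple
// command and give up after $timeout seconds. Returns true only if the
// command exited with status 0 before the deadline.
function run_with_timeout($cmd, $timeout = 3.0)
{
    $is_windows = (DIRECTORY_SEPARATOR === '\\');

    // Send the child's input/output to the null device so it can never
    // block on a full pipe that nobody is reading.
    $null = $is_windows ? 'NUL' : '/dev/null';
    $descriptors = array(
        0 => array('file', $null, 'r'),
        1 => array('file', $null, 'w'),
        2 => array('file', $null, 'w'),
    );

    // Assumption: on Unix-like systems proc_open() runs the string through
    // a shell. "exec" makes the shell replace itself with the program, so
    // the signal sent below reaches the program itself and not just the shell.
    if (!$is_windows) {
        $cmd = 'exec ' . $cmd;
    }

    $process = proc_open($cmd, $descriptors, $pipes);
    if (!is_resource($process)) {
        return false;
    }

    $deadline = microtime(true) + $timeout;
    while (true) {
        $status = proc_get_status($process);
        if (!$status['running']) {
            // Read the exit code on the first call that reports the exit.
            $exit_code = $status['exitcode'];
            proc_close($process);
            return $exit_code === 0;
        }
        if (microtime(true) >= $deadline) {
            // Signal 9 is SIGKILL on POSIX systems; Windows ignores the number.
            proc_terminate($process, 9);
            proc_close($process);
            return false;
        }
        usleep(50000); // poll every 50 ms
    }
}
```

In index_pdf(), the shell_exec($cmd) call could then become something like if (!run_with_timeout($cmd, 3)) { return false; }, ideally after deleting any half-written temp file. Because of the "exec" prefix, this sketch only suits a single command, not a pipeline or commands chained with &&.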
Posted: Tue Oct 31, 2006 1:47 am
by dibyendrah
Please try to convert this pdf to text using pdftotext.
Here is the url
http://www.thdl.org/texts/reprints/nep ... es_155.pdf
I'm using pdftotext from
http://www.foolabs.com/xpdf/.
Please post the result after you try this.
Dibyendra
Posted: Tue Oct 31, 2006 3:03 am
by dibyendrah
Dear all,
Although the pdftotext utility took a long time, it did eventually convert the PDF to text. Is there any other command-line utility that does the same thing as pdftotext?
Thank you.
Dibyendra