Hello everybody,
I have written a function to index PDFs using pdftotext. While indexing, pdftotext sometimes keeps running and never exits while trying to convert a PDF to text. Is there a way to check whether the program has been running for more than 3 seconds? If it has, the function should return false and continue indexing the other PDFs.
Thank you all !
With Best Regards,
Dibyendra Hyoju
executing binaries
- dibyendrah
- Forum Contributor
- Posts: 491
- Joined: Wed Oct 19, 2005 5:14 am
- Location: Nepal
- Contact:
- itsmani1
- Forum Regular
- Posts: 791
- Joined: Mon Sep 29, 2003 2:26 am
- Location: Islamabad Pakistan
- Contact:
Did you try set_time_limit()?
void set_time_limit ( int seconds )
I'm not sure whether it will help you or not. I think it's about page execution time.
- dibyendrah
- Forum Contributor
- Posts: 491
- Joined: Wed Oct 19, 2005 5:14 am
- Location: Nepal
- Contact:
It's not about the time limit; I've set the time limit to 0. It's about the pdftotext command-line binary. There are some PDFs that it cannot handle properly, or for some reason I don't understand it keeps on running and never exits.
I hope someone can modify the function below so that it returns false if the program takes more than 3 seconds to convert the PDF to text. But I'm wondering how we can stop the running program. On localhost we can get the PID of pdftotext and use the kill utility to kill that process. On shared hosting, though, the hosting company may allow certain binaries such as pdftotext and wget but will not in any case allow the kill utility or grep. It'll be okay if I can just get this working on my local machine for now.
I've made the following function to index the PDF.
Code: Select all
function index_pdf($pdf_name, $pdf_url, $real_path = "")
{
    global $path_to_pdftotext;
    global $path_to_output;

    // Create a unique temporary text file name for the output.
    $output_file = $path_to_output . md5(microtime()) . ".txt";

    // Convert the real path when one is given, otherwise the PDF name.
    $source = ($real_path == "") ? $pdf_name : $real_path;

    // Run pdftotext and dump the text into the temporary file.
    // Both paths are quoted so spaces and shell metacharacters are safe.
    $cmd = $path_to_pdftotext . " " . escapeshellarg($source) . " " . escapeshellarg($output_file);
    shell_exec($cmd);

    // Read the temporary text file, then delete it.
    // check condition here according to the config file
    $pdf_text = @file_get_contents($output_file);
    @unlink($output_file);
    if ($pdf_text === false) {
        // pdftotext produced no readable output file.
        return false;
    }

    $parsed_url = parse_url($pdf_url);
    $scheme = isset($parsed_url["scheme"]) ? $parsed_url["scheme"] : "";
    $domain = isset($parsed_url["host"]) ? $parsed_url["host"] : "";
    $pdf_path = isset($parsed_url["path"]) ? $parsed_url["path"] : "";

    // $_POST replaces the deprecated $HTTP_POST_VARS; the cast keeps the id numeric.
    $page_id = (int) $_POST['x_page_id'];

    // Escape every string with the connection's own escaping instead of addslashes().
    $scheme = mysql_real_escape_string($scheme);
    $domain = mysql_real_escape_string($domain);
    $pdf_path = mysql_real_escape_string($pdf_path);
    $pdf_text = mysql_real_escape_string($pdf_text);

    // Insert the full PDF text into the database.
    // An unquoted NULL assumes `pdf_id` is an auto-increment column.
    // 'H' gives a 24-hour timestamp ('h' would be an ambiguous 12-hour value).
    $sql_insert = "INSERT INTO `tbl_pdf_fulltext` (`pdf_id`, `page_id`, `url_scheme`, `domain`, `pdf_full_path`, `pdf_full_text`, `pdf_index_datetime`) ";
    $sql_insert .= "VALUES (NULL, '$page_id', '$scheme', '$domain', '$pdf_path', '$pdf_text', '" . date('Y-m-d H:i:s') . "')";
    $insert_success = mysql_query($sql_insert) or die(mysql_error());

    // Mark the page as indexed.
    $sql_update = "UPDATE `tbl_pages` SET `indexed` = '1' WHERE `page_id` = " . $page_id;
    $update_success = mysql_query($sql_update);

    return ($insert_success && $update_success) ? true : false;
}
Thank you all.
With Best Regards,
Dibyendra
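For the 3-second cutoff asked about above, one possible approach is to start the binary with proc_open() instead of shell_exec(), poll it with proc_get_status(), and stop it with proc_terminate() once the deadline passes. This needs no separate kill or grep binary. The sketch below is only an illustration under some assumptions: run_with_timeout() is a hypothetical helper name (it is not part of the function posted above), the host is Unix-like with /dev/null and a POSIX shell, and the hosting provider has not disabled proc_open().

```php
<?php
/**
 * Run a shell command and give up after $timeout seconds.
 * Returns true only if the command exited with status 0 before the deadline.
 * NOTE: run_with_timeout() is a hypothetical helper, not an existing PHP function.
 */
function run_with_timeout($cmd, $timeout = 3.0)
{
    // Send the child's input/output to /dev/null so it can never block
    // on a full pipe (assumes a Unix-like system).
    $spec = array(
        0 => array('file', '/dev/null', 'r'),
        1 => array('file', '/dev/null', 'w'),
        2 => array('file', '/dev/null', 'w'),
    );

    // "exec" makes the shell replace itself with the program, so the
    // process PHP tracks (and later terminates) is the program itself.
    $proc = proc_open('exec ' . $cmd, $spec, $pipes);
    if (!is_resource($proc)) {
        return false;
    }

    $deadline = microtime(true) + $timeout;
    do {
        $status = proc_get_status($proc);
        if (!$status['running']) {
            // The exit code is read on the first status call after exit.
            proc_close($proc);
            return $status['exitcode'] === 0;
        }
        usleep(50000); // poll every 50 ms
    } while (microtime(true) < $deadline);

    // Deadline passed: kill the process (signal 9 is SIGKILL) and reap it.
    proc_terminate($proc, 9);
    proc_close($proc);
    return false;
}
```

In index_pdf(), the shell_exec($cmd) call could then become a check such as if (!run_with_timeout($cmd, 3)) followed by deleting any partial output file and returning false, so the indexing loop moves on to the next PDF.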
- dibyendrah
- Forum Contributor
- Posts: 491
- Joined: Wed Oct 19, 2005 5:14 am
- Location: Nepal
- Contact:
Please try to convert this pdf to text using pdftotext.
Here is the url http://www.thdl.org/texts/reprints/nep ... es_155.pdf
I'm using pdftotext from http://www.foolabs.com/xpdf/.
Please post the result after you try it.
Dibyendra
- dibyendrah
- Forum Contributor
- Posts: 491
- Joined: Wed Oct 19, 2005 5:14 am
- Location: Nepal
- Contact: