executing binaries

Posted: Mon Oct 30, 2006 4:50 am
by dibyendrah
Hello everybody,
I have written a function that indexes PDFs using pdftotext. While indexing, pdftotext sometimes keeps running and never exits while trying to convert a PDF to text. Is there a way to check whether the program has been running for more than 3 seconds? If it has, the function should return false and continue indexing the other PDFs.

Thank you all !

With Best Regards,
Dibyendra Hyoju

Posted: Mon Oct 30, 2006 6:01 am
by itsmani1
Did you try set_time_limit()?
void set_time_limit ( int seconds )

Code: Select all

set_time_limit()
I'm not sure whether it will help you or not.
I think it only applies to the page execution time.

Posted: Mon Oct 30, 2006 7:01 am
by aaronhall
There's no way to set a maximum execution time for a function call. I'd research why it's taking so long for pdftotext to parse the pdf. If you can post some details about what your application is trying to do, someone may be able to make some suggestions.

Posted: Tue Oct 31, 2006 1:16 am
by dibyendrah
It's not about the time limit. I've set the time limit to 0. It's about the pdftotext command-line binary. There are some PDFs that it cannot handle properly; I don't know why, but it keeps running and never exits.
I've written the following function to index a PDF.

Code: Select all

function index_pdf($pdf_name, $pdf_url, $real_path = "")
{
	global $path_to_pdftotext;
	global $path_to_output;

	//create the unique text file for output
	$output_file = $path_to_output . md5(microtime()) . ".txt";

	//use the real path when one is given, otherwise the pdf name
	$source = ($real_path == "") ? $pdf_name : $real_path;

	//execute the pdftotext binary and dump the output text in temp text file
	//escapeshellarg() keeps spaces and shell metacharacters in file names from breaking the command
	$cmd = $path_to_pdftotext . " " . escapeshellarg($source) . " " . escapeshellarg($output_file);
	shell_exec($cmd);

	//no output file means the conversion failed, so there is nothing to index
	if (!is_file($output_file)) {
		return false;
	}

	//read the temp text file
	$pdf_text = file_get_contents($output_file);
	//delete the temp file
	//check condition here according to the config file
	@unlink($output_file);

	//escape every value for the query (addslashes() is not safe for SQL)
	$pdf_text = mysql_real_escape_string($pdf_text);
	$parsed_url = parse_url($pdf_url);
	$scheme = mysql_real_escape_string($parsed_url["scheme"]);
	$domain = mysql_real_escape_string($parsed_url["host"]);
	$pdf_url = mysql_real_escape_string($parsed_url["path"]);
	//$_POST replaces the deprecated $HTTP_POST_VARS; the cast keeps the id numeric
	$page_id = (int) $_POST['x_page_id'];

	//insert the full pdf text in database
	//pdf_id is passed as an unquoted NULL, assuming it is an AUTO_INCREMENT column
	//'H' gives 24-hour time ('h' would store 12-hour time)
	$sql_insert = "INSERT INTO `tbl_pdf_fulltext` (`pdf_id`, `page_id`, `url_scheme`, `domain`, `pdf_full_path`, `pdf_full_text`, `pdf_index_datetime`) ";
	$sql_insert .= "VALUES (NULL, '$page_id', '$scheme', '$domain', '$pdf_url', '$pdf_text', '" . date('Y-m-d H:i:s') . "')";
	$insert_success = mysql_query($sql_insert) or die(mysql_error());

	$sql_update = "UPDATE `tbl_pages` SET `indexed` = '1' WHERE `page_id` = " . $page_id;
	$update_success = mysql_query($sql_update);

	return ($insert_success && $update_success) ? true : false;
}
I hope someone can modify this function so that it returns false if the program takes more than 3 seconds to convert the PDF to text. What I'm wondering is how to stop the running program. On localhost we can get the PID of pdftotext and use the kill utility to kill that process. On shared hosting, though, the company may allow certain binaries such as pdftotext and wget but will not allow the kill utility or grep under any circumstances. For now, it would be fine if I could just get this working on my local machine.

Thank you all.

With Best Regards,
Dibyendra
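
For the 3-second limit asked about above, one possible approach is to start the binary with proc_open() instead of shell_exec(), poll it with proc_get_status(), and stop it with proc_terminate() once the deadline passes. The following is only a rough sketch under some assumptions: run_with_timeout() is a hypothetical helper name, the array form of the command needs PHP 7.4 or later, /dev/null assumes a Unix-like host, and the host must not have disabled proc_open(). The signal is sent from PHP itself, so the kill utility is not needed.

```php
<?php
// Hypothetical helper (not part of the original index_pdf()):
// runs a command and gives up after $timeout seconds.
// Returns true only if the command exited with status 0 in time.
function run_with_timeout(array $command, float $timeout = 3.0): bool
{
    // Send stdin/stdout/stderr to /dev/null so a chatty program cannot
    // block on a full pipe (assumes a Unix-like system).
    $descriptors = [
        0 => ['file', '/dev/null', 'r'],
        1 => ['file', '/dev/null', 'w'],
        2 => ['file', '/dev/null', 'w'],
    ];

    // Passing the command as an array (PHP 7.4+) starts the binary
    // directly, without a shell in between.
    $process = proc_open($command, $descriptors, $pipes);
    if (!is_resource($process)) {
        return false;
    }

    $deadline = microtime(true) + $timeout;
    while (true) {
        $status = proc_get_status($process);
        if (!$status['running']) {
            // The exit code is read on the first status call after exit.
            proc_close($process);
            return $status['exitcode'] === 0;
        }
        if (microtime(true) >= $deadline) {
            // Deadline passed: send SIGKILL (signal 9) and reap the process.
            proc_terminate($process, 9);
            proc_close($process);
            return false;
        }
        usleep(50000); // check again in 50 ms
    }
}
```

Inside index_pdf(), the shell_exec($cmd) call could then be swapped for something like run_with_timeout(array($path_to_pdftotext, $source_file, $output_file), 3.0), where $source_file and $output_file stand for the PDF path and the temp text file, and the function could return false whenever the helper does.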

Posted: Tue Oct 31, 2006 1:47 am
by dibyendrah
Please try converting this PDF to text using pdftotext.
Here is the URL: http://www.thdl.org/texts/reprints/nep ... es_155.pdf

I'm using pdftotext from http://www.foolabs.com/xpdf/.
Please post the result after you try it.

Dibyendra

Posted: Tue Oct 31, 2006 3:03 am
by dibyendrah
Dear all,
Although the pdftotext utility took a long time, it did eventually convert the PDF to text. Is there any other command-line utility that does the same thing as pdftotext?

Thank you.
Dibyendra