executing binaries

dibyendrah
Forum Contributor
Posts: 491
Joined: Wed Oct 19, 2005 5:14 am
Location: Nepal

Post by dibyendrah »

Hello everybody,
I have written a function that indexes PDFs using pdftotext. While indexing, pdftotext sometimes keeps running and never exits while trying to convert a PDF to text. Is there a way to check whether the program has been running for more than 3 seconds? If it has, the function should return false and continue indexing the other PDFs.

Thank you all !

With Best Regards,
Dibyendra Hyoju
itsmani1
Forum Regular
Posts: 791
Joined: Mon Sep 29, 2003 2:26 am
Location: Islamabad Pakistan

Post by itsmani1 »

Did you try set_time_limit()?

Code:

void set_time_limit ( int seconds )

I'm not sure whether it will help you, though. I think it only limits the execution time of the page itself.
aaronhall
DevNet Resident
Posts: 1040
Joined: Tue Aug 13, 2002 5:10 pm
Location: Back in Phoenix, missing the microbrews

Post by aaronhall »

There's no way to set a maximum execution time for a function call. I'd research why it's taking so long for pdftotext to parse the pdf. If you can post some details about what your application is trying to do, someone may be able to make some suggestions.
dibyendrah
Forum Contributor
Posts: 491
Joined: Wed Oct 19, 2005 5:14 am
Location: Nepal

Post by dibyendrah »

It's not about the time limit; I've set the time limit to 0. It's about the pdftotext command-line binary. There are some PDFs that it cannot handle properly, or for some reason I don't understand it keeps running and never exits.
I've written the following function to index the PDF.

Code:

function index_pdf($pdf_name, $pdf_url, $real_path = "")
{
	global $path_to_pdftotext;
	global $path_to_output;

	//$_POST replaces the deprecated $HTTP_POST_VARS; cast the id to an integer before using it in SQL
	$page_id = (int) $_POST['x_page_id'];

	//create the unique text file for output
	$output_file = $path_to_output . md5(microtime()) . ".txt";

	//execute the pdftotext binary and dump the output text in temp text file
	//use $real_path when it is given, otherwise fall back to $pdf_name
	$source = ($real_path == "") ? $pdf_name : $real_path;
	//escapeshellarg() keeps spaces and shell metacharacters in file names from breaking the command
	$cmd = $path_to_pdftotext . " " . escapeshellarg($source) . " " . escapeshellarg($output_file);
	shell_exec($cmd);

	//read the temp text file; file_get_contents() returns false if the file could not be read
	$pdf_text = @file_get_contents($output_file);
	//delete the temp file
	//check condition here according to the config file
	@unlink($output_file);
	if ($pdf_text === false) {
		return false;
	}

	//escape every value that goes into the query (addslashes() is not a safe substitute)
	$parsed_url = parse_url($pdf_url);
	$scheme = mysql_real_escape_string($parsed_url["scheme"]);
	$domain = mysql_real_escape_string($parsed_url["host"]);
	$pdf_url = mysql_real_escape_string($parsed_url["path"]);
	$pdf_text = mysql_real_escape_string($pdf_text);

	//insert the full pdf text in database
	//NULL is unquoted because the quoted string 'NULL' is not a SQL NULL; 'H' gives a 24-hour timestamp
	$sql_insert = "INSERT INTO `tbl_pdf_fulltext` (`pdf_id`, `page_id`, `url_scheme` ,`domain`, `pdf_full_path`, `pdf_full_text`, `pdf_index_datetime`) ";
	$sql_insert .= "VALUES (NULL, '$page_id', '$scheme', '$domain', '$pdf_url', '$pdf_text', '" . date('Y-m-d H:i:s') . "')";
	$insert_success = mysql_query($sql_insert) or die(mysql_error());

	$sql_update = "UPDATE `tbl_pages` SET `indexed` = '1' WHERE `page_id` = " . $page_id;
	$update_success = mysql_query($sql_update);

	//true only if both queries succeeded
	return ($insert_success && $update_success);
}
I hope someone can modify this function so that it returns false if the program takes more than 3 seconds to convert the PDF to text. But I'm wondering how we can stop the running program. On localhost we can get the PID of pdftotext and use the kill utility to kill that process. A hosting company, however, may allow certain binaries such as pdftotext and wget but will not in any case allow the kill utility or grep. It would be okay if I could just get this working on my local machine for now.

Thank you all.

With Best Regards,
Dibyendra
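
One way the 3-second limit asked about above might be approached is to start the program with proc_open() instead of shell_exec(), poll it with proc_get_status(), and stop it with proc_terminate() once a deadline passes. The following is only a sketch under some assumptions: run_with_timeout is a hypothetical helper name (not part of PHP), the host is Unix-like (it relies on /dev/null, the shell's exec builtin, and signal 9 being SIGKILL), the command is a single program invocation, and PHP 5 or later is available. proc_terminate() does not call the kill utility, but a host may still disable the proc_* functions.

```php
<?php
/**
 * Run an external command and give up if it runs longer than $timeout seconds.
 * Returns true if the command exited on its own with status 0, false otherwise.
 * (Hypothetical helper for illustration; not part of PHP.)
 */
function run_with_timeout($cmd, $timeout = 3.0)
{
    // Send the child's input and output to /dev/null so it can never
    // block on a full pipe. Assumption: a Unix-like host.
    $spec = array(
        0 => array('file', '/dev/null', 'r'),
        1 => array('file', '/dev/null', 'w'),
        2 => array('file', '/dev/null', 'w'),
    );

    // "exec" makes the shell replace itself with the program, so that
    // proc_terminate() signals the program itself, not a wrapper shell.
    $process = proc_open('exec ' . $cmd, $spec, $pipes);
    if (!is_resource($process)) {
        return false;
    }

    $deadline = microtime(true) + $timeout;
    while (true) {
        $status = proc_get_status($process);
        if (!$status['running']) {
            // The program finished by itself; report its exit status.
            proc_close($process);
            return $status['exitcode'] === 0;
        }
        if (microtime(true) >= $deadline) {
            // Still running after the deadline: kill it and report failure.
            proc_terminate($process, 9); // 9 = SIGKILL on Linux
            proc_close($process);
            return false;
        }
        usleep(50000); // wait 50 ms before polling again
    }
}
```

In index_pdf(), the call shell_exec($cmd); could then become a check such as if (!run_with_timeout($cmd, 3)) { return false; }, after removing any partial output file. Polling keeps the PHP script in control of the clock, which matters because on non-Windows systems set_time_limit() does not count time spent waiting on external programs.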
dibyendrah
Forum Contributor
Posts: 491
Joined: Wed Oct 19, 2005 5:14 am
Location: Nepal

Post by dibyendrah »

Please try to convert this PDF to text using pdftotext.
Here is the URL: http://www.thdl.org/texts/reprints/nep ... es_155.pdf

I'm using pdftotext from http://www.foolabs.com/xpdf/.
Please post the result after you try this.

Dibyendra
dibyendrah
Forum Contributor
Posts: 491
Joined: Wed Oct 19, 2005 5:14 am
Location: Nepal

Post by dibyendrah »

Dear all,
Although the pdftotext utility took a long time, it did eventually convert the PDF to text. Is there any other command-line utility that does the same thing as pdftotext?

Thank you.
Dibyendra