Regex to find PDFs in HTML code
Posted: Tue Apr 25, 2006 2:52 am
This function takes the HTML code as input and returns the matched PDF links as an array, or false if the page contains none.
Cheers,
Dibyendra
Code:
// array|false scan_links(string $HTML)
// Returns the preg_match_all() result: $match[0] holds the full <a ...> tags,
// $match[1] holds the captured .pdf URLs. Returns false when nothing matches.
function scan_links($HTML)
{
    // Match <a> tags whose double-quoted href ends in .pdf (case-insensitive).
    preg_match_all('/<a\s+[^>]*href="([^"]+\.pdf)"[^>]*>/is', $HTML, $match);

    // With no matches, preg_match_all() still fills $match with empty
    // sub-arrays, so test the captured URLs rather than count($match).
    if (!empty($match[1])) {
        return $match;
    }

    $alert = 'Page contains no valid pdf paths!';
    print "<SCRIPT> alert('$alert');</SCRIPT>";
    return false;
}
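If you only need the URLs themselves, here is a minimal sketch of a variant. Note that extract_pdf_urls() is a hypothetical helper name, not part of the function above; it assumes you also want to accept single-quoted href values and drop duplicate links, and it returns an empty array instead of printing an alert.

```php
<?php
// Hypothetical helper (not from the original post): returns only the
// captured URLs rather than the full preg_match_all() result.
function extract_pdf_urls(string $html): array
{
    // Same idea as the pattern above, but the href value may be wrapped in
    // single or double quotes; \1 requires the closing quote to match.
    $pattern = '/<a\s+[^>]*href=(["\'])([^"\']+\.pdf)\1[^>]*>/is';

    // preg_match_all() returns 0 for no matches and false on error.
    if (!preg_match_all($pattern, $html, $match)) {
        return [];
    }

    // $match[2] is the second capture group, i.e. the URLs.
    return array_values(array_unique($match[2]));
}

// Sample input, made up for illustration.
$html = '<p><a href="docs/a.pdf" target="_blank">A</a> '
      . "<a class='x' href='b.PDF'>B</a> "
      . '<a href="page.html">no</a></p>';

print_r(extract_pdf_urls($html));
```

Like the original, this only catches links whose href ends in .pdf, so a URL such as report.pdf?id=3 would be skipped. For messy real-world pages a proper HTML parser is probably more reliable than a regex.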