regex to find pdfs in html code

dibyendrah
Forum Contributor
Posts: 491
Joined: Wed Oct 19, 2005 5:14 am
Location: Nepal

Post by dibyendrah

This function takes HTML code as input and returns the matched PDF links as an array.

Code:

// array|false scan_links(string $HTML)
// Returns the href values of all <a> tags whose link ends in ".pdf",
// or false (after printing a JavaScript alert) when none are found.
function scan_links($HTML)
{
    // Group 1 captures the double-quoted href value; the /i modifier
    // makes ".PDF" match too.
    preg_match_all('/<a\s+[^>]*href="([^"]+\.pdf)"[^>]*>/is', $HTML, $match);

    // $match[0] holds the whole <a ...> tags, $match[1] just the links.
    if (!empty($match[1])) {
        return $match[1];
    }

    $alert = 'Page contains no valid pdf paths!';
    print "<SCRIPT> alert('$alert');</SCRIPT>";
    return false;
}
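Regular expressions are brittle on real-world HTML: the pattern above only matches double-quoted href values that end exactly in ".pdf", so single-quoted attributes or links like "report.pdf?v=2" are missed. As an alternative, here is a minimal sketch that swaps the regex for PHP's DOM extension (DOMDocument) and checks the extension of each link's path. The function name find_pdf_links() and the sample HTML are hypothetical, chosen only for this example; it returns an empty array instead of printing an alert, leaving that decision to the caller.

```php
<?php
// Sketch only: a DOM-based alternative to scan_links(), not the original code.
// find_pdf_links() is a hypothetical name used for illustration.
function find_pdf_links(string $html): array
{
    // DOMDocument::loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Collect libxml warnings internally; malformed markup is common on real pages.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        // Look only at the path part, so "file.pdf?download=1" still counts.
        $path = (string) parse_url($href, PHP_URL_PATH);
        if (strtolower(pathinfo($path, PATHINFO_EXTENSION)) === 'pdf') {
            $links[] = $href;
        }
    }
    return $links;
}

// Hypothetical sample input: one double-quoted, one single-quoted link, one non-PDF.
$html = '<p><a href="docs/guide.pdf" target="_blank">Guide</a>'
      . "<a class='x' href='report.PDF?v=2'>Report</a>"
      . '<a href="index.html">Home</a></p>';

print_r(find_pdf_links($html));
```

The trade-off is that the DOM extension must be enabled (it usually is), in exchange for parsing that does not depend on attribute order or quote style.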

Cheers,
Dibyendra