Reg Expressions to find specific URLs

mck.workman
Forum Newbie
Posts: 5
Joined: Tue Jan 03, 2012 4:57 pm

Post by mck.workman

Hey guys!

I am having trouble with my regular expressions and am not sure why. I am trying to get the most recent discussion (not one in the "featured" section) on this forum page (http://www.grasshopper3d.com/forum/cate ... orCategory) and then upload all of its .gh attachments to a MySQL database.

1) I used $discussions_start to select everything after the "Discussions" table heading (the quoted markup in the code below), so that the discussions in the "featured" section of the page are ruled out.
2) I selected URLs that start with http://www.grasshopper3d.com/forum/topics followed by any letters, numbers or dashes (e.g. http://www.grasshopper3d.com/forum/topi ... -questions).
3) Then I find all attachment URLs that start with http://www.grasshopper3d.com/forum/atta ... dedFile%3A followed by any letters, numbers or dashes (e.g. http://www.grasshopper3d.com/forum/atta ... e%3A507998).
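Step 1 does not strictly need a regular expression: a case-insensitive string search for the heading and a substring after it does the same job. A minimal sketch, assuming the heading markup is exactly `<th class="xg_lightborder">Discussions</th>` (as in the code in this thread) apart from letter case; the helper name `after_discussions_heading` and the sample HTML are hypothetical:

```php
<?php
// Hypothetical helper: return the part of $html that follows the
// "Discussions" table heading, or null if the heading is not present.
// stripos() makes the search case-insensitive, like the /i modifier.
function after_discussions_heading(string $html): ?string
{
    $heading = '<th class="xg_lightborder">Discussions</th>';
    $pos = stripos($html, $heading);
    if ($pos === false) {
        return null;
    }
    return substr($html, $pos + strlen($heading));
}

// Made-up fragment standing in for the real listing page:
// one "featured" link before the heading, one discussion after it.
$sample = '<a href="http://www.grasshopper3d.com/forum/topics/featured-one">F</a>'
        . '<TH CLASS="XG_LIGHTBORDER">DISCUSSIONS</TH>'
        . '<a href="http://www.grasshopper3d.com/forum/topics/newest-topic">N</a>';

// Prints only the link that follows the heading.
echo after_discussions_heading($sample), "\n";
```

Everything before the heading, including the featured link, is dropped, so later patterns only ever see the regular discussion list.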

I know the problem is in my regular expressions, because I tested them with this tool (http://regex.larsolavtorvik.com) and I am not getting what I expect, but I don't understand why.
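Two things commonly trip up code like this. First, a pattern must be passed to `preg_match()` as a quoted string with delimiters, and the function returns `1`, `0` or `false`, not the matched text; the match is written into its optional third argument. Second, in a pattern such as `(\w+|\d*|\-*)+`, the `\d*` and `\-*` branches can match an empty string, and in PCRE an empty iteration ends the repetition, so the match can stop at the first dash. Since `\w` already covers letters, digits and the underscore, the class `[\w-]+` says "one or more word characters or dashes" directly. A small sketch against a made-up string:

```php
<?php
// Made-up input containing one topic URL.
$text = 'see http://www.grasshopper3d.com/forum/topics/some-example-file-123 for details';

// '#' as the delimiter avoids escaping every '/' in the URL.
// [\w-]+ = one or more letters, digits, underscores or dashes.
$result = preg_match('#http://www\.grasshopper3d\.com/forum/topics/[\w-]+#i', $text, $matches);

var_dump($result);      // int(1): the return value is a flag, not the URL
echo $matches[0], "\n"; // the full matched URL lives in $matches[0]
```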

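<?php
if (isset($_POST['Display'])) {
    $url = "http://www.grasshopper3d.com/forum/categories/sample-and-example-files/listForCategory";
    $url_result_string = file_get_contents($url);
    // Patterns must be quoted strings with delimiters; '#' avoids escaping every '/'.
    // preg_match() returns 1/0, so the matched text is read from the third argument.
    // 1) Everything after the "Discussions" heading (/s lets . cross newlines)
    if (preg_match('#<th class="xg_lightborder">Discussions</th>(.*)#is', $url_result_string, $m)) {
        $discussions_start = $m[1];

        // 2) First topic URL: letters, digits, underscores or dashes after /forum/topics/
        if (preg_match('#http://www\.grasshopper3d\.com/forum/topics/[\w-]+#i', $discussions_start, $m)) {
            $discussions_url = $m[0];
            $discussions_url_string = file_get_contents($discussions_url);

            // 3) Every attachment download URL on that topic page
            preg_match_all('#http://www\.grasshopper3d\.com/forum/attachment/download\?id=2985220%3AUploadedFile%3A\d+#i', $discussions_url_string, $m);
            $gh_file_urls = $m[0];
            //then load $gh_file_urls into MySQL
        }
    }
}
?>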
Thank you! Thank you!
McK
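For step 3, `preg_match_all()` returns the number of matches and, with its default `PREG_PATTERN_ORDER` ordering, fills `$matches[0]` with every full match and `$matches[1]` with every value of the first capture group. A minimal sketch on a made-up fragment; the second file number is invented, and the `id=...%3AUploadedFile%3A...` format is only assumed from the example URL in the question:

```php
<?php
// Made-up topic page fragment with two attachment links.
$page = '<a href="http://www.grasshopper3d.com/forum/attachment/download?id=2985220%3AUploadedFile%3A507998">a.gh</a>'
      . '<a href="http://www.grasshopper3d.com/forum/attachment/download?id=2985220%3AUploadedFile%3A508001">b.gh</a>';

// \? matches the literal '?', \d+ the numeric parts; (\d+) captures the file number.
$count = preg_match_all(
    '#http://www\.grasshopper3d\.com/forum/attachment/download\?id=\d+%3AUploadedFile%3A(\d+)#i',
    $page,
    $matches
);

echo $count, "\n";    // 2
print_r($matches[1]); // the two captured file numbers
```

Capturing the trailing number separately is optional, but it gives a convenient key if the files are later stored in a database.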
twinedev
Forum Regular
Posts: 984
Joined: Tue Sep 28, 2010 11:41 am
Location: Columbus, Ohio

Re: Reg Expressions to find specific URLs

Post by twinedev

You can use the following. I may be grabbing more information than you wanted (thread ID, file ID, author, etc.), but I just picked the key data I would collect.

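    // getCode() is not shown in the post; a minimal stand-in based on
    // file_get_contents() is assumed here so the snippet runs on its own.
    function getCode($url) {
        return file_get_contents($url);
    }

    $strCode = getCode('http://www.grasshopper3d.com/forum/categories/sample-and-example-files/listForCategory');  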
    
    if (preg_match('%<th class="xg_lightborder">Discussions</th>.*?<a class="fn url" href="http://www\.grasshopper3d\.com/profile/([^"]+)".*?<h3><a href="([^"]+)"[^>]*?>([^<]+).*?<p class="small">Started by ([^<]+)</p>%si', $strCode,$regs)) {
        
        $strSubPage = $regs[2];
        $strAuthor = $regs[4];
        $strTopic = $regs[3];
        $strUserName = $regs[1];

        $strCode = getCode($strSubPage);  
        
        if (preg_match_all('#<a href="(http://www\.grasshopper3d\.com/forum/attachment/download\?id=([0-9]+)%3AUploadedFile%3A([0-9]+))">([^<]+\.gh)</a>#si',$strCode,$regs)) {
        
            foreach ($regs[0] as $key=>$notused) {
                $strURL = $regs[1][$key];
                $strFileName = $regs[4][$key];
                $intThreadID = $regs[2][$key];
                $intFileID = $regs[3][$key];
                
                // DO WHAT YOU NEED WITH THE FILE...
                
                var_dump($strSubPage,$strAuthor,$strTopic,$strUserName,$strURL,$strFileName,$intThreadID,$intFileID);
            }
        }
    }
    echo "\n\n============D=O=N=E==============\n";
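To see which capture group holds what in the first pattern of the reply, it can be run against a small fragment. The markup below is invented and only assumes the structure the pattern expects (an author profile link, then an `<h3>` topic link, then a "Started by" line); the live page may differ or change. The `s` modifier lets `.` cross line breaks, and each lazy `.*?` stops at the first match after the heading, i.e. the first discussion listed:

```php
<?php
// The first pattern from the reply, unchanged.
$pattern = '%<th class="xg_lightborder">Discussions</th>.*?<a class="fn url" href="http://www\.grasshopper3d\.com/profile/([^"]+)".*?<h3><a href="([^"]+)"[^>]*?>([^<]+).*?<p class="small">Started by ([^<]+)</p>%si';

// Invented fragment shaped the way the pattern expects.
$html = '<table><tr><th class="xg_lightborder">Discussions</th></tr>' . "\n"
      . '<tr><td><a class="fn url" href="http://www.grasshopper3d.com/profile/SomeUser">Some User</a>' . "\n"
      . '<h3><a href="http://www.grasshopper3d.com/forum/topics/newest-topic">Newest topic</a></h3>' . "\n"
      . '<p class="small">Started by Some User</p></td></tr></table>';

if (preg_match($pattern, $html, $regs)) {
    // [1] profile slug, [2] topic URL, [3] topic title, [4] "Started by" name
    printf("user=%s\nurl=%s\ntopic=%s\nauthor=%s\n", $regs[1], $regs[2], $regs[3], $regs[4]);
}
```

Because the whole chain lives in one pattern, a missing author link or a changed class name makes the entire match fail, so it is worth checking the `preg_match()` result before using `$regs`, as the reply does.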