
[SOLVED] Fetching content and replacing links

Posted: Tue Feb 22, 2005 9:59 pm
by Nailhead
I am fetching content from a website so I can parse it and display its contents. When I display the contents, all of the images and links are broken.

Here's a bit of code that reads the contents of a website into $buffer, runs a search and replace on them with ParseContents(), and collects the result in $content:

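Code:

function GrabSite($url) {
    // Open the remote page for reading.
    $fd = fopen($url, "r");
    if (!$fd) {
        return; // could not open the URL
    }
    // Read the whole page first so no tag is split across two 4096-byte chunks.
    $buffer = '';
    while (!feof($fd)) {
        $buffer .= fread($fd, 4096);
    }
    fclose($fd);
    // Fix the links once, on the complete page.
    $content = ParseContents($buffer);
    echo $content;
}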
Because the page is now served from my domain, the relative src="..." and href="..." URLs resolve against my domain instead of the original site, so they all break.

So I need a regular expression that finds these attributes and replaces their values with correct links. I need to keep in mind that the values will sometimes be enclosed in ' or " or nothing at all.

I've been experimenting for a week with no luck.

Posted: Tue Feb 22, 2005 10:14 pm
by shiznatix
Don't know exactly how to do it, but it sounds like you should preg_replace the http://yoursite.com with http://theirsite.com

Posted: Tue Feb 22, 2005 10:14 pm
by feyd
untested

Code:

preg_match_all('#<[a-z]+(\s+[a-z]+\s*=\s*(["\'])?(.*?)\\2)*\s+(src|href)\s*=\s*(["\'])?(.*?)\\5.*?>#is', $text, $matches);

var_export($matches);

I believe I've posted something similar before... somewhere.. :?

Posted: Tue Feb 22, 2005 11:13 pm
by Nailhead
I may be doing this wrong but to view the array I'm using this:

Code:

preg_match_all('#<[a-z]+(\s+[a-z]+\s*=\s*(["\'])?(.*?)\\2)*\s+(src|href)\s*=\s*(["\'])?(.*?)\\5.*?>#is', $buffer, $matches);

foreach ($matches[0] as $link) {
    echo $link;
}
This results in displaying the original content with no changes. Do I need to do a preg_replace() next? Where would I write that?
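
A note on why the output looks unchanged: preg_match_all() only collects matches into $matches and never modifies the string it searches, and $matches[0] holds the full matched text, so echoing it prints the tags exactly as they were. The sketch below shows how the captured pieces could be inspected instead. The sample HTML and the simplified pattern (double-quoted values only) are assumptions made for illustration, not the pattern posted above.

```php
<?php
// Hypothetical sample input; not taken from the real page.
$buffer = '<a href="page.html">x</a><img src="logo.png">';

// Simplified illustrative pattern: attribute name in group 1,
// double-quoted value in group 2.
preg_match_all('~(src|href)\s*=\s*"([^"]*)"~i', $buffer, $matches, PREG_SET_ORDER);

$found = array();
foreach ($matches as $m) {
    // $m[0] is the whole match, $m[1] the attribute name, $m[2] its value.
    $found[] = $m[1] . ' => ' . $m[2];
}

// $buffer is still the original string at this point; nothing was replaced.
```

With PREG_SET_ORDER each element of $matches is one complete match, which tends to be easier to loop over than the default grouping by capture group.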

Posted: Tue Feb 22, 2005 11:34 pm
by feyd
The example was just to show the regex that finds the information. You need a replace call, yes. You probably don't need preg_match_all() at all; a properly written replacement pattern should take care of it.
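
A minimal sketch of what such a replace call could look like, using preg_replace_callback(). It is not the pattern posted earlier in the thread and has not been tested against real pages. Assumptions: ParseContents() takes the source page's URL as a second parameter $base (the original takes only one argument), makeAbsolute() is a hypothetical helper, "../" segments are not normalised, and the pattern will also touch src=/href= text that happens to appear outside a tag.

```php
<?php
// Hypothetical helper: turn $url into an absolute URL relative to $base.
// Simplified: "../" segments are left as they are.
function makeAbsolute($url, $base)
{
    // Leave absolute URLs (http:, mailto:, ...), protocol-relative URLs
    // and pure fragments untouched.
    if (preg_match('~^(?:[a-z][a-z0-9+.-]*:|//|#)~i', $url)) {
        return $url;
    }
    $parts = parse_url($base);
    $root  = $parts['scheme'] . '://' . $parts['host'];
    if ($url !== '' && $url[0] === '/') {
        return $root . $url; // root-relative link
    }
    // Path-relative link: append to the directory part of the base URL.
    $path = isset($parts['path']) ? $parts['path'] : '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $root . $dir . $url;
}

// Assumed signature: $base is the URL the page was fetched from.
function ParseContents($html, $base)
{
    // Group 1: attribute name plus "=". Group 2 (branch reset): the value,
    // whether it was written with ", with ' or with no quotes at all.
    $pattern = '~(?<=\s)((?:src|href)\s*=\s*)(?|"([^"]*)"|\'([^\']*)\'|([^\s>"\']+))~i';

    return preg_replace_callback($pattern, function ($m) use ($base) {
        $abs = makeAbsolute($m[2], $base);
        // Always emit double quotes; escape any literal " in the value.
        return $m[1] . '"' . str_replace('"', '&quot;', $abs) . '"';
    }, $html);
}
```

Called as ParseContents($buffer, 'http://theirsite.com/news/index.html'), this would rewrite href="page.html" to href="http://theirsite.com/news/page.html" and leave already-absolute links alone. The anonymous function needs PHP 5.3 or later; on older versions a named callback function would be passed instead.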

Posted: Tue Feb 22, 2005 11:42 pm
by Nailhead
Thanks a lot for the help! This has me going in the right direction.