make relative urls not relative

Posted: Fri Jan 23, 2009 3:17 am
by shiznatix
As part of an RSS feed I am writing, I want to turn all relative URLs into absolute URLs, but I am of course having trouble (otherwise, why would I post :) )

This is my matching regex (just starting with matching before I move on to the preg_replace):

preg_match('#<a href="([^http://www\.domain\.com].*)">#', $text, $matches);
What I want is for any link that does not start with http://www.domain.com to be returned in $matches. What it does now is refuse to return a link if it contains any of those letters. I want it to behave like "starts with" instead of "contains anywhere". How do I do this?

Re: make relative urls not relative

Posted: Fri Jan 23, 2009 4:01 am
by prometheuzz
Everything between '[' and ']' (also called a character class, or character set) will match just a single character. So, this part of your expression:

[^http://www\.domain\.com]

will match any (single!) character except: 'h', 't', 'p', ':', '/', 'w', '.', 'd', 'o', 'm', 'a', 'i', 'n', and a 'c'.
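
To make the difference concrete, here is a small sketch comparing the negated character class with a negative lookahead. The variable names and the test strings are only illustrative, and both patterns are anchored with ^ so that only the start of the string is checked.

```php
<?php
// A negated character class consumes exactly ONE character that is not in the set.
$class = '#^[^http://www\.domain\.com]#';

// '/foo' starts with '/', which is in the set, so it is rejected
// even though it is a relative URL.
var_dump(preg_match($class, '/foo'));  // int(0)
// 'xyz' starts with 'x', which is not in the set, so it matches.
var_dump(preg_match($class, 'xyz'));   // int(1)

// A negative lookahead checks the whole prefix and consumes nothing.
$lookahead = '#^(?!http://www\.domain\.com)#';

var_dump(preg_match($lookahead, '/foo'));                      // int(1)
var_dump(preg_match($lookahead, 'http://www.domain.com/foo')); // int(0)
```

So the character class only ever looks at one character, while the lookahead gives the "does not start with" behaviour.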

What you're looking for is probably something like this:

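<?php
// Sample input: two absolute links and two relative ones.
$text = 'text <a href="http://www.domain.com/foo">foo</a>
text <a href="/foo2">foo2</a> more text to ignore
text <a href="http://www.domain.com/bar">bar</a>
text <a href="bar2">bar2</a> and this is the end.';

// Escape the markup so the browser shows the source instead of rendering the links.
echo '<pre>';
echo htmlspecialchars($text);
echo '</pre>';

// (?<=<a\shref=")  lookbehind: the match must directly follow <a href="
// (?!http://)      negative lookahead: skip URLs that start with http://
// [^"]+            the URL itself, up to the closing quote
if (preg_match_all('@(?<=<a\shref=")(?!http://)[^"]+@', $text, $matches)) {
  echo '<pre>';
  print_r($matches);
  echo '</pre>';
}
?>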
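
For reference, a sketch of what ends up in $matches for that sample text. The pattern has no capture groups, so the URLs are the full matches in $matches[0]. The function name relative_hrefs is just a hypothetical wrapper for illustration.

```php
<?php
// Hypothetical helper: return every href value that does not start with http://
function relative_hrefs(string $html): array
{
    // Full matches land in $matches[0] because the pattern has no capture groups.
    preg_match_all('@(?<=<a\shref=")(?!http://)[^"]+@', $html, $matches);

    return $matches[0];
}

$text = 'text <a href="http://www.domain.com/foo">foo</a>
text <a href="/foo2">foo2</a> more text to ignore
text <a href="http://www.domain.com/bar">bar</a>
text <a href="bar2">bar2</a> and this is the end.';

echo implode(', ', relative_hrefs($text)), "\n"; // /foo2, bar2
```

Note that the lookbehind only recognises links written exactly as <a href="...">, with a single whitespace character after the a and double quotes around the URL.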

Re: make relative urls not relative

Posted: Fri Jan 23, 2009 4:07 am
by prometheuzz
... but if you just want to change the paths, why first match them? You could simply change them at once using preg_replace(...):

$text = preg_replace('@(?<=<a\shref=")(?!https?://)/?([^/"][^"]*)@', 'http://www.domain.com/$1', $text);
(untested!)
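
One caveat with checking only for a fixed http:// prefix: links such as mailto:, #anchor or //other.host/path would also get the domain glued in front. Below is a hedged sketch that skips anything starting with a scheme, // or #. The function name absolutize_hrefs and its $base parameter are assumptions for illustration, and like the replacement above it resolves every relative path against the site root, which may not be right for links like bar2 on pages deeper in the site.

```php
<?php
// Hypothetical helper: prefix relative hrefs with $base, leave everything else alone.
function absolutize_hrefs(string $html, string $base): string
{
    // Skip URLs that start with a scheme (http:, https:, mailto:, ...),
    // with // (protocol-relative), or with # (same-page anchor).
    $pattern = '@(?<=<a\shref=")(?![a-z][a-z0-9+.-]*:|//|#)[^"]+@i';

    return preg_replace_callback(
        $pattern,
        function (array $m) use ($base): string {
            // Trim slashes on both sides so exactly one '/' joins base and path.
            return rtrim($base, '/') . '/' . ltrim($m[0], '/');
        },
        $html
    );
}

$html = '<a href="/foo2">a</a> <a href="bar2">b</a> '
      . '<a href="https://example.org/x">c</a> <a href="#top">d</a>';

echo absolutize_hrefs($html, 'http://www.domain.com'), "\n";
```

Only the first two links should change here; the https:// link and the #top anchor are left as they are.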

Re: make relative urls not relative

Posted: Fri Jan 23, 2009 4:16 am
by shiznatix
Yes sir, that's what I needed. Super thanks!

Re: make relative urls not relative

Posted: Fri Jan 23, 2009 4:18 am
by prometheuzz
Good to hear it, and you're welcome.