make relative urls not relative

shiznatix
DevNet Master
Posts: 2745
Joined: Tue Dec 28, 2004 5:57 pm
Location: Tallinn, Estonia
Post by shiznatix »

As part of an RSS feed I am writing, I want to turn all relative URLs into absolute URLs, but I am of course having trouble (otherwise, why would I post? :) )

This is my matching regex (I'm starting with matching before I move on to preg_replace):

Code: Select all

preg_match('#<a href="([^http://www\.domain\.com].*)">#', $text, $matches);
What I want is for any link that does not start with http://www.domain.com to be returned in $matches. What it does now is refuse to match whenever the link contains any of those letters. I want it to behave like "starts with" instead of "contains anywhere". How do I do this?
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: make relative urls not relative

Post by prometheuzz »

Everything between '[' and ']' (also called a character class, or character set) will match just a single character. So, this part of your expression:

[^http://www\.domain\.com]

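will match any (single!) character except 'h', 't', 'p', ':', '/', 'w', '.', 'd', 'o', 'm', 'a', 'i', 'n' and 'c'. In your pattern that class only tests the first character after the opening quote, so a link is rejected whenever it begins with one of those characters, whether or not it starts with http://www.domain.com.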
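To make the difference concrete, here is a minimal sketch that runs a few href values through the character class and through a negative lookahead, which is the usual way to say "does not start with". The sample values and variable names are made up for illustration; they are not from the original post.

```php
<?php
// Hypothetical href values, chosen to show where the two patterns disagree.
$hrefs = ['http://www.domain.com/foo', '/foo2', 'bar2', 'page.html'];

// Negated character class: only checks that the FIRST character is not one of the listed ones.
$classPattern = '#^[^http://www\.domain\.com].*$#';

// Negative lookahead: checks that the string does not START WITH the whole prefix.
$lookaheadPattern = '#^(?!http://www\.domain\.com).*$#';

$classMatches = array_values(array_filter($hrefs, function ($href) use ($classPattern) {
    return preg_match($classPattern, $href) === 1;
}));

$lookaheadMatches = array_values(array_filter($hrefs, function ($href) use ($lookaheadPattern) {
    return preg_match($lookaheadPattern, $href) === 1;
}));

echo 'character class: ', implode(', ', $classMatches), "\n";     // bar2
echo 'lookahead:       ', implode(', ', $lookaheadMatches), "\n"; // /foo2, bar2, page.html
```

The character class misses /foo2 and page.html because they begin with '/' and 'p', both of which appear in the class, even though neither starts with the domain.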

What you're looking for is probably something like this:

Code: Select all

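<?php
// Sample input: two absolute links and two relative ones.
$text = 'text <a href="http://www.domain.com/foo">foo</a>
text <a href="/foo2">foo2</a> more text to ignore
text <a href="http://www.domain.com/bar">bar</a>
text <a href="bar2">bar2</a> and this is the end.';

// Print the input; escape it so the browser shows the markup instead of rendering it.
echo '<pre>';
echo htmlspecialchars($text);
echo '</pre>';

// (?<=<a\shref=")  lookbehind: the match must start right after <a href="
// (?!http://)      lookahead: skip values that begin with http://
// [^"]+            match everything up to the closing quote
if (preg_match_all('@(?<=<a\shref=")(?!http://)[^"]+@', $text, $matches)) {
  echo '<pre>';
  print_r($matches); // $matches[0] holds /foo2 and bar2
  echo '</pre>';
}
?>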

Re: make relative urls not relative

Post by prometheuzz »

... but if you just want to change the paths, why match them first? You could simply rewrite them in one step using preg_replace(...):

Code: Select all

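// https? also leaves https:// links alone; [^/"][^"]* stops at the closing quote and allows one-character paths.
$text = preg_replace('@(?<=<a\shref=")(?!https?://)/?([^/"][^"]*)@', 'http://www.domain.com/$1', $text);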
(untested!)
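For anyone who wants to try the replacement on real input, here is a small sketch built on the same lookbehind/lookahead idea. The helper name absolutizeLinks and its $base parameter are hypothetical, and the pattern is a slightly stricter variant (it also skips https:// and never reads past the closing quote), so treat it as one possible way to do it rather than the definitive one.

```php
<?php
// Hypothetical helper: prefixes relative <a href="..."> values with $base.
function absolutizeLinks(string $html, string $base): string
{
    // (?<=<a\shref=")  start right after <a href="
    // (?!https?://)    leave http:// and https:// links alone
    // /?               swallow one optional leading slash
    // ([^/"][^"]*)     the rest of the path, up to the closing quote
    $pattern = '@(?<=<a\shref=")(?!https?://)/?([^/"][^"]*)@';

    // A callback avoids surprises if $base ever contains "$" or "\".
    return preg_replace_callback($pattern, function (array $m) use ($base) {
        return rtrim($base, '/') . '/' . $m[1];
    }, $html);
}

$text = 'text <a href="http://www.domain.com/foo">foo</a> '
      . 'text <a href="/foo2">foo2</a> '
      . 'text <a href="bar2">bar2</a>';

$result = absolutizeLinks($text, 'http://www.domain.com');
echo $result, "\n";
```

Note the assumption baked in here: a path such as bar2 is resolved against the site root, not against the directory of the page it came from. It also only handles links written exactly as <a href="...">, with href as the first attribute and double quotes.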

Re: make relative urls not relative

Post by shiznatix »

Yes sir, that's what I needed. Super, thanks!

Re: make relative urls not relative

Post by prometheuzz »

Good to hear it, and you're welcome.