REGEX and URL help please

Posted: Sun Jul 30, 2006 12:58 pm
by Sivarn
One of the sites I'm building would greatly benefit from a specific RSS newsfeed. Within this newsfeed, there are links to specific sections of the source site's main page.

href="http://www.RSSsourceSite.com/#newsItem"

For some reason, these links produce a page error when followed from the site I'm developing. Rather than trying to figure out why they break, I would like to just trim off the "#newsItem" part. That way, users may not get directly to a specific area of a specific page, but at least they get to the right page.

So:

http://www.RSSsourceSite.com/#newsItem

The "http://www.RSSsourceSite.com/" part will always be the same. The part after the "#" could be any combination of letters, numbers, or URL-friendly characters. With the correct regex, I can preg_replace and:

http://www.RSSsourceSite.com/#newsItem

- becomes -

http://www.RSSsourceSite.com/

I suck at REGEX, and it gives me a headache. Can anyone help?

Thanks in advance

Siv

Posted: Sun Jul 30, 2006 1:30 pm
by RobertGonzalez

$stripped_string = preg_replace('%<a\s+href\s*=\s*"http://www.RSSsourceSite.com/(#(*A-Za-z0-9]\s*%', '<a href=http://www.RSSsourceSite.com/', $string_to_search);
I honestly have no idea if this works, but it is worth a shot.

Posted: Sun Jul 30, 2006 1:55 pm
by Sivarn
Thanks, I'll give it a try

Posted: Sun Jul 30, 2006 1:59 pm
by Sivarn
preg_replace(): Compilation failed: nothing to repeat at offset 49 in ... [file] ... on line 11
and line 11 is

$rss = preg_replace('%<a\s+href\s*=\s*"http://www.RSSsourceSite.com/(#(*A-Za-z0-9]\s*%', '<a href=http://www.RSSsourceSite.com/', $rss);
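
A note on the error: assuming the reported offset is zero-based and counted from the start of the pattern after the opening % delimiter, offset 49 lands on the * that directly follows the second (. A quantifier in that position has nothing in front of it to repeat, which would explain the message. A quick check (the variable names are only for illustration):

```php
<?php
// Pattern body from the post above, without the % delimiters.
$body = '<a\s+href\s*=\s*"http://www.RSSsourceSite.com/(#(*A-Za-z0-9]\s*';

// Zero-based offsets 48 and 49 of the pattern body.
$before = substr($body, 48, 1); // the second "("
$at     = substr($body, 49, 1); // the "*" with nothing to repeat

echo "offset 48: $before, offset 49: $at\n";
```

The "(*A-Za-z0-9]" part looks like it was meant to be a character class followed by a quantifier, i.e. "[A-Za-z0-9]*".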

Posted: Sun Jul 30, 2006 2:14 pm
by RobertGonzalez
Try this. I googled your error message and something came up about having '+' and '*' signs in the string. Try this code and see if it helps. If not, then I am at a loss too. I am no regexpert so I am trying to offer what I can with what I know. (Now if we were talking about food, or wine, then we'd be in business.) :wink:

<?php
// These next two lines come from
// http://drupal.org/node/32370
$rss = str_replace('+', '\\+', $rss);
$rss = str_replace('*', '\\*', $rss);
$rss = preg_replace('%<a\s+href\s*=\s*"http://www.RSSsourceSite.com/(#(*A-Za-z0-9]\s*%', '<a href=http://www.RSSsourceSite.com/', $rss);
?>
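
For what it's worth, the compile error is raised for the pattern itself, so escaping '+' and '*' inside $rss probably would not change it, and it would alter the feed text. If the aim is to escape a literal string that goes into a pattern, PHP's built-in preg_quote() does that. A minimal sketch, assuming the fixed part of the link is kept in a variable ($base and $quoted are made-up names):

```php
<?php
// Hypothetical variable holding the fixed part of the link.
$base = 'http://www.RSSsourceSite.com/';

// preg_quote() escapes regex metacharacters such as the dots; passing '%'
// as the second argument also escapes the delimiter if it appears in $base.
$quoted = preg_quote($base, '%');

// The quoted string can then be embedded in a larger pattern.
$pattern = '%^' . $quoted . '$%';

var_dump(preg_match($pattern, $base));
```

With the dots escaped, the pattern matches only the literal host name rather than any character in those positions.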

Posted: Sun Jul 30, 2006 2:37 pm
by Sivarn
Based on your original suggestion, the following seems to work:

$rss = preg_replace('%<a\s+href\s*=\s*"http://www.RSSsourceSite.com#[A-Za-z0-9]*\s*%', '<a href="http://www.RSSsourceSite.com/', $rss);
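
A slightly more defensive variant, offered as a sketch rather than a drop-in replacement. It assumes the links look like the example in the first post (with a "/" before the "#", which the pattern as posted above does not include), escapes the dots, and accepts any fragment characters up to the closing quote instead of only letters and digits. The function name and sample markup are hypothetical.

```php
<?php
// Hypothetical helper; the name and the sample markup are made up for illustration.
function strip_news_fragment($rss)
{
    // Group 1 captures the tag up to and including the trailing slash.
    // "#[^\"]*" then matches the fragment up to, but not including, the closing quote.
    $pattern = '%(<a\s+href\s*=\s*"http://www\.RSSsourceSite\.com/)#[^"]*%';

    // Keep only the captured prefix, which drops the fragment.
    return preg_replace($pattern, '$1', $rss);
}

$sample = '<a href="http://www.RSSsourceSite.com/#news-item_2">Story</a>';
echo strip_news_fragment($sample), "\n";
```

For the sample above this prints <a href="http://www.RSSsourceSite.com/">Story</a>. Links without a fragment are left untouched because the "#" is required for a match.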

Posted: Sun Jul 30, 2006 3:13 pm
by RobertGonzalez
Sweet. So was I actually able to help someone with a RegEx? Oh happy day!