REGEX and URL help please

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."

Sivarn
Forum Newbie
Posts: 4
Joined: Sun Jul 30, 2006 12:48 pm

Post by Sivarn »

One of the sites I'm building would greatly benefit from a specific RSS newsfeed. Within this newsfeed, there are links to specific sections of the source site's main page.

href="http://www.RSSsourceSite.com/#newsItem"

For some reason, these links produce a page error when followed from the site I'm developing. Rather than trying to figure out why they break, I would like to just trim off the "#newsItem" part. That way, users may not get directly to a specific area of a specific page, but at least they get to the right page.

So:

http://www.RSSsourceSite.com/#newsItem

The "http://www.RSSsourceSite.com/" part will always be the same. The "#newsItem" part could be any combination of letters, numbers, or URL-friendly characters. With the correct regex, I can preg_replace and:

http://www.RSSsourceSite.com/#newsItem

- becomes -

http://www.RSSsourceSite.com/

I suck at REGEX, and it gives me a headache. Can anyone help?

Thanks in advance

Siv
RobertGonzalez
Site Administrator
Posts: 14293
Joined: Tue Sep 09, 2003 6:04 pm
Location: Fremont, CA, USA

Post by RobertGonzalez »

$stripped_string = preg_replace('%<a\s+href\s*=\s*"http://www.RSSsourceSite.com/(#(*A-Za-z0-9]\s*%', '<a href=http://www.RSSsourceSite.com/', $string_to_search);
I honestly have no idea if this works, but it is worth a shot.

Post by Sivarn »

Thanks, I'll give it a try

Post by Sivarn »

preg_replace(): Compilation failed: nothing to repeat at offset 49 in ... [file] ... on line 11
and line 11 is

$rss = preg_replace('%<a\s+href\s*=\s*"http://www.RSSsourceSite.com/(#(*A-Za-z0-9]\s*%', '<a href=http://www.RSSsourceSite.com/', $rss);
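
A note on the error above: the offset appears to count characters from the start of the pattern, not including the opening `%` delimiter. The quick check below (the variable name `$pattern` is only for illustration) suggests that position 49 is the `*` sitting right after the second `(`, so there is nothing in front of it to repeat. The `]` further along also has no matching `[`, so the character class was probably meant to read `[A-Za-z0-9]*`.

```php
<?php
// The pattern from line 11, without the surrounding % delimiters.
// In a single-quoted PHP string, \s stays as two characters: backslash and s.
$pattern = '<a\s+href\s*=\s*"http://www.RSSsourceSite.com/(#(*A-Za-z0-9]\s*';

// Character at the offset reported in the error message.
echo $pattern[49], "\n";            // *

// A little context around it: the quantifier directly follows "(".
echo substr($pattern, 46, 4), "\n"; // (#(*
```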

Post by RobertGonzalez »

Try this. I googled your error message and something came up about having '+' and '*' signs in the string. Try this code and see if it helps. If not, then I am at a loss too. I am no regexpert, so I am trying to offer what I can with what I know. (Now if we were talking about food, or wine, then we'd be in business.) :wink:

<?php
// These next two lines come from
// http://drupal.org/node/32370
$rss = str_replace('+', '\\+', $rss);
$rss = str_replace('*', '\\*', $rss);
$rss = preg_replace('%<a\s+href\s*=\s*"http://www.RSSsourceSite.com/(#(*A-Za-z0-9]\s*%', '<a href=http://www.RSSsourceSite.com/', $rss);
?>
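
One caveat on the snippet above: the two `str_replace` calls escape characters in `$rss`, which is the text being searched, while the compilation error comes from the pattern itself. When literal text has to go into a pattern, `preg_quote` is the usual tool. A minimal sketch, assuming the fixed part of the link is kept in a variable (`$base` and the sample `$rss` value are made up for illustration):

```php
<?php
// Assumed: the part of the URL that never changes.
$base = 'http://www.RSSsourceSite.com/';

// preg_quote escapes regex metacharacters (such as the dots) in literal text.
// The second argument is the delimiter used around the pattern.
$quoted  = preg_quote($base, '%');
$pattern = '%' . $quoted . '#[A-Za-z0-9]*%';

// Sample input, invented for this example.
$rss    = '<a href="http://www.RSSsourceSite.com/#newsItem">News</a>';
$result = preg_replace($pattern, $base, $rss);

echo $result, "\n";
```

With this sample input, the `#newsItem` part should be dropped and the rest of the link left alone.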

Post by Sivarn »

Based on your original suggestion, the following seems to work:

$rss = preg_replace('%<a\s+href\s*=\s*"http://www.RSSsourceSite.com#[A-Za-z0-9]*\s*%', '<a href="http://www.RSSsourceSite.com/', $rss);
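
A slightly more defensive variant of the line above, offered as a sketch rather than a drop-in fix. It assumes the fragment may contain characters other than letters and digits, that the slash before `#` may or may not be present, and that the `href` value is always double-quoted. The function name `strip_feed_fragments` is hypothetical.

```php
<?php
// Hypothetical helper, not from the thread: drops "#fragment" from links
// that point at the feed's source site.
function strip_feed_fragments(string $html): string
{
    // Group 1 keeps the start of the tag and the fixed URL (dots escaped,
    // trailing slash optional). "#[^\"]*" then consumes the fragment up to,
    // but not including, the closing double quote.
    $pattern = '%(<a\s+href\s*=\s*"http://www\.RSSsourceSite\.com/?)#[^"]*%i';

    // preg_replace returns null on error; fall back to the untouched input.
    return preg_replace($pattern, '$1', $html) ?? $html;
}

$rss = '<a href="http://www.RSSsourceSite.com/#newsItem">Item</a>';
echo strip_feed_fragments($rss), "\n";
```

Because the replacement reuses the captured prefix, the opening tag is written back exactly as it appeared in the feed, and links to other sites are not touched.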

Post by RobertGonzalez »

Sweet. So was I actually able to help someone with a RegEx? Oh happy day!