URL formatting help needed


kenja
Forum Newbie
Posts: 4
Joined: Mon Apr 21, 2008 11:02 pm


Post by kenja »

I'm writing a script that looks at incoming URLs and determines which page to redirect each one to. The URLs are formatted like this:

For products:
mysite.com/my-product-number-one-p-34323.html
mysite.com/my-product-number-two-p-34324.html

For categories:
mysite.com/my-category-one-c-15.html
mysite.com/my-category-two-c-16.html

For reviews:
mysite.com/my-review-one-pr-234.html
mysite.com/my-review-two-pr-235.html

My script determines the type of URL from the presence of -p-, -c-, or -pr- in the URL string. However, I am concerned that I might eventually encounter a URL like the following, which contains more than one of these markers:

mysite.com/my-product-vitamin-c-p-34234.html

so I want a PHP/regex function that returns the last occurrence of a matching string. Right now I'm using the following regex to identify the matches:

Code:

"(-c-|-p-|-pr-)"

Is there any way to apply that in a PHP function that returns the last occurrence of any of those search strings?

I'd love to call something like:

Code:

$page_type = preg_last_string_occurence("(-c-|-p-|-pr-)", $url);

where $page_type would be "-c-", "-p-", or "-pr-" depending on the type of URL.

Thanks for any help you can provide!
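
To make the concern concrete, here is a minimal sketch of the problem. The helper name naive_page_type is hypothetical and only stands in for "take the first marker found"; it is not code from my actual script. It also shows that collecting every match would not help here, because "-c-" and "-p-" share a hyphen in "-c-p-" and regex matches cannot overlap.

```php
<?php
// Hypothetical helper: returns the FIRST marker found anywhere in the URL.
function naive_page_type(string $url): ?string {
    return preg_match('/-c-|-p-|-pr-/', $url, $m) ? $m[0] : null;
}

$url = "mysite.com/my-product-vitamin-c-p-34234.html";

// The product URL is misread as a category.
echo naive_page_type($url), "\n"; // -c-

// preg_match_all finds only "-c-": that match consumes the hyphen
// that "-p-" would need, so "-p-" is never reported.
preg_match_all('/-c-|-p-|-pr-/', $url, $all);
echo implode(",", $all[0]), "\n"; // -c-
```

So simply taking the last element of a preg_match_all() result would still give the wrong answer for this URL.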
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: URL formatting help needed

Post by prometheuzz »

Here's a way:

Code:

$url = "mysite.com/my-product-vitamin-c-p-34234.html";
$page_type = preg_last_string_occurence("(-c-|-p-|-pr-)", $url);
echo $page_type . "\n"; // prints "-p-"

// Returns the marker that directly precedes the trailing numeric ID,
// or null if the URL contains no such marker.
function preg_last_string_occurence($pattern, $url) {
    if (preg_match('/' . $pattern . '(?=\d+\.html$)/', $url, $matches)) {
        return $matches[0];
    }
    return null;
}

/*
The regex '/(-c-|-p-|-pr-)(?=\d+\.html$)/' matches
'-c-', '-p-' or '-pr-' only if it is followed by one or more digits (\d+), followed
by '.html' and the end of the string. The '(?=\d+\.html$)' part is called a
positive lookahead.
*/
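
If you also need the numeric ID for the redirect, a variation on the same idea is to anchor the whole tail of the URL and use capture groups instead of a lookahead. This is only a sketch: the function name classify_url and the type labels 'product', 'category' and 'review' are my own assumptions, not anything from your script.

```php
<?php
// Hypothetical helper: returns the page type and numeric ID taken from the
// end of the URL, or null if the URL does not follow the expected format.
function classify_url(string $url): ?array {
    // Assumed mapping from URL marker to a readable page type.
    $types = ['p' => 'product', 'c' => 'category', 'pr' => 'review'];

    // Marker (pr, p or c) between hyphens, then digits, then ".html" at the end.
    if (preg_match('/-(pr|p|c)-(\d+)\.html$/', $url, $m)) {
        return ['type' => $types[$m[1]], 'id' => (int) $m[2]];
    }
    return null;
}

// The "-c-" inside "vitamin-c" is ignored because no digits follow it.
print_r(classify_url("mysite.com/my-product-vitamin-c-p-34234.html"));
```

Because the pattern is tied to the end of the string, only the marker directly in front of the ID can match, no matter what appears in the product name.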
 