Matching fixed URI's
Posted: Mon Jan 26, 2009 4:06 am
by alex.barylski
I have the following regex:
I am trying to use it to match a URI like:
It fails (as expected) when I enter a URI like:
But it still matches when I have extraneous folders, such as:
Can someone explain to me why it's matching more folders than intended? As I understand it, it should be doing:
1. Find everything until the '/' is encountered
2. Find everything until the '.' is encountered
3. Find everything until the end of the string is found (i.e., the extension)
I need to match *only* ONE folder; if I need to match more, I need the regex to look something like:
Which I understand should match a URI in the form:
food/apples/file.html
And, if used in preg_match_all(), it would return an array like:
Code:
[0] = food
[1] = apples
[2] = file
[3] = html
Cheers,
Alex
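For reference, here is a sketch of what preg_match() actually hands back for a two-folder URI. The pattern is an illustrative assumption, not the regex from the post. Index 0 always holds the whole match, so the captured segments start at index 1; preg_match_all() would by default return a nested array per group instead of a flat list.

```php
<?php
// Illustrative pattern (an assumption, not the one from the post): two folders,
// a file name and an extension. Every group excludes "/" (and the last two
// also exclude "."), so no group can spill into a neighbouring segment.
$pattern = '#^([^/]+)/([^/]+)/([^/.]+)\.([^/.]+)$#';
$uri = 'food/apples/file.html';

if (preg_match($pattern, $uri, $m)) {
    // $m[0] is always the whole match; the captured segments start at $m[1].
    array_shift($m);
    print_r($m); // food, apples, file, html (re-indexed from 0)
}
```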
Re: Matching fixed URI's
Posted: Mon Jan 26, 2009 4:38 am
by papa
Seems to work with this one:
Code:
<?php
// Match /food/file.type
$pattern = "#^/\w+/\w+\.\w+$#";
$url = "/test/test.test";
if(preg_match($pattern, $url)) {
echo "File matched";
} else {
echo "File was not matched.";
}
?>
edit: small change
Re: Matching fixed URI's
Posted: Mon Jan 26, 2009 4:55 am
by prometheuzz
PCSpectra wrote: I have the following regex:
I am trying to use it to match a URI like:
It fails (as expected) when I enter a URI like:
But it still matches when I have extraneous folders, such as:
Can someone explain to me why it's matching more folders than intended? As I understand it, it should be doing:
1. Find everything until the '/' is encountered
2. Find everything until the '.' is encountered
3. Find everything until the end of the string is found (i.e., the extension)
That is not what your first regex does. Group 1 captures one or more characters other than "/" ("food" in your case), followed by a "/", and group 2 captures one or more characters other than "." ("apples/file" in your case). Because "/" is not excluded from group 2, it happily runs across folder boundaries.
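A small PHP sketch of the point above. It assumes the original pattern had the form #^([^/]+)/([^.]+)\.(.+)$#, which is how it is quoted later in the thread; the "strict" variant is a suggested fix, not something posted by either poster. Adding "/" to the excluded characters is what limits the match to a single folder.

```php
<?php
// Assumed form of the original pattern: group 2 is [^.]+, which also
// consumes "/" characters, so extra folders end up inside group 2.
$loose = '#^([^/]+)/([^.]+)\.(.+)$#';
// Suggested fix: exclude "/" from groups 2 and 3 so only ONE folder fits.
$strict = '#^([^/]+)/([^/.]+)\.([^/.]+)$#';

$uri = 'food/apples/file.html';

preg_match($loose, $uri, $m);
print_r($m); // $m[1] = food, $m[2] = apples/file, $m[3] = html

var_dump(preg_match($strict, $uri));             // int(0): two folders rejected
var_dump(preg_match($strict, 'food/file.html')); // int(1): one folder accepted
```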
PCSpectra wrote: I need to *only* match ONE folder; if I need to match more, I need the regex to look something like:
Which I understand should match a URI in the form:
food/apples/file.html
And, if used in preg_match_all(), it would return an array like:
Code:
[0] = food
[1] = apples
[2] = file
[3] = html
I am not sure what your question is or what you're really after. Could you post some example input strings and clearly indicate what part(s) you want to match?
Re: Matching fixed URI's
Posted: Mon Jan 26, 2009 6:17 am
by alex.barylski
prometheuzz wrote: I am not sure what your question is or what you're really after. Could you post some example input strings and clearly indicate what part(s) you want to match?
I need a generic regex to match a fixed number of URI segments; I don't know how else to describe it.
Basically, the values that make up the URI segments (i.e., folders) can consist of arbitrary characters (any length and type), but the number of segments is fixed, and the URI must match only when its number of segments equals that of the regex. Make sense?
I am building the regex on the fly to test whether the URI of a given request matches what is expected; otherwise, a 404 error occurs.
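One way to get "a fixed number of segments" when the regex is built on the fly, assuming "/" is the only delimiter: anchor the pattern with ^ and $ and exclude "/" from every segment, so no group can swallow an extra folder. fixedSegmentPattern() is a hypothetical helper name, not something from the thread.

```php
<?php
// Hypothetical helper: an anchored pattern for exactly $n "/"-separated
// segments. Each segment is captured and may contain anything except "/".
function fixedSegmentPattern(int $n): string
{
    return '#^' . implode('/', array_fill(0, $n, '([^/]+)')) . '$#';
}

$three = fixedSegmentPattern(3); // #^([^/]+)/([^/]+)/([^/]+)$#
var_dump(preg_match($three, 'food/apples/file.html'));       // int(1)
var_dump(preg_match($three, 'food/apples/pears/file.html')); // int(0)
```

Because each group is [^/]+, the number of groups alone decides how many folders are allowed; the segment contents stay completely open.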
Given a URI like:
I need to match entire URIs (with capturing brackets, because if it matches I need the segments) and then split the segments based on tokens (not necessarily only '/'). For instance, I might have a URI like:
I need:
food
apples
index
10
html
as the segments returned when I split the string. But before I do that, I am verifying whether the string contains all the right segments. I suppose I could do this using an array comparison, but I'd rather keep it simple with regex.
I'll try to re-ask my question:
How would I match ONLY the following URI:
Keeping in mind I need the following segments:
Not:
So splitting on the directory delimiter alone will not work.
What would the regex look like for a URI like:
Knowing that each segment has unknown characters (type and length) and the segments are:
I assume it should look something like the above?
Cheers,
Alex
Re: Matching fixed URI's
Posted: Mon Jan 26, 2009 6:41 am
by papa
Hehe, I realize now that my post didn't help much.
Code:
<?php
// Match /dir/name(number).ext, capturing dir, name, number and ext
$pattern = "#^/([^/]+)/([^/\(]+)\((\d+)\)\.(\w+)$#";
$url = "/dir/file(10).php";
if(preg_match_all($pattern, $url, $result)) {
echo "<pre>";
print_r($result);
echo "</pre>";
} else {
echo "File was not matched: ". $url;
}
?>

Re: Matching fixed URI's
Posted: Mon Jan 26, 2009 7:45 am
by prometheuzz
Since you're doing two things (validating + tokenizing), I suggest breaking the problem down into two steps. Here's a possible way to do it:
Code:
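<?php
$url = "food/apples/index(10).html";
// Step 1, validate: optional leading "/", any number of "dir/" parts, then name.ext
if (preg_match('#^/?([^/]*/)*[^.]+\..*$#', $url)) {
    // Step 2, tokenize: split on runs of "/", "(", ")" and "."
    print_r(preg_split('#[/().]+#', $url));
}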
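A variation on the two-step idea, as a sketch (not prometheuzz's code): split first, then treat "exactly N tokens" as the validation step, which is close to the array comparison mentioned earlier in the thread. splitUriTokens() and the expected count of 5 are assumptions for URIs shaped like food/apples/index(10).html.

```php
<?php
// Assumed number of segments for URIs like food/apples/index(10).html.
$expected = 5;

// Hypothetical helper: split on runs of "/", "(", ")" and ".".
// PREG_SPLIT_NO_EMPTY drops the empty token a leading "/" would produce.
function splitUriTokens(string $url): array
{
    return preg_split('#[/().]+#', $url, -1, PREG_SPLIT_NO_EMPTY);
}

$ok  = splitUriTokens('food/apples/index(10).html');       // 5 tokens
$bad = splitUriTokens('food/apples/pears/index(10).html'); // 6 tokens

var_dump(count($ok) === $expected);  // bool(true)
var_dump(count($bad) === $expected); // bool(false)
```

The count alone does not check that the delimiters appear in the right order (a.b.c.d.e also yields five tokens), so this is weaker than a single anchored full-match regex.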
Re: Matching fixed URI's
Posted: Mon Jan 26, 2009 8:25 am
by alex.barylski
papa wrote: Hehe, I realize now that my post didn't help much.
To be honest, I totally glazed over your reply and went right to the end.
Anyway, I still appreciate your reply.
Code:
$pattern = "#^/([^/]+)/([^/\(]+)\((\d+)\)\.(\w+)$#";
The only problem is that I am constructing my regex on the fly, and I only know what the delimiters are (e.g., '/'), so expressions like:
I don't think I can use, because I have no idea what the "type" of each segment is. For instance, if I have a URI like:
All I have to build the regex from are the following delimiters:
prometheuzz wrote: Since you're doing two things (validating + tokenizing), I suggest breaking the problem down into two steps. Here's a possible way to do it:
It is done in two steps, but I wanted to focus on the validation, as the parsing seems to work as expected. I'm not sure your regex will do the trick either, because of what I said immediately above.
Because I am constructing the regex dynamically using only the URI delimiters, I am somewhat limited in what I can do.
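If the only thing known in advance is the ordered list of delimiters, the pattern can still be generated: make each segment "one or more characters that appear in none of the delimiters", and run every delimiter through preg_quote() so characters like "(" and "." are matched literally. buildSegmentPattern() is a hypothetical name, and describing food/apples/index(10).html by the delimiter list '/', '/', '(', ').' is an assumption about how the URI would be specified.

```php
<?php
/**
 * Hypothetical helper: builds an anchored regex matching exactly
 * count($delimiters) + 1 segments, separated by the given delimiters in
 * order. Segment contents are unknown, so a segment is any run of
 * characters that occurs in none of the delimiters. Needs at least one
 * non-empty delimiter.
 */
function buildSegmentPattern(array $delimiters): string
{
    $chars   = array_unique(str_split(implode('', $delimiters)));
    $segment = '([^' . preg_quote(implode('', $chars), '#') . ']+)';

    $pattern = $segment;
    foreach ($delimiters as $delimiter) {
        $pattern .= preg_quote($delimiter, '#') . $segment;
    }
    return '#^' . $pattern . '$#';
}

$pattern = buildSegmentPattern(['/', '/', '(', ').']);

if (preg_match($pattern, 'food/apples/index(10).html', $m)) {
    print_r(array_slice($m, 1)); // food, apples, index, 10, html
}
var_dump(preg_match($pattern, 'food/apples/pears/index(10).html')); // int(0)
```

The anchors plus the "none of the delimiter characters" class are what stop a segment from absorbing an extra folder, without needing to know anything about the segment's type.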
Cheers,
Alex
Re: Matching fixed URI's
Posted: Mon Jan 26, 2009 9:03 am
by alex.barylski
I have tried this:
Still no dice... I'm not sure I understand what is going wrong with the regex.
What is the above actually doing to make it match multiple directories, when, as I understand it, there are only TWO delimiters and three possible matches/results?
Code:
path + delim( / ) + name + delim( . ) + type
Where in the regex above is it saying "keep finding other folders until the period before the extension, or the end of the string, is reached"?
Cheers,
Alex
Re: Matching fixed URI's
Posted: Mon Jan 26, 2009 9:14 am
by mintedjo
Code:
#^([^/]+)/([^\.]+)\.(.+)$#
PCSpectra wrote: Where in the regex above is it saying keep finding other folders...
The second group, ([^\.]+), says keep matching anything that isn't a ".", so it will match anything it sees (including any "/") until it encounters a "."
Re: Matching fixed URI's
Posted: Mon Jan 26, 2009 10:20 am
by alex.barylski
mintedjo wrote: The second group says keep matching anything that isn't a ".", so it will match anything it sees until it encounters a "."
Ahhh... nice call... although the line right after:
Doesn't that add to the previous expression and say "until we encounter a dot"?
Actually, never mind... I think I just answered my own question. LOL
Anyway, I resorted to writing the damn parser by hand and had it finished in about 5 minutes... why I wasted so much time wrestling with regex, I don't know.
Cheers,
Alex
Re: Matching fixed URI's
Posted: Mon Jan 26, 2009 10:26 am
by mintedjo
I'd still like to see what the regex solution to this would be; unfortunately, I'm not sure I understand exactly which combinations it needs to match, so I can't help solve it anyway.
Can you post your parser so I can see exactly what you were aiming for?

Re: Matching fixed URI's
Posted: Mon Jan 26, 2009 11:04 am
by prometheuzz
PCSpectra wrote:...
It is done in two steps, but I wanted to focus on the validation as the parsing seems to work as expected. I'm not sure your regex will do the trick either, only because of what I say immediately above???
...
One way to find out: test it.
Re: Matching fixed URI's
Posted: Mon Jan 26, 2009 11:12 am
by prometheuzz
PCSpectra wrote:I have tried this:
Still no dice...I'm not sure I understand what is going wrong with the regex...
...
Don't take this the wrong way, but "still no dice" doesn't tell me much. It's always a good idea to post the string(s) you're testing against and explain what output you're receiving and what output you expected or hoped for.
Anyway, I see you already solved your problem by writing a little parser instead.
Good luck!
Re: Matching fixed URI's
Posted: Tue Jan 27, 2009 7:17 pm
by alex.barylski
My hand-written parser worked for simple scenarios; unfortunately, I need something a little more variable in nature, and doing that in PHP code is too much work, so once again I return to the realm of regex.
Everything works as expected in parsing the URI into two parts (actually three; I'll explain).
The problem is the last 'group'. It's intended to match everything to the right of the ? or /, but it should be optional, as should the ? or / that it matches.
So given a URI like:
I should get an array such as this:
Whereas if I had a URI like:
I would have the following results:
Code:
[0] = food
[1] = apples
[2] = name=value&name=value
What I get right now is:
Code:
[0] = food
[1] = apples
[2] = ?name=value&name=value
Notice the preceding ? or /, depending on which is used. Ideally, the regex would ignore it and not make it part of my final grouped match.
Thanks again
Cheers,
Alex
Re: Matching fixed URI's
Posted: Tue Jan 27, 2009 8:03 pm
by alex.barylski
Update: I have tried the following regex:
Code:
#([^/]+)/([^\?|/]+)([^\?|/]{0,}.*)#
It works, but it STILL includes the ? or / in the trailing group, which I would ideally leave out of that group; however, the ? or / and everything after the main URI MUST be optional...
What am I doing wrong?
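A possible answer to the last question, as a sketch: put the ? or / inside a non-capturing group (?: ... ) and make that whole group optional with a trailing ?, so the separator is required only when a trailing part exists and is never captured itself. Two side notes on the attempt above: inside a character class, | is a literal pipe rather than alternation, so [^\?|/] also excludes "|"; and the pattern has no ^ or $ anchors. The URIs below are illustrative.

```php
<?php
// Groups 1 and 2: "food" and "apples" (anything except "/" and "?").
// (?:[?/](.*))? : optional, non-capturing; consumes the "?" or "/" and
// captures only what follows it in group 3.
$pattern = '#^([^/?]+)/([^/?]+)(?:[?/](.*))?$#';

$uris = ['food/apples', 'food/apples?name=value&name=value', 'food/apples/extra'];

foreach ($uris as $uri) {
    if (preg_match($pattern, $uri, $m)) {
        // When the optional part is absent, preg_match() leaves the trailing
        // unmatched group out of $m, hence the ?? ''.
        $rest = $m[3] ?? '';
        echo "$m[1] | $m[2] | $rest\n";
    }
}
```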
