
Matching fixed URI's

Posted: Mon Jan 26, 2009 4:06 am
by alex.barylski
I have the following regex:

Code:

#([^/]+)/([^\.]+)\.#
I am trying to use it to match a URI like:

Code:

/food/file.type
It fails (as expected) when I enter a URI like:

Code:

/food/file
But it still matches when I have extraneous folders, such as:

Code:

/food/apples/file.html
Can someone explain to me why it's matching more folders than intended? As I understand it, it should be doing:

1. Find everything until the '/' is encountered
2. Find everything until the '.' is encountered
3. Find everything until the end of the string is found (i.e. the extension)

I need to match *only* ONE folder; if I needed to match more, I would expect the regex to look something like:

Code:

#([^/]+)/([^/]+)/([^\.]+)\.(.+)#
Which I understand should match a URI in the form:

food/apples/file.html

And that, if used in preg_match_all(), it would return an array like:

Code:

[0] = food
[1] = apples
[2] = file
[3] = html
Cheers,
Alex
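A quick check of the array layout, as a minimal sketch with illustrative variable names: PHP numbers the results differently from the list above, because index 0 always holds the full match and the capture groups start at index 1. With preg_match_all() (default PREG_PATTERN_ORDER), each index holds a sub-array with one entry per match found.

```php
<?php
// The pattern from the post above, applied to a URI without a leading slash.
$pattern = '#([^/]+)/([^/]+)/([^\.]+)\.(.+)#';
$uri = 'food/apples/file.html';

// preg_match(): $m[0] is the whole match, $m[1]..$m[4] are the groups.
preg_match($pattern, $uri, $m);
print_r($m);

// preg_match_all() (PREG_PATTERN_ORDER): $all[N] lists group N for every match.
preg_match_all($pattern, $uri, $all);
echo $all[1][0], ' ', $all[4][0], "\n"; // food html
```

So for a single URI, preg_match() gives the flatter result.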

Re: Matching fixed URI's

Posted: Mon Jan 26, 2009 4:38 am
by papa
Seems to work with this one:

Code:

 
<?php
// Match /food/file.type
 
$pattern = "#^/\w+/\w+\.\w+$#";
$url = "/test/test.test";
 
if(preg_match($pattern, $url)) {
    echo "File matched";
} else {
    echo "File was not matched.";
}
?>
 
edit: small change
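What makes this version strict is the ^...$ anchoring: the whole string has to fit the pattern, and \w (letters, digits, underscore) never matches '/', so an extra folder makes the match fail. A minimal sketch checking it against the three URIs from the first post (the loop is only for illustration):

```php
<?php
// papa's anchored pattern: one folder, a name, a dot and an extension.
$pattern = '#^/\w+/\w+\.\w+$#';

$uris = ['/food/file.type', '/food/file', '/food/apples/file.html'];
foreach ($uris as $uri) {
    // preg_match() returns 1 on a match and 0 when there is none.
    printf("%-25s %s\n", $uri, preg_match($pattern, $uri) ? 'matched' : 'not matched');
}
```

Because \w excludes characters such as '-', a name like my-file.html would also be rejected; classes such as [^/]+ for the folder and [^/.]+ for the name are looser alternatives, assuming that is what's wanted.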

Re: Matching fixed URI's

Posted: Mon Jan 26, 2009 4:55 am
by prometheuzz
PCSpectra wrote:
I have the following regex:

Code:

#([^/]+)/([^\.]+)\.#
I am trying to use it to match a URI like:

Code:

/food/file.type
It fails (as expected) when I enter a URI like:

Code:

/food/file
But it still matches when I have extraneous folders, such as:

Code:

/food/apples/file.html
Can someone explain to me why it's matching more folders than intended? As I understand it, it should be doing:

1. Find everything until the '/' is encountered
2. Find everything until the '.' is encountered
3. Find everything until the end of the string is found (i.e. the extension)
That is not what your first regex does. It captures one or more characters other than "/" in group 1 ("food" in your case), then matches a "/", and in group 2 it captures one or more characters other than "." ("apples/file" in your case). Since [^\.] also matches "/", group 2 is free to run across any number of extra folders until it reaches the dot.
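Printing the captures makes this visible. A minimal sketch with the original, unanchored pattern:

```php
<?php
// The original, unanchored pattern from the first post.
$pattern = '#([^/]+)/([^\.]+)\.#';

// The match starts at "food" (the leading "/" cannot start [^/]+), and
// group 2 runs over "apples/file" because only a "." can stop [^\.]+.
preg_match($pattern, '/food/apples/file.html', $m);
print_r($m);
```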
PCSpectra wrote:
I need to match *only* ONE folder; if I needed to match more, I would expect the regex to look something like:

Code:

#([^/]+)/([^/]+)/([^\.]+)\.(.+)#
Which I understand should match a URI in the form:

food/apples/file.html

And that, if used in preg_match_all(), it would return an array like:

Code:

[0] = food
[1] = apples
[2] = file
[3] = html
Cheers,
Alex
I am not sure what your question is or what you're really after. Could you post some example input strings and clearly indicate what part(s) you want to match?

Re: Matching fixed URI's

Posted: Mon Jan 26, 2009 6:17 am
by alex.barylski
I am not sure what your question is or what you're really after. Could you post some example input strings and clearly indicate what part(s) you want to match?
I need a generic regex to match a fixed number of URI segments; I don't know how else to describe it. :P

Basically, the values that make up the URI segments (i.e. folders) can consist of arbitrary characters (any length and type), but the number of segments is fixed: the regex must match only when the URI has exactly as many segments as the regex expects. Make sense?

In short, I am building the regex on the fly to test whether the URI of a given request matches what is expected; otherwise, a 404 error would occur.

Given a URI like:

Code:

/food/apples/index.html
I need to match entire URIs (with capturing brackets, because if it matches I need the segments) and then split the segments based on tokens (not necessarily '/' only). For instance, I might have a URI like:

Code:

food/apples/index(10).html
I need:

food
apples
index
10
html

As the segments returned when I split the string. Before I do that, though, I verify whether the string contains all the right segments... I suppose I could do this using an array comparison, but I'd rather keep it simple with regex.

I'll try to re-ask my question:

How would I match ONLY the following URI:

Code:

food/apples/index.html


Keeping in mind I need the following segments:

Code:

food
apples
index
html
Not:

Code:

food
apples
index.html
So splitting on the directory delimiter will not work.

What would the regex look like for a URI like:

Code:

food/apples/index(10).html


Knowing that each segment has unknown characters (type and length) and the segments are:

Code:

food
apples
index
10
html

Code:

#([^/]+)/([^/]+)/([^(]+)\(#
I assume it should look something like the above?
Cheers,
Alex

Re: Matching fixed URI's

Posted: Mon Jan 26, 2009 6:41 am
by papa
Hehe I realize now that my post didn't help much :)

Code:

<?php
// Match URIs like /dir/file(10).php and capture each segment
 
$pattern = "#^/([^/]+)/([^/\(]+)\((\d+)\)\.(\w+)$#";
$url = "/dir/file(10).php";
 
if(preg_match_all($pattern, $url, $result)) {
    echo "<pre>";
    print_r($result);
    echo "</pre>";
} else {
    echo "File was not matched: ". $url;
}
?>
 

:oops:

Re: Matching fixed URI's

Posted: Mon Jan 26, 2009 7:45 am
by prometheuzz
Since you're doing two things (validating + tokenizing), I suggest breaking the problem down into two steps. Here's a possible way to do it:

Code:

<?php
$url = "food/apples/index(10).html";
// Step 1: validate the overall shape of the URI.
if(preg_match('#^/?([^/]*/)*[^.]+\..*$#', $url)) {
    // Step 2: split it into segments on runs of "/", "(", ")" and ".".
    print_r(preg_split('#[/().]+#', $url));
}
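One detail worth knowing with this approach: if the URI starts with '/', preg_split() returns an empty string as its first element. A sketch of the same two steps, assuming a leading-slash URI, with the PREG_SPLIT_NO_EMPTY flag to drop empty pieces:

```php
<?php
$url = "/food/apples/index(10).html";

// Step 1: validate the overall shape (pattern from the post above).
if (preg_match('#^/?([^/]*/)*[^.]+\..*$#', $url)) {
    // Step 2: split on any run of "/", "(", ")" or "."; PREG_SPLIT_NO_EMPTY
    // removes the empty string produced by the leading "/".
    $parts = preg_split('#[/().]+#', $url, -1, PREG_SPLIT_NO_EMPTY);
    print_r($parts);
}
```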

Re: Matching fixed URI's

Posted: Mon Jan 26, 2009 8:25 am
by alex.barylski
Hehe I realize now that my post didn't help much
To be honest, I totally glazed over your reply and went right to the end. :oops:

Anyways, I still appreciate your reply.

Code:

$pattern = "#^/([^/]+)/([^/\(]+)\((\d+)\)\.(\w+)$#";
The only problem is that I am constructing my regex on the fly, and I only know what the delimiters are (i.e. '/'), so expressions like:

Code:

\d+
I don't think I can use, because I have no idea what the "type" of each segment is. For instance, if I have a URI like:

Code:

food/apples/index(10).html
All I have to build the regex from are the following delimiters:

Code:

[0] = /
[1] = /
[2] = (
[3] = ).
Since you're doing two things (validating + tokenizing), I suggest breaking the problem down into two steps. Here's a possible way to do it:
It is done in two steps, but I wanted to focus on the validation, as the parsing seems to work as expected. I'm not sure your regex will do the trick either, only because of what I said immediately above???

Because I am constructing the regex dynamically using only the URI delimiters, I am somewhat limited in what I can do.

Cheers,
Alex
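Since only the delimiters are known, one possible approach (a sketch, not something posted in this thread) is to make every segment "one or more characters that are not any delimiter character" and anchor the whole pattern. A segment can then never swallow a delimiter, so the number of segments is fixed by the number of delimiters. The helper name build_uri_pattern() is hypothetical:

```php
<?php
// Hypothetical helper: builds an anchored pattern from an ordered list of
// delimiters. Each segment is "one or more characters that are not any
// delimiter character", so no segment can swallow a delimiter.
function build_uri_pattern(array $delimiters): string
{
    // Every distinct character that appears in any delimiter.
    $chars = array_unique(str_split(implode('', $delimiters)));
    // preg_quote() escapes PCRE metacharacters (and the '#' pattern delimiter).
    $segment = '([^' . preg_quote(implode('', $chars), '#') . ']+)';

    $pattern = '#^' . $segment;
    foreach ($delimiters as $delimiter) {
        $pattern .= preg_quote($delimiter, '#') . $segment;
    }
    return $pattern . '$#';
}

$pattern = build_uri_pattern(['/', '/', '(', ').']);
foreach (['food/apples/index(10).html', 'food/apples/pears/index(10).html'] as $uri) {
    if (preg_match($pattern, $uri, $m)) {
        array_shift($m); // drop the full match at index 0
        echo $uri, ' => ', implode(', ', $m), "\n";
    } else {
        echo $uri, " => no match (404)\n";
    }
}
```

The trade-off is that no segment may contain any delimiter character at all, so a name like index.old would be rejected here; whether that is acceptable depends on the URLs being routed.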

Re: Matching fixed URI's

Posted: Mon Jan 26, 2009 9:03 am
by alex.barylski
I have tried this:

Code:

#^([^/]+)/([^\.]+)\.(.+)$#
Still no dice...I'm not sure I understand what is going wrong with the regex...

What is the above actually doing to make it match multiple directories, when from what I understand there are only TWO delimiters and three possible matches/results?

Code:

path + delim ( / ) + name + delim ( . ) + type
Where in the regex above does it say to keep finding other folders until the period before the extension or the end of the string is reached? :banghead: :lol:

Cheers,
Alex
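The difference is easy to demonstrate: [^\.]+ stops only at a dot, so the trailing \. merely requires that a dot follows; it doesn't stop the group from crossing a '/'. A minimal sketch comparing the posted pattern with a variant whose name group also excludes '/' (both assume the URI has no leading slash, since ^([^/]+) cannot start on a '/'):

```php
<?php
// The posted pattern: group 2 is [^\.]+, which also matches "/".
$loose = '#^([^/]+)/([^\.]+)\.(.+)$#';
// Variant: group 2 is [^/.]+, so the name can no longer cross a "/".
$strict = '#^([^/]+)/([^/.]+)\.(.+)$#';

foreach (['food/file.html', 'food/apples/file.html'] as $uri) {
    printf("%-22s loose: %d  strict: %d\n", $uri, preg_match($loose, $uri), preg_match($strict, $uri));
}
```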

Re: Matching fixed URI's

Posted: Mon Jan 26, 2009 9:14 am
by mintedjo

Code:

#^([^/]+)/([^\.]+)\.(.+)$#
Where in the regex above does it say to keep finding other folders...
The ([^\.]+) part says "keep matching anything that isn't a '.'", so it will match anything it sees, including '/', until it encounters a '.'.

Re: Matching fixed URI's

Posted: Mon Jan 26, 2009 10:20 am
by alex.barylski
The ([^\.]+) part says "keep matching anything that isn't a '.'", so it will match anything it sees, including '/', until it encounters a '.'.
Ahhh...nice call...although the line right after:

Code:

\.
Doesn't that add to the previous expression and say "until we encounter a dot"?

Actually, never mind... I think I just answered myself... LOL

Anyway, I resorted to writing the damn parser by hand and had it finished in about 5 minutes... why I wasted so much time wrestling with regex, I don't know.

Cheers,
Alex

Re: Matching fixed URI's

Posted: Mon Jan 26, 2009 10:26 am
by mintedjo
I'd still like to see what the regex solution to this would be; unfortunately, I'm not sure I understand exactly which combinations it needs to match, so I can't help solve it anyway.
Can you post your parser so I can see exactly what you were aiming for :-)

Re: Matching fixed URI's

Posted: Mon Jan 26, 2009 11:04 am
by prometheuzz
PCSpectra wrote:
...
It is done in two steps, but I wanted to focus on the validation, as the parsing seems to work as expected. I'm not sure your regex will do the trick either, only because of what I said immediately above???
...
One way to find out: test it.

Re: Matching fixed URI's

Posted: Mon Jan 26, 2009 11:12 am
by prometheuzz
PCSpectra wrote:
I have tried this:

Code:

#^([^/]+)/([^\.]+)\.(.+)$#
Still no dice...I'm not sure I understand what is going wrong with the regex...

...
Don't take this the wrong way, but the problem description "still no dice" doesn't tell me much. It's always a good idea to post the string(s) you're testing against and explain what output you're receiving and what output you expected or hoped for.

Anyway, I see you already solved your problem by writing a little parser instead.

Good luck!

Re: Matching fixed URI's

Posted: Tue Jan 27, 2009 7:17 pm
by alex.barylski
My hand-written parser worked for simple scenarios; unfortunately, I need something a little more flexible, and doing that in PHP code is too much work, so again I return to the realm of regex. :P

Code:

#([^/]+)/([^\?|/]+)(.*)#
Everything works as expected in parsing the URI into two parts (actually three; I'll explain).

The problem is the last group. It's intended to match everything to the right of the ? or /, BUT it should be optional, as should the ? or / that precedes it.

So given a URI like:

Code:

food/apples/
I should get an array such as this:

Code:

[0] = food
[1] = apples
Whereas if I had a URI like:

Code:

food/apples?name=value&name=value
I would have the following results:

Code:

[0] = food
[1] = apples
[2] = name=value&name=value
What I get right now is:

Code:

[0] = food
[1] = apples
[2] = ?name=value&name=value
Notice the preceding ? (or /, depending on which is used). Ideally the regex would skip that character and not make it part of my final captured group.

Thanks again :)

Cheers,
Alex

Re: Matching fixed URI's

Posted: Tue Jan 27, 2009 8:03 pm
by alex.barylski
Update: I have tried the following regex:

Code:

#([^/]+)/([^\?|/]+)([^\?|/]{0,}.*)#
It works, but it STILL includes the ? or / in the trailing group, and ideally I want to leave that out of the final group. Yet the ? or / and everything after the main URI MUST be optional...

What am I doing wrong? :(
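Two things seem to be going on in these patterns. Inside a character class, | is a literal character rather than alternation, so [^\?|/] also excludes '|'. And [^\?|/]{0,} is allowed to match nothing, so the following .* is free to start at the ? or / and capture it. One possible alternative, sketched here (not a reply from the thread), consumes the separator outside any capture, inside an optional non-capturing group. Like the patterns above, it assumes the URI has no leading slash:

```php
<?php
// [?/] consumes the separator without capturing it; (?: ... )? makes the
// separator and everything after it optional. Only (.+) is captured.
$pattern = '#^([^/?]+)/([^/?]+)(?:[?/](.+)?)?$#';

$uris = ['food/apples', 'food/apples/', 'food/apples?name=value&name=value'];
foreach ($uris as $uri) {
    if (preg_match($pattern, $uri, $m)) {
        // Group 3 is not set when nothing follows the separator.
        echo $uri, ' => ', $m[1], ' | ', $m[2], ' | ', $m[3] ?? '(none)', "\n";
    }
}
```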