PCSpectra wrote:
When matching HTML tags and/or their content I have seen this many times:
Code: Select all
[^>]+>
Why does that work and something like:
Code: Select all
>(.+)</
Would not work?

The STAR and PLUS are greedy quantifiers, especially in combination with the DOT (which matches any character except new lines). As the name 'greedy' already implies: it "eats" as much as it can. So the regex ">(.+)</" means: "match a '>' followed by one or more characters of any type followed by a '</'." Now take the following string:
Code: Select all
"text <tag1>text1</tag1> more text <tag2>text2</tag2> and more text"
Because the quantifier is greedy, the match runs from the first '>' to the last '</' (the part between the [u][b] and [/b][/u] markers):
Code: Select all
"text <tag1[u][b]>text1</tag1> more text <tag2>text2</[/b][/u]tag2> and more text"

PCSpectra wrote:
Ahhh...OK thank you, that cleared things up. I knew it was something like that, but couldn't quite nail it. Thank you.

You're welcome.
Code: Select all
#(.+)/(.+)\.html#
Code: Select all
folder/file.html
Code: Select all
(.+{1})/

PCSpectra wrote:
Update: Using a ? following the .+ seems to have done *some* of the trick:
Code: Select all
(.+?)\.html

By placing a question mark after a greedy quantifier, you're making it "reluctant" (non-greedy). So that is correct, it will do some of the trick.
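
A reluctant quantifier only changes how far the quantifier expands; the DOT still matches a '/'. The sketch below illustrates that, assuming PHP's preg functions. The helper `firstGroup()` and the input `a.html/b.html` are hypothetical and only exist for this example.

```php
<?php
// Hypothetical helper for this sketch: returns what group 1 captured, or null if there is no match.
function firstGroup(string $regex, string $subject): ?string
{
    return preg_match($regex, $subject, $m) === 1 ? $m[1] : null;
}

// Reluctant '.+?' still has to grow until '\.html' can match, and '.' also matches '/'.
$plain  = firstGroup('#(.+?)\.html#', 'file.html');        // 'file'
$nested = firstGroup('#(.+?)\.html#', 'folder/file.html'); // 'folder/file'

// Greedy and reluctant only differ when '.html' occurs more than once.
$greedy    = firstGroup('#(.+)\.html#',  'a.html/b.html'); // 'a.html/b'
$reluctant = firstGroup('#(.+?)\.html#', 'a.html/b.html'); // 'a'

var_dump($plain, $nested, $greedy, $reluctant);
```

So the question mark alone cannot keep a slash out of the match; that needs a character class or an anchor.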

PCSpectra wrote:
Should match URI's of the form:
Code: Select all
file.html
But will also match:
Code: Select all
folder/file.html
Which is not expected.

Code: Select all
$input = 'abc/def/ghi/jkl';
$regex1 = '#.+/.+#';
/*
The first DOT-PLUS will "eat" the entire string and will then backtrack until
it reaches a '/' (backtracking starts at the end, so the first slash it meets is
the LAST slash in the string). The second DOT-PLUS will then "consume"
the rest of the string, resulting in these matches:
.+ # matches 'abc/def/ghi'
/ # matches '/' (the last slash)
.+ # matches 'jkl'
*/
$regex2 = '#.+?/.+#';
/*
But now the first DOT-PLUS is reluctant ('.+?'): it "eats" only the part of the
string up to the first slash, and the second DOT-PLUS will then "consume" the rest
of the string, resulting in these matches:
.+? # matches 'abc'
/ # matches '/' (the first slash)
.+ # matches 'def/ghi/jkl'
*/

PCSpectra wrote:
How do I stop the matching when a '/' is also found, something like (which doesn't work):
Code: Select all
$regex = "#^(.+?|[^/])\.html(.*)#";

I'm not sure what exactly you're trying to match/find (only file names?), perhaps you could clarify with a couple of examples?

PCSpectra wrote:
EDIT |
I have tried something like the following as well:
Code: Select all
#^([.+|/*]?)\.html(.*)#

Remember that everything between [ and ] will only match one character and that the "normal" meta-characters have no special meaning inside them.
Code: Select all
[.+|/*]
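
To make the point about character classes concrete, here is a small PHP sketch. The goal assumed in the second half (accept `file.html`, reject `folder/file.html`) is only my reading of the question and was never confirmed in the thread. The pattern `#^([^/]+)\.html$#` is one possible approach under that assumption, not a definitive answer.

```php
<?php
// Inside [...] the characters . + | * are literals: the class matches ONE of . + | / *
$single = preg_match('#^[.+|/*]$#', '+');    // 1: a single '+' is in the class
$word   = preg_match('#^[.+|/*]$#', 'file'); // 0: 'f' is not in the class

// Assumed goal: only names without a folder part.
// '[^/]+' means "one or more characters that are not a slash".
$regex = '#^([^/]+)\.html$#';
$ok    = preg_match($regex, 'file.html', $m);     // 1, and $m[1] is 'file'
$notOk = preg_match($regex, 'folder/file.html');  // 0: the '/' stops [^/]+

var_dump($single, $word, $ok, $m[1], $notOk);
```

The quantifier goes outside the brackets: `[^/]+` repeats the class, whereas a `+` inside the brackets is just another literal character.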