Reg Expression Challenge - non-greedy type expression

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."

hotflation
Forum Newbie
Posts: 4
Joined: Fri Feb 27, 2009 3:45 am

Post by hotflation »

I am ready to bang my head against the wall trying to figure out a solution for my needs. Any help from a regex expert would be greatly appreciated.

What I'm trying to do is match a specific piece of content inside some tags. In the example below, I want to capture only "content3", i.e. the value between <href=" and the "> that comes right before STUFF_3.

#
<href="content1">STUFF_1<href="content2">STUFF_2<href="content3">STUFF_3
#

I cannot for the life of me figure out a way to structure a regex that'll capture that. Anything I do goes back to the longest match and non-greedy doesn't work.

Any help please?
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: Reg Expression Challenge - non-greedy type expression

Post by prometheuzz »

I get the impression you are over-simplifying your problem in your post. Could you give a bit more detail about what exactly you're trying to match? It helps if you give (a part of) the actual input you're working with and clearly indicate what it is you're trying to find.
You might also want to post what you yourself have tried so far; I (or someone else) may be able to point out the error in your logic.
hotflation
Forum Newbie
Posts: 4
Joined: Fri Feb 27, 2009 3:45 am

Re: Reg Expression Challenge - non-greedy type expression

Post by hotflation »

so what i'm matching is an encrypted string...

ex: 2USvF8CPsICBAZOTKTkvNT4tMn4J%2Fq%2BTsrUYm8cvr1%2FlaJiYl5%2BbmKkXloTf45m2jDSwZ3wNezQN7BoLJmr71YbY&oq=06oENya4ZGJbLUXW6oAQdBSLMEu2jGhZKLEO1eGqlLHuzkerb_nbSf1ybhBi1rrwx-8h0z7qng1tcFzvFmdPyrARy9tl51ZE49Lh-ItDFK230DtUl1E_so0_fPH7B7PKtkEAwKOzPzolCjM5WqTB6HLDUm2aIp7sS8s__esUaQ,YT0z

this content is variable and bound to change.

what i've tried so far:

/((href=\")(.*))(\">$keyword<\/a>)/

where $keyword is STUFF_1, STUFF_2, etc...

i've tried non-greedy (.+?), and a bunch of other random things but nothing has worked. the expression above matches the whole string
<href="content1">STUFF_1<href="content2">STUFF_2<href="content3">STUFF_3

if $keyword is STUFF_3. it'll match <href="content1">STUFF_1<href="content2">STUFF_2 if $keyword is STUFF_2, etc. i need to match only the closest content.

i hope that helps.
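
A likely reason the non-greedy attempt fails: a regex engine always reports the leftmost match, so the search starts at the first href=" it finds. Making the quantifier lazy only changes how far the match extends to the right. It does not move the starting point. The sketch below illustrates this on the simplified string from the first post. It uses Python's re module purely for illustration (the thread's own pattern looks like PHP/PCRE), and the variable names are made up for the example.

```python
import re

# Simplified input from the first post.
text = ('<href="content1">STUFF_1'
        '<href="content2">STUFF_2'
        '<href="content3">STUFF_3')
keyword = "STUFF_3"

# Greedy and lazy wildcards both begin matching at the FIRST href=",
# because the engine reports the leftmost possible match.
greedy = re.search(r'href="(.*)">' + re.escape(keyword), text)
lazy = re.search(r'href="(.+?)">' + re.escape(keyword), text)

# A negated character class cannot cross a double quote, so the match
# is forced to start at the href=" directly in front of the keyword.
bounded = re.search(r'href="([^"]+)">' + re.escape(keyword), text)

greedy_value = greedy.group(1)
lazy_value = lazy.group(1)
bounded_value = bounded.group(1)

print(greedy_value)
print(lazy_value)
print(bounded_value)
```

Here the greedy and lazy captures come out identical and both start at content1. Only the negated class isolates content3.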
hotflation
Forum Newbie
Posts: 4
Joined: Fri Feb 27, 2009 3:45 am

Re: Reg Expression Challenge - non-greedy type expression

Post by hotflation »

Here's a sample text:

<tr><td><h3><a id="keyword" href="http://hotflation/index.php?Query=2USvF ... YT0z">Demo Girl 1</a></h3></td></tr><tr><td><h3><a id="keyword" href="http://hotflation/index.php?Query=2USvF ... YT0z">Demo Dude 2</a></h3>

I want to capture the query content from the href of the 2nd link ("Demo Dude 2") into a variable.
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: Reg Expression Challenge - non-greedy type expression

Post by prometheuzz »

Try this:

"#href=\"([^\"]+)\"\s*>\s*$keyword\s*</a>#i"
hotflation
Forum Newbie
Posts: 4
Joined: Fri Feb 27, 2009 3:45 am

Re: Reg Expression Challenge - non-greedy type expression

Post by hotflation »

Thanks!! you are the man! it works perfectly...could you enlighten me and explain the expression?

I appreciate the help very much.
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: Reg Expression Challenge - non-greedy type expression

Post by prometheuzz »

hotflation wrote:Thanks!! you are the man! it works perfectly...could you enlighten me and explain the expression?

I appreciate the help very much.
No problem.

href=          // match 'href='
\"             // match a double quote
(              // start group 1
  [^\"]+       //   match one or more characters other than a double quote
)              // end group 1
\"             // match a double quote
\s*            // match zero or more white space characters (also new line chars!)
>              // match '>'
\s*            // match zero or more white space characters
$keyword       // match the contents of your keyword
\s*            // match zero or more white space characters
</a>           // match '</a>'
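
The breakdown above can be tried out with a small sketch. It is written in Python's re module rather than the PHP the thread seems to use, so treat it as an illustration and not the exact code from this thread. The function name extract_href and the two Query values (FIRST, SECOND) are placeholders invented for the example. The real query strings are elided in the sample post. The keyword is escaped before it goes into the pattern, an extra precaution in case it contains regex metacharacters.

```python
import re

def extract_href(html, keyword):
    """Return the href value of the link whose text is `keyword`, or None."""
    pattern = re.compile(
        r'href="([^"]+)"'      # group 1: everything up to the next double quote
        r'\s*>\s*'             # the closing '>' with optional whitespace around it
        + re.escape(keyword) + # the keyword, taken literally
        r'\s*</a>',            # optional whitespace, then the closing tag
        re.IGNORECASE,         # same effect as the trailing 'i' modifier
    )
    match = pattern.search(html)
    return match.group(1) if match else None

# Shaped like the sample text in the thread; the Query values are placeholders.
sample = (
    '<tr><td><h3><a id="keyword" '
    'href="http://hotflation/index.php?Query=FIRST">Demo Girl 1</a></h3></td></tr>'
    '<tr><td><h3><a id="keyword" '
    'href="http://hotflation/index.php?Query=SECOND">Demo Dude 2</a></h3>'
)

print(extract_href(sample, "Demo Dude 2"))
```

Because [^"]+ can never run past a double quote, the captured value always comes from the href directly in front of the keyword, however many links precede it.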