regexp: pattern in pattern

Posted: Fri Jan 23, 2009 9:38 am
by funfeltp
Hi,
I'd like to write a regexp that solves this kind of problem:

basic text=" aaa [[aaa bbb [[ccc ddd]] eee ]] bbb "

regexp = ?

As a result I'd like to get: [[aaa bbb [[ccc ddd]] eee ]]

What I am able to get is only: [[aaa bbb [[ccc ddd]]

I'm using Python, so it would be great if it worked in Python as well.

Thanks in advance.

Re: regexp: pattern in pattern

Posted: Fri Jan 23, 2009 9:54 am
by Apollo

Code:

$s = ' aaa [[aaa bbb [[ccc ddd]] eee ]] bbb ';
// drop everything before the first "[" and after the last "]"
$s = preg_replace('/^[^[]*(\[.*\])[^\]]*$/', '\\1', $s);
echo $s; // prints: [[aaa bbb [[ccc ddd]] eee ]]

The idea here (there are other ways) is to cut off all non-"[" characters before the first "[", and all non-"]" characters after the last "]".
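Since the OP is working in Python, roughly the same idea can be written with Python's re module. This is a minimal sketch: the function name outermost_span is made up for illustration, and like the PHP version it assumes the string holds exactly one outermost [[...]] group. A greedy .* runs from the first "[" to the last "]", so the surrounding text simply isn't part of the match and no replacement step is needed.

```python
import re


def outermost_span(text):
    # Greedy ".*" stretches from the first "[" to the last "]", so
    # everything before the first "[" and after the last "]" is left out.
    match = re.search(r"\[.*\]", text)
    return match.group(0) if match else None


s = " aaa [[aaa bbb [[ccc ddd]] eee ]] bbb "
print(outermost_span(s))  # [[aaa bbb [[ccc ddd]] eee ]]
```

Note that . does not match a newline unless you pass re.DOTALL, so a group spanning several lines would need that flag.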

Re: regexp: pattern in pattern

Posted: Fri Jan 23, 2009 9:57 am
by prometheuzz
Can there be more nested brackets? Like this:

Code:

... [[ ... [[ ... [[ ... ]] ... ]] ... ]] ...
If so, then no, there is no regex solution for this: regex is not meant to "count", nor to express recursive patterns. At least, last I checked, Python's regex engine can't do this. You'd better ask on a Python-specific forum or mailing list to be really sure.

Good luck.
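What a plain regex can't do here is keep track of how many "[[" are still open. Outside of regex, that count is trivial to keep. The following is a rough Python sketch, not taken from any library; find_top_level_groups and its open_tag/close_tag parameters are hypothetical names. It scans the string once, increments a depth counter on "[[" and decrements it on "]]", and records a substring each time the depth drops back to zero, so it works for any nesting depth and for several groups in one string.

```python
def find_top_level_groups(text, open_tag="[[", close_tag="]]"):
    """Return every outermost open_tag...close_tag substring, any depth.

    Walks the string once with a depth counter: the "counting" that a
    plain (non-recursive) regex cannot do.
    """
    groups = []
    depth = 0
    start = None
    i = 0
    while i < len(text):
        if text.startswith(open_tag, i):
            if depth == 0:
                start = i  # an outermost group begins here
            depth += 1
            i += len(open_tag)
        elif text.startswith(close_tag, i) and depth > 0:
            depth -= 1
            i += len(close_tag)
            if depth == 0:
                groups.append(text[start:i])  # outermost group just closed
        else:
            i += 1  # ordinary character (or a stray close_tag at depth 0)
    return groups


s = "aaa [[bbb [[ ccc ]] ]] ddd [[ eee fff ]] ggg"
print(find_top_level_groups(s))  # ['[[bbb [[ ccc ]] ]]', '[[ eee fff ]]']
```

An unclosed "[[" at the end is silently dropped and a stray "]]" at the top level is skipped; a real implementation might want to report those instead.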

Re: regexp: pattern in pattern

Posted: Fri Jan 23, 2009 10:04 am
by Apollo
prometheuzz wrote: If so, then no, there is no regex solution for this
My solution above works just fine on such strings :)

Unless I misunderstood the TS's problem, I assumed the regexp was supposed to cut off everything before the first [ and after the last ], right?

Re: regexp: pattern in pattern

Posted: Fri Jan 23, 2009 10:08 am
by mintedjo
prometheuzz wrote: If so, then no, there is no regex solution for this
Backus-Naur form ftw!
The problem looks like something you would have to solve using a recursive definition.
I don't recall how to do any of this stuff, but I remember something about lex and yacc from writing language parsers at university.

Re: regexp: pattern in pattern

Posted: Fri Jan 23, 2009 10:16 am
by prometheuzz
Apollo wrote:
prometheuzz wrote: If so, then no, there is no regex solution for this
My solution above works just fine on such strings :)
If only one such string exists, then yes. But I presumed (and still do) that the OP oversimplified his/her problem and that s/he can have strings like these:

Code:

'aaa [[bbb [[ ccc ]] ]] ddd [[ eee fff ]] ggg'
where

Code:

'[[bbb [[ ccc ]] ]]'
and

Code:

'[[ eee fff ]]'
are the substrings the OP is interested in. And if there can be more than 2 nested tags, then regex is definitely not the way to go (especially not with Python).
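Both points can be checked in Python with that example string. The sketch below (the names INNER and OUTER are only illustrative, and it assumes the plain text never contains a lone "[" or "]") first shows that the greedy "first [ to last ]" approach swallows the text between the two groups, then builds a pattern that allows at most one level of nesting inside a group. That pattern finds both groups here, but every extra level of nesting needs yet another layer wrapped around it, which is why arbitrary depth is out of reach for Python's re.

```python
import re

s = "aaa [[bbb [[ ccc ]] ]] ddd [[ eee fff ]] ggg"

# Greedy: from the first "[" to the LAST "]", so " ddd " between the
# two groups ends up inside the match.
greedy = re.search(r"\[.*\]", s).group(0)
print(greedy)  # [[bbb [[ ccc ]] ]] ddd [[ eee fff ]]

# INNER: a [[...]] group with no brackets inside it.
INNER = r"\[\[[^\[\]]*\]\]"
# OUTER: a [[...]] group whose body is plain characters or INNER groups,
# i.e. at most one level of nesting. A third level would need another
# pattern wrapped around OUTER, and so on for every extra level.
OUTER = rf"\[\[(?:[^\[\]]|{INNER})*\]\]"
print(re.findall(OUTER, s))  # ['[[bbb [[ ccc ]] ]]', '[[ eee fff ]]']
```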

Re: regexp: pattern in pattern

Posted: Fri Jan 23, 2009 10:18 am
by funfeltp
prometheuzz wrote:
... I presumed (and still do) that the OP oversimplified his/her problem and that s/he can have strings like these: [...] And if there can be more than 2 nested tags, then regex is definitely not the way to go (especially not with Python).
Yes, unfortunately strings of that kind are also possible.
In a string like:

Code:

'aaa [[bbb [[ ccc ]] ]] ddd [[ eee fff ]] ggg'

I need to find

Code:

'[[bbb [[ ccc ]] ]]'
and

Code:

'[[ eee fff ]]'

as you've written.
Any solution?

Re: regexp: pattern in pattern

Posted: Fri Jan 23, 2009 10:27 am
by prometheuzz
funfeltp wrote: ...
Any solution?
I believe I already answered that question (more than once) ; )

Re: regexp: pattern in pattern

Posted: Fri Jan 23, 2009 10:28 am
by prometheuzz
mintedjo wrote:
If so, then no, there is no regex solution for this
Backus-Naur form ftw!
The problem looks like something you would have to solve using a recursive definition.
That is correct (see my first reply as well).
mintedjo wrote: I don't recall how to do any of this stuff, but I remember something about lex and yacc from writing language parsers at university.
PHP's regex engine can cope with recursively nested tags, but it is a pain in the @ss to get your head around the concept. Writing a little grammar and then generating a lexer and parser with tools such as Lex, Yacc, ANTLR, etc. would be "the way" to go.
Anything but regex! ; )
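For something this small, the grammar-plus-parser route doesn't have to mean Lex/Yacc or ANTLR; a hand-written recursive-descent parser is often enough. The sketch below is one such hand-written parser, not generated by any of those tools, and the grammar in its comments is an assumption about the OP's format (plain text plus [[...]] groups nested to any depth). Regex is still used, but only in the lexer, which chops the input into "[[", "]]" and plain-text tokens; the nesting is handled by the recursive parse function. tokenize, parse and unparse are hypothetical helper names.

```python
import re

# Assumed grammar (informal BNF):
#   text  ::= (group | TEXT)*
#   group ::= "[[" text "]]"

# Lexer: "[[", "]]", or a run of characters containing neither.
TOKEN_RE = re.compile(r"\[\[|\]\]|(?:(?!\[\[|\]\]).)+", re.DOTALL)


def tokenize(source):
    return TOKEN_RE.findall(source)


def parse(tokens, pos=0, inside_group=False):
    """Recursive-descent parser for the grammar above.

    Returns (node, next_pos); node is a list of plain strings and
    nested lists, one nested list per [[...]] group.
    """
    node = []
    while pos < len(tokens):
        tok = tokens[pos]
        if tok == "[[":
            # A nested group: recurse, then continue after its "]]".
            child, pos = parse(tokens, pos + 1, inside_group=True)
            node.append(child)
        elif tok == "]]":
            if not inside_group:
                raise ValueError("unexpected ']]'")
            return node, pos + 1  # end of the current group
        else:
            node.append(tok)
            pos += 1
    if inside_group:
        raise ValueError("unclosed '[['")
    return node, pos


def unparse(group):
    """Rebuild the source text of one group node, brackets included."""
    inner = "".join(x if isinstance(x, str) else unparse(x) for x in group)
    return "[[" + inner + "]]"


s = "aaa [[bbb [[ ccc ]] ]] ddd [[ eee fff ]] ggg"
tree, _ = parse(tokenize(s))
groups = [unparse(n) for n in tree if isinstance(n, list)]
print(groups)  # ['[[bbb [[ ccc ]] ]]', '[[ eee fff ]]']
```

Keeping the tree (rather than just the matched strings) means the inner groups are also available if the OP later needs them, and unbalanced input raises an error instead of silently producing a wrong match.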

Re: regexp: pattern in pattern

Posted: Fri Jan 23, 2009 10:32 am
by funfeltp
prometheuzz wrote:
I believe I already answered that question (more than once) ; )
OK I get it : )
I see that I'm gonna have to do something else with that sh..t.
Thanks anyway.