regexp problem - strip everything except y between x and z

Posted: Wed Aug 20, 2003 4:11 am
by dewaard
Guys,
I'm stuck at a nasty regexp problem:

Code: Select all

//strip anything between 2 blocks
//<TML> dewaard: '@\[/block\].*?\[block=@is'
$input = preg_replace("/\[\/block\].*?\[block=/is","[/block][block=", $input);
This code is meant to strip newlines and other nasty stuff between two [block] tags, but it should not remove the [break-blocks] tag. So I want to remove anything except [break-blocks] between [/block] and [block=...

Any suggestions appreciated.

Posted: Wed Aug 20, 2003 4:20 am
by greenhorn666
A sample of a "raw" input would be great...
- Not quite sure here, but shouldn't you loop thru all lines?
- Is there something before and after the [block]?

Code: Select all

$input = ereg_replace(".*\[block\](.*)\[\/block\].*", "", $input)
But I'm not sure I got your point

Posted: Wed Aug 20, 2003 4:28 am
by greenhorn666
DUH!
Noooo... that's not it... :P
Could you give me a before and a wished after transformation please?
It should be something like

Code: Select all

$input = ereg_replace("(.*\[block\]).*(\[\/block\].*)", "\\1\\2", $input);
That would transform
I love my [block] green [/block] fish
into
I love my [block][/block] fish

Is that what you need?

Posted: Wed Aug 20, 2003 4:28 am
by dewaard
Thanks for your quick reply...

Code: Select all

[block=10%]16-08-03[/block]
[block=15%]Willem II - AZ :-)[/block]
[block=5%]1-0[/block]
[block=5%][Details][/block][break-blocks] <- this one is getting stripped but is essential
[block=10%]23-08-03[/block]
[block=15%]PSV - Willem II[/block][break-blocks]
This is my UBB-like code to create divs that float next to each other; [break-blocks] adds a div w/ 'clear: both'. The problem is that any newlines/rubbish between [/block] and [block= messes up the layout, so I need to remove that. The first few blocks are displayed well and break perfectly, but the second [break-blocks] is stripped by the regexp I posted, and thus it doesn't break the previous blocks...

Example: http://www.linuxaddict.nl/cms/index.php ... a0304&id=6

So I have to strip everything except the '[break-blocks]' tag. Stripping everything worked, but I don't know how to keep the '[break-blocks]' tag intact....
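
To make the symptom concrete, here is a minimal sketch that reproduces it. The input string is a shortened, made-up version of the block list above, and the pattern is the original one with @ as delimiter; nothing else is assumed.

```php
<?php
// Shortened sample modelled on the block list above (made up for illustration).
$input = "[block=5%][Details][/block][break-blocks]\n[block=10%]23-08-03[/block]";

// The original pattern: whatever sits between [/block] and the next [block=
// is thrown away, and that includes the [break-blocks] tag.
$output = preg_replace('@\[/block\].*?\[block=@is', '[/block][block=', $input);

echo $output, "\n";
// [block=5%][Details][/block][block=10%]23-08-03[/block]
```

The newline is gone as intended, but so is [break-blocks], because the replacement string never puts it back.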

Posted: Wed Aug 20, 2003 4:43 am
by will
this will simply remove any newline characters between '[/block]' and '[block=' while leaving everything else intact.

Code: Select all

$newstr = preg_replace("/\[\/block\]([^\n]*)\n\[block=/is","[/block]\\1[block=", $str);
or use this to remove everything between those two tags except [break-blocks]

Code: Select all

$newstr = preg_replace("/\[\/block\].*(\[break-blocks\])?.*\[block=/iUs","[/block]\\1[block=", $str);

these two will do the same thing in the example you provided, but will not behave the same in all cases. if you'd like more clarification on that (not to insult your intelligence, just not sure how much you know about regexes), i can explain more.
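
A quick way to see where the two patterns differ is to run both on the same string. The sketch below uses a made-up input with one [break-blocks] gap and one gap containing stray text. One caveat, based on how PCRE documents the U modifier: U flips the greediness of every quantifier, including the `?` after the group, so the optional group prefers to match nothing. Treat the result as something to check on your own PHP build rather than as gospel.

```php
<?php
// Made-up sample: one gap holds [break-blocks], the other holds stray text.
$str = "[block=a]1[/block][break-blocks]\n[block=b]2[/block] junk\n[block=c]3[/block]";

// Pattern 1: removes only the single newline directly before [block=
// and keeps everything else on that line.
$one = preg_replace('/\[\/block\]([^\n]*)\n\[block=/is', '[/block]\1[block=', $str);

// Pattern 2: with U every quantifier is lazy, so the optional group
// matches nothing and the second .* swallows the tag instead.
$two = preg_replace('/\[\/block\].*(\[break-blocks\])?.*\[block=/iUs', '[/block]\1[block=', $str);

echo $one, "\n";
// [block=a]1[/block][break-blocks][block=b]2[/block] junk[block=c]3[/block]
echo $two, "\n";
// [block=a]1[/block][block=b]2[/block][block=c]3[/block]
```

So pattern 1 keeps the tag but also keeps " junk", while pattern 2 as written removes both.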

Posted: Wed Aug 20, 2003 4:52 am
by greenhorn666
Your regexp, but leaving a [break-blocks] (if any) between blocks:

Code: Select all

preg_replace("/\[\/block\].*?(\[break-blocks\])?.*?\[block=/is","[/block]\\1[block=", $input);

Posted: Wed Aug 20, 2003 5:00 am
by will
greenhorn666 wrote:Your regexp, but leaving a [break-blocks] (if any) between blocks:

Code: Select all

preg_replace("/\[\/block\].*?(\[break-blocks\])?.*?\[block=/is","[/block]\\1[block=", $input);
you don't need the question mark after ".*" since it means zero or more... you will also need the U modifier to make preg_replace "ungreedy" (someone discusses it in the user comments in the manual page for preg_replace)

Posted: Wed Aug 20, 2003 5:08 am
by greenhorn666
I thought so too,
But I copied the one dewaard pasted in his post... :P
I use posix anyhow, wasn't sure about perl's extensions ;)

Posted: Wed Aug 20, 2003 5:13 am
by dewaard
great, it's working now. Thanks guys.

Code: Select all

<?php
$input = preg_replace("/\[\/block\].*?(\[break-blocks\])?.*?\[block=/is","[/block]\\1[block=", $input);
?>
Is this the proper/most efficient way to do it? At least it works, which is a relief :)

Posted: Wed Aug 20, 2003 5:29 am
by will
dewaard wrote:great, it's working now. Thanks guys.

Code: Select all

<?php
$input = preg_replace("/\[\/block\].*?(\[break-blocks\])?.*?\[block=/is","[/block]\\1[block=", $input);
?>
Is this the proper/most efficient way to do it? At least it works, which is a relief :)
yep, except that you don't really need the extra question-marks (although they don't really hurt anything either)
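
On the "proper way" question: as far as I can tell, the pattern above only keeps [break-blocks] when the tag sits directly after [/block], because the first `.*?` matches as little as possible and the optional group is tried right at that spot. If whitespace can come before the tag, a callback may be more forgiving. The sketch below is one possible approach, not the method used in this thread; strip_between_blocks() is a hypothetical helper name and the input is made up.

```php
<?php
// Hypothetical helper (not from the thread): tidy the gap between two blocks.
function strip_between_blocks(string $input): string
{
    return preg_replace_callback(
        '@\[/block\](.*?)\[block=@is',
        function (array $m): string {
            // Keep one [break-blocks] tag if the gap contains it anywhere.
            $keep = stripos($m[1], '[break-blocks]') !== false ? '[break-blocks]' : '';
            return '[/block]' . $keep . '[block=';
        },
        $input
    );
}

// Made-up sample: whitespace on both sides of the tag.
$input = "[block=5%]1-0[/block] \n [break-blocks]\n[block=10%]23-08-03[/block]";

$tidy = strip_between_blocks($input);
// For comparison, the pattern from the post above on the same input.
$thread = preg_replace('@\[/block\].*?(\[break-blocks\])?.*?\[block=@is', '[/block]\1[block=', $input);

echo $tidy, "\n";
// [block=5%]1-0[/block][break-blocks][block=10%]23-08-03[/block]
echo $thread, "\n";
// [block=5%]1-0[/block][block=10%]23-08-03[/block]
```

The callback costs a function call per gap, which should not matter for a page-sized input.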

Posted: Wed Aug 20, 2003 2:59 pm
by m3rajk
with perl you can change delimiters to make it easier. use % in place of / and you don't need to escape /

also, you don't need to escape ]
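
The same holds for PHP's preg functions, which accept other delimiters too. As a small sketch on a made-up input, the two calls below should give identical results: one uses / with everything escaped, the other uses % and leaves the slash and the closing brackets unescaped.

```php
<?php
// Made-up sample with a [break-blocks] tag directly after [/block].
$input = "[block=a]1[/block][break-blocks]\n[block=b]2[/block]";
$replacement = '[/block]\1[block=';

// Slash-delimited, every bracket and slash escaped.
$a = preg_replace('/\[\/block\].*?(\[break-blocks\])?.*?\[block=/is', $replacement, $input);

// Percent-delimited: "/" and "]" are plain characters here, so no backslashes.
$b = preg_replace('%\[/block].*?(\[break-blocks])?.*?\[block=%is', $replacement, $input);

var_dump($a === $b);
// bool(true)
```

The opening bracket still needs its backslash, otherwise it would start a character class.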

Posted: Thu Aug 21, 2003 2:13 am
by dewaard
You can also do that w/ PHP:

Code: Select all

//strip anything between 2 blocks
$input = preg_replace("@\[\/block\].*?(\[break-blocks\])?.*?\[block=@is","[/block]\\1[block=", $input);
Notice the @? EDIT: well, not exactly 'that', but it makes it easier anyway.

I wasn't able to remove any question marks, this didn't work. Which :?: can be removed?

Posted: Thu Aug 21, 2003 2:28 pm
by m3rajk
dewaard wrote:You can also do that w/ PHP:

Code: Select all

//strip anything between 2 blocks
$input = preg_replace("@\[\/block\].*?(\[break-blocks\])?.*?\[block=@is","[/block]\\1[block=", $input);
Notice the @? EDIT: well, not exactly 'that', but it makes it easier anyway.

I wasn't able to remove any question marks, this didn't work. Which :?: can be removed?
you would have to escape ? in the string... ie: use \?

Posted: Fri Aug 22, 2003 2:25 am
by will
dewaard wrote:You can also do that w/ PHP:

Code: Select all

//strip anything between 2 blocks
$input = preg_replace("@\[\/block\].*?(\[break-blocks\])?.*?\[block=@is","[/block]\\1[block=", $input);
Notice the @? EDIT: well, not exactly 'that', but it makes it easier anyway.

I wasn't able to remove any question marks, this didn't work. Which :?: can be removed?

the first and third one are not needed, unless you intend for them to match a literal '?', which i doubt (if that is the case however, they need to be escaped as the previous post explains). a question mark matches zero or one of the preceding blocks, whether it is a single character or parenthetical set. therefore the middle qmark is needed because the [break-blocks] text may or may not be present.

an asterisk (*) matches zero or more instances of the preceding block. since it allows for zero instances, the question mark is not needed. you may be thinking of a plus sign (+) which matches one or more instances... in which case you would need the qmark (although i'm still not sure if that would actually work... never tried).
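
A note for anyone landing here later: in PCRE, a `?` placed directly after another quantifier such as `*` or `+` does not mean "optional". It makes that quantifier lazy, so it matches as little as possible. That would explain why removing the first and third question marks did not work. The sketch below compares the two variants on a made-up three-block input.

```php
<?php
// Made-up sample with three blocks; only the second gap has [break-blocks].
$input = "[block=a]1[/block]\n[block=b]2[/block][break-blocks]\n[block=c]3[/block]";
$replacement = '[/block]\1[block=';

// Lazy .*? stops at the nearest [block=, so each gap is handled on its own.
$lazy = preg_replace('@\[/block\].*?(\[break-blocks\])?.*?\[block=@is', $replacement, $input);

// Greedy .* runs to the end of the string and backs up only as far as the
// LAST [block=, so everything from the first [/block] up to there is replaced.
$greedy = preg_replace('@\[/block\].*(\[break-blocks\])?.*\[block=@is', $replacement, $input);

echo $lazy, "\n";
// [block=a]1[/block][block=b]2[/block][break-blocks][block=c]3[/block]
echo $greedy, "\n";
// [block=a]1[/block][block=c]3[/block]
```

With the greedy version the whole middle block disappears along with the tag, so all three question marks in the working pattern are doing a job.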