Matching nested div tags
Posted: Thu Jun 16, 2005 3:45 pm
by nickvd
I have a bit of markup as follows
Code: Select all
<div class="editor" id="newsBox1">
<div class="newsItem">
<div class="newsTitle">
THE TORONTO TOY AUCTION
</div>
November 19, 2005 <br/>
Preview 9am Sale 10am<br/>
International Centre, <br/>
Hall 4<br/>
6900 Airport Rd, Mississauga
</div>
</div>
There are 4 of these blocks one after another (4 news boxes down the right of the page); each top-level div (class=editor) has a different id (newsBox[1-4]).
I need to grab everything BETWEEN the first <div> and the last </div> (everything within the <div class="editor" ... > ... </div>).
The pattern that I'm using is as follows:
Code: Select all
$pattern = '#<div[^>]*?\s+class="editor" id="(.*?)">(.*?)\s*</div>#is';
Right now it works, but it only matches up to the very first </div>, so the contents of the array after preg_match_all look like this:
Code: Select all
Array
(
[0] => Array
(
[0] => <div class="editor" id="newsBox1">
<div class="newsItem">
<div class="newsTitle">
THE TORONTO TOY AUCTION
</div>
)
[1] => Array
(
[0] => newsBox1
)
[2] => Array
(
[0] =>
<div class="newsItem">
<div class="newsTitle">
THE TORONTO TOY AUCTION
)
)
Posted: Fri Jun 17, 2005 7:05 am
by Chris Corbyn
These nesting ones are always tricky to make sure you're catching the correct closing tag.
You'll need to make it greedier by losing the "?" (by the way your first "?" isn't needed).
I'm looking at this now...
So far this works but it'll keep going until it catches the very last </div> in the document which I'm guessing is not good enough for you?
Code: Select all
<?php
$string = <<<EOD
Something is here
<div class="editor" id="newsBox1">
<div class="newsItem">
<div class="newsTitle">
THE TORONTO TOY AUCTION
</div>
November 19, 2005 <br/>
Preview 9am Sale 10am<br/>
International Centre, <br/>
Hall 4<br/>
6900 Airport Rd, Mississauga
</div>
</div>
Blah blah blah blah
Yadda yadda yadda
EOD;
preg_match('#<div class="editor"[^>]*>.*</div>#is', $string, $matches);
print_r($matches);
?>
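To see the two failure modes side by side, here is a minimal sketch on a made-up input (the $html string and the variable names are illustrative, not taken from the real page). The lazy .*? stops at the first closing tag, while the greedy .* runs on to the last one in the input.

```php
<?php
// Illustrative input: one editor block followed by an unrelated div.
$html = '<div class="editor" id="a"><div>inner</div>tail</div><div>other</div>';

// Lazy: stops at the first </div>, so the block is cut short.
preg_match('#<div class="editor"[^>]*>(.*?)</div>#is', $html, $lazy);

// Greedy: runs on to the last </div> in the input, so it swallows too much.
preg_match('#<div class="editor"[^>]*>(.*)</div>#is', $html, $greedy);

echo $lazy[1], "\n";   // <div>inner
echo $greedy[1], "\n"; // <div>inner</div>tail</div><div>other
```

Neither quantifier can count tags, which is why the nested case needs either a loop or a recursive pattern.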
Posted: Fri Jun 17, 2005 7:39 am
by Bennettman
You could possibly do it (messy) by running preg_replace with the example posted by d11wtg, and have it change the div tag in each case to maybe something like "div1", "div2" etc. Then you can run two more preg_replaces, one to match each "div(number)" and get the data, and another to put the divs back to normal (or a str_replace if possible).
Alternatively, use d11wtg's regexp as normal and later use substr with strpos to take out everything after the first </div> in each result.
Posted: Fri Jun 17, 2005 8:03 am
by Chris Corbyn
Bennettman is right. This will take more than one single preg (which I'm working on between jobs at work).
My method so far is to capture from <div class="editor"... up until the first </div>. preg_match_all() the opening <div> tags and count the number of outputs as X.
I'll then loop back over the string grabbing the next </div> until the loop has iterated X times to rebuild the string.
Posted: Fri Jun 17, 2005 9:33 am
by Chris Corbyn
Well, it's not perfect but it works.... I'd love to see if any clever bods can find a one-regex-only way to do this (I know of one but it's dependent upon knowing the number of nests in the first instance).
Code: Select all
<?php
$string = <<<EOD
Something is here
<div class="editor" id="newsBox1">
<div class="newsItem">
<div class="newsTitle">
THE TORONTO TOY AUCTION
</div>
November 19, 2005 <br/>
Preview 9am Sale 10am<br/>
International Centre, <br/>
Hall 4<br/>
6900 Airport Rd, Mississauga
</div>
</div>
Blah blah blah blah
<div yadda="foo">
Yadda yadda yadda
</div>
EOD;
preg_match('#(<div class="editor"[^>]*>.*?)</div>.*#is', $string, $matches); //Read all opening <div> tags but stop at first closing </div>.
$caught = $matches[1];
$string = substr($matches[0], strlen($caught)); //Chop off the start
preg_match_all('#<div[^>]*>#is', $caught, $matches);
$tot_nests = count($matches[0]); //No. of nests
for($x=0; $x<$tot_nests; $x++) {
preg_match('#.*?</div>#is', $string, $matches); //Find up until </div>
$appender = $matches[0];
$string = substr($string, strlen($appender)); //Chop off start of string
$caught .= $appender; //Append the next segment
}
echo $caught;
?>
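For anyone after the single-regex route: PCRE supports recursive subpatterns, which can match balanced tags without knowing the nesting depth in advance. The following is only a sketch, not a drop-in solution. It assumes well-formed markup, that class="editor" comes before id in the opening tag, and that no stray "<div" text appears inside comments or attribute values. The sample string and variable names are illustrative.

```php
<?php
// Illustrative input modelled on the sample above.
$html = <<<EOD
Something is here
<div class="editor" id="newsBox1">
<div class="newsItem">
<div class="newsTitle">
THE TORONTO TOY AUCTION
</div>
November 19, 2005 <br/>
</div>
</div>
Blah blah blah blah
<div yadda="foo">
Yadda yadda yadda
</div>
EOD;

// Group 1: the id value. Group 2: the block's inner content.
// Inside group 2, (?2) recurses into group 2 itself for every nested <div>...</div>,
// so each opening tag is paired with its own closing tag.
$pattern = '#<div\b[^>]*\bclass="editor"[^>]*\bid="([^"]*)"[^>]*>'
         . '((?:[^<]++|<(?!/?div\b)|<div\b[^>]*>(?2)</div>)*+)'
         . '</div>#i';

if (preg_match($pattern, $html, $m)) {
    echo $m[1], "\n";       // id of the matched block
    echo trim($m[2]), "\n"; // everything between the outer tags
}
```

On large pages the pcre.backtrack_limit and pcre.recursion_limit settings may come into play, so it is worth checking whether preg_match returned false rather than 0.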
Posted: Fri Jun 17, 2005 1:38 pm
by nickvd
d11, that works fine for a single block; however, it fails to return the following blocks.
Below is the complete file that I'm working with.
Code: Select all
<div class="newsItem">
<div class="editor" id="newsBox1">
<span class="newsTitle">THE TORONTO TOY AUCTION </span>November 19, 2005 <br/>
Preview 9am Sale 10am<br/>
International Centre, <br/>
Hall 4<br/>
6900 Airport Rd, Mississauga
</div>
</div>
<div class="newsItem">
<div class="editor" id="newsBox2">
<span class="newsTitle">THE TORONTO TOY &amp; DOLL COLLECTORS' SHOW</span>November 20, 2005, <br/>
10am-4pm<br/>
The International Centre, <br/>
Hall 4<br/>
6900 Airport Rd. Mississauga, ON , Canada
</div>
</div>
<div class="newsItem">
<div class="editor" id="newsBox3">
<span class="newsTitle">THE TORONTO CHRISTMAS TRAIN SHOW</span>OCTOBER 29, 2005<br/>
11am - 5pm<br/>
OCTOBER 30, 2005<br/>
10am - 4pm<br/>
The International Centre, <br/>
Hall 5<br/>
6900 Airport Rd., Mississauga, ON
</div>
</div>
Posted: Fri Jun 17, 2005 1:42 pm
by Chris Corbyn
Erm... oh lol.... I thought you were exclusively looking for <div class="editor">

Hence the above code
Well that's not too tricky I guess... (you need every single block in one shot?)
Posted: Fri Jun 17, 2005 1:53 pm
by nickvd
Exactly. I'm creating an online editor app (quick and dirty, for my less-than-technical clients). Their website will be filled with content that is inside <div class="editor" id="<SECTION ID>">content</div> tags. I have a system that will load the page and rip out all the blocks (just the content of the block and the value of the id attribute); I then feed that data in to create an instance of FCKeditor for each block on the page.
Most pages won't be a problem as I can work around it (I already have, by using <span>s instead and just using CSS to set span {display:block;}, but that's more of a hack than I'd prefer to use), but the news box page (which is included by PHP on all pages) is.
I've made a couple of changes to your code to sort of get it to work. I'd use recursion, but I'm not that comfortable with it yet.
Code: Select all
<?php
$string = file_get_contents("newsbox.html");
function getNested($string) {
//Read all opening <div> tags but stop at first closing </div>.
if (!preg_match('#(<div class="editor"[^>]*id="([^"]+)">.*?)</div>.*#is', $string, $matches)) {
return array(); //No (more) editor blocks in the string
}
$caught = $matches[1];
//echo "<pre>".print_r($matches, true)."</pre>";
$string = substr($matches[0], strlen($caught)); //Chop off the start
preg_match_all('#<div[^>]*>#is', $caught, $matches);
$tot_nests = count($matches[0]); //No. of nests
for($x=0; $x<$tot_nests; $x++) {
preg_match('#.*?</div>#is', $string, $matches); //Find up until </div>
$appender = $matches[0];
$string = substr($string, strlen($appender)); //Chop off start of string
$caught .= $appender; //Append the next segment
}
return array($caught, $string);
}
$one = getNested($string); //$one[0] will have the first editor block $one[1] will have the rest of the string.
$two = getNested($one[1]); //$two[0] will have the second block...
$three = getNested($two[1]); //...
$four = getNested($three[1]);// will return an empty array as there are only 3 blocks.
?>
Posted: Fri Jun 17, 2005 1:56 pm
by nickvd
The Final result that i'd like to see is as follows
Code: Select all
Array
(
[0] => Array
(
[editId] => newsBox1
[newContent] => <span class="newsTitle">THE TORONTO TOY AUCTION </span>November 19, 2005 <br/>
Preview 9am Sale 10am<br/>
International Centre, <br/>
Hall 4<br/>
6900 Airport Rd, Mississauga
)
[1] => Array
(
[editId] => newsBox2
[newContent] => <span class="newsTitle">THE TORONTO TOY & DOLL COLLECTORS' SHOW</span>November 20, 2005, <br/>
10am-4pm<br/>
The International Centre, <br/>
Hall 4<br/>
6900 Airport Rd. Mississauga, ON , Canada
)
[2] => Array
(
[editId] => newsBox3
[newContent] => <span class="newsTitle">THE TORONTO CHRISTMAS TRAIN SHOW</span>OCTOBER 29, 2005<br/>
11am - 5pm<br/>
OCTOBER 30, 2005<br/>
10am - 4pm<br/>
The International Centre, <br/>
Hall 5<br/>
6900 Airport Rd., Mississauga, ON
)
)
Just one function to grab the blocks and spit them back, as versatile as I can make it so I won't need to make any (or very few) modifications for use on other sites...
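One way such a function might look is sketched below. It finds each editor opening tag, then walks the following <div> and </div> tags while keeping a depth counter, so the nesting depth does not need to be known up front. The function name extractEditorBlocks is hypothetical, the array keys follow the desired output above, and the sketch assumes well-formed markup with class="editor" appearing before id in the tag.

```php
<?php
// Hypothetical helper: returns one array('editId' => ..., 'newContent' => ...) per editor block.
function extractEditorBlocks($html)
{
    $blocks = array();
    $offset = 0;
    $open   = '#<div\b[^>]*\bclass="editor"[^>]*\bid="([^"]*)"[^>]*>#i';

    while (preg_match($open, $html, $m, PREG_OFFSET_CAPTURE, $offset)) {
        $start = $m[0][1] + strlen($m[0][0]); // first byte after the opening tag
        $pos   = $start;
        $depth = 1;
        $end   = null;

        // Walk every later <div ...> or </div>, adjusting the nesting depth.
        while ($depth > 0
            && preg_match('#<(/?)div\b[^>]*>#i', $html, $t, PREG_OFFSET_CAPTURE, $pos)) {
            $depth += ($t[1][0] === '/') ? -1 : 1;
            $pos = $t[0][1] + strlen($t[0][0]);
            if ($depth === 0) {
                $end = $t[0][1]; // offset of the matching closing tag
            }
        }
        if ($end === null) {
            break; // unbalanced markup: give up rather than guess
        }

        $blocks[] = array(
            'editId'     => $m[1][0],
            'newContent' => trim(substr($html, $start, $end - $start)),
        );
        $offset = $pos; // continue searching after this block
    }
    return $blocks;
}
```

Calling print_r(extractEditorBlocks(file_get_contents("newsbox.html"))) should then give an array shaped like the one above, provided the file matches those assumptions.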
Posted: Mon Jul 25, 2005 1:36 pm
by josh
I know this is an old post, but I just stumbled across it and thought I'd post some code. This is what I use to handle nested quotes in bbcode; it could be applied to your code.
Code: Select all
// Quote
// Keep replacing until a pass leaves the string unchanged, so nested quotes get handled too.
$prev_string = "";
while ($prev_string != $string) {
$prev_string = $string;
// Single-quoted pattern so the double quotes inside it need no escaping.
$string=preg_replace('/\[quote="(.+?)"\](.+)\[\/quote\]/', stripslashes(file_get_contents("../functions/quote.inc.php")), $string);
$string=preg_replace("/\[quote='(.+?)'\](.+)\[\/quote\]/", stripslashes(file_get_contents("../functions/quote.inc.php")), $string);
}
And functions/quote.inc.php looks like this:
Code: Select all
<table width="100%" border="0" cellspacing="0" cellpadding="2">
<tr>
<td width="20">&nbsp;</td>
<td><span style="font-size: 9px">Quote:</span> <table width="100%" border="0" cellspacing="0" cellpadding="2" style="border-style:inset; border-color:#0000FF; border-width: 1px ; ">
<tr>
<td bgcolor="#E0E0FF"><table width="100%" border="0" cellspacing="0" cellpadding="5">
<tr>
<td style="border-bottom-width: 1px ; border-bottom-color:#333333 "><strong>\\\1</strong> came out of the closet to say:</td>
</tr>
<tr>
<td>\\\2</td>
</tr>
</table></td>
</tr>
</table>
</td>
<td width="20">&nbsp;</td>
</tr>
</table>
</tr>
</table>
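The same replace-until-stable idea can be sketched in a smaller form. The version below is a variation, not the code above: it matches only quotes whose body holds no further [quote= tag, so each pass converts the innermost quotes first. The one-line $template is a hypothetical stand-in for quote.inc.php, and the input string is made up.

```php
<?php
// Hypothetical stand-in for quote.inc.php: a one-line template.
$template = '<blockquote><b>$1</b> said: $2</blockquote>';
$string   = '[quote="a"]outer [quote="b"]inner[/quote] end[/quote]';

// The negative lookahead stops the body from crossing another [quote= tag,
// so only innermost quotes match on any given pass.
$pattern = '/\[quote="([^"]+)"\]((?:(?!\[quote=).)*?)\[\/quote\]/s';

// Repeat until a pass makes no replacements.
do {
    $string = preg_replace($pattern, $template, $string, -1, $count);
} while ($count > 0);

echo $string, "\n";
// <blockquote><b>a</b> said: outer <blockquote><b>b</b> said: inner</blockquote> end</blockquote>
```

For the div problem the equivalent would be to rewrite innermost divs to a placeholder on each pass, which is close to what Bennettman suggested earlier in the thread.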