<div class="editor" id="newsBox1">
<div class="newsItem">
<div class="newsTitle">
THE TORONTO TOY AUCTION
</div>
November 19, 2005 <br/>
Preview 9am Sale 10am<br/>
International Centre, <br/>
Hall 4<br/>
6900 Airport Rd, Mississauga
</div>
</div>
There are 4 of these blocks, one right after another (4 news boxes down the right of the page); each top-level div (class=editor) has a different id (newsBox[1-4]).
I need to grab everything BETWEEN the first <div> and the last </div> (everything within the <div class="editor" ... > ... </div>).
<?php
$string = <<<EOD
Something is here
<div class="editor" id="newsBox1">
<div class="newsItem">
<div class="newsTitle">
THE TORONTO TOY AUCTION
</div>
November 19, 2005 <br/>
Preview 9am Sale 10am<br/>
International Centre, <br/>
Hall 4<br/>
6900 Airport Rd, Mississauga
</div>
</div>
Blah blah blah blah
Yadda yadda yadda
EOD;
preg_match('#<div class="editor"[^>]*>.*</div>#is', $string, $matches);
print_r($matches);
?>
You could possibly do it (messy) by running preg_replace with the example posted by d11wtg, and have it change the div tag in each case to maybe something like "div1", "div2" etc. Then you can run two more preg_replaces, one to match each "div(number)" and get the data, and another to put the divs back to normal (or a str_replace if possible).
Alternatively, use d11wtg's regexp as normal and later use substr with strpos to take out everything after the first </div> in each result.
Bennetman is right. This will take more than one single preg (which I'm working on between jobs at work).
My method so far is to capture from <div class="editor"... up until the first </div>. preg_match_all() the opening <div> tags and count the number of outputs as X.
I'll then loop back over the string grabbing the next </div> until the loop has iterated X times to rebuild the string.
Well, it's not perfect but it works.... I'd love to see if any clever bods can find a one-regex-only way to do this (I know of one, but it's dependent upon knowing the number of nests in the first instance).
<?php
$string = <<<EOD
Something is here
<div class="editor" id="newsBox1">
<div class="newsItem">
<div class="newsTitle">
THE TORONTO TOY AUCTION
</div>
November 19, 2005 <br/>
Preview 9am Sale 10am<br/>
International Centre, <br/>
Hall 4<br/>
6900 Airport Rd, Mississauga
</div>
</div>
Blah blah blah blah
<div yadda="foo">
Yadda yadda yadda
</div>
EOD;
preg_match('#(<div class="editor"[^>]*>.*?)</div>.*#is', $string, $matches); //Read all opening <div> tags but stop at first closing </div>.
$caught = $matches[1];
$string = substr($matches[0], strlen($caught)); //Chop off the start
preg_match_all('#<div[^>]*>#is', $caught, $matches);
$tot_nests = count($matches[0]); //No. of nests
for($x=0; $x<$tot_nests; $x++) {
preg_match('#.*?</div>#is', $string, $matches); //Find up until </div>
$appender = $matches[0];
$string = substr($string, strlen($appender)); //Chop off start of string
$caught .= $appender; //Append the next segment
}
echo $caught;
?>
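On the one-regex question: PCRE, which backs PHP's preg_* functions, supports recursive subpatterns, so a single pattern can balance nested tags without knowing the depth in advance. The following is only a rough sketch, not anyone's posted solution. It assumes every <div> is properly closed and that no stray "<div" text appears in comments or attribute values; the variable names and the shortened sample are made up for illustration. On very large pages the recursion may also run into PCRE's backtrack or recursion limits.

```php
<?php
// Shortened version of the sample string used in this thread.
$html = <<<EOD
Something is here
<div class="editor" id="newsBox1">
<div class="newsItem">
<div class="newsTitle">
THE TORONTO TOY AUCTION
</div>
6900 Airport Rd, Mississauga
</div>
</div>
Blah blah blah blah
<div yadda="foo">
Yadda yadda yadda
</div>
EOD;

// Group 1 matches one balanced <div>...</div>:
//   - (?:(?!</?div\b).)++  consumes text that does not start a div tag
//   - (?1)                 recurses into group 1 for a nested div
// The leading lookahead restricts the outermost match to class="editor".
$pattern = '#(?=<div class="editor")(<div\b[^>]*>(?:(?:(?!</?div\b).)++|(?1))*</div>)#is';

preg_match_all($pattern, $html, $m);

// $m[1] holds each complete editor block, nested divs included.
echo $m[1][0];
```

Because the closing </div> is only matched once every nested div has been balanced, the unrelated <div yadda="foo"> block further down is left alone.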
<div class="newsItem">
<div class="editor" id="newsBox1">
<span class="newsTitle">THE TORONTO TOY AUCTION </span>November 19, 2005 <br/>
Preview 9am Sale 10am<br/>
International Centre, <br/>
Hall 4<br/>
6900 Airport Rd, Mississauga
</div>
</div>
<div class="newsItem">
<div class="editor" id="newsBox2">
<span class="newsTitle">THE TORONTO TOY &amp; DOLL COLLECTORS' SHOW</span>November 20, 2005, <br/>
10am-4pm<br/>
The International Centre, <br/>
Hall 4<br/>
6900 Airport Rd. Mississauga, ON , Canada
</div>
</div>
<div class="newsItem">
<div class="editor" id="newsBox3">
<span class="newsTitle">THE TORONTO CHRISTMAS TRAIN SHOW</span>OCTOBER 29, 2005<br/>
11am - 5pm<br/>
OCTOBER 30, 2005<br/>
10am - 4pm<br/>
The International Centre, <br/>
Hall 5<br/>
6900 Airport Rd., Mississauga, ON
</div>
</div>
Exactly. I'm creating an online editor app (quick and dirty, for my less-than-technical clients). Their website will be filled with content that is inside <div class="editor" id="<SECTION ID>">content</div> tags. I have a system that loads the page and rips out all the blocks (just the content of the block and the value of the id attribute); I then feed that data in to create an instance of FCKeditor for each block on the page.
Most pages won't be a problem as I can work around it (I already have, by using <span>s instead and using CSS to set span {display:block;}, but that's more of a hack than I'd prefer), but the news box page (which is included by PHP on all pages) is the exception.
I've made a couple of changes to your code to sort of get it to work. I'd use recursion, but I'm not that comfortable with it yet.
<?php
$string = file_get_contents("newsbox.html");
function getNested($string) {
//Read all opening <div> tags but stop at first closing </div>.
if (!preg_match('#(<div class="editor"[^>]*id="([^"]*)">.*?)</div>.*#is', $string, $matches)) {
return array(); //No editor block left in the string
}
$caught = $matches[1];
//echo "<pre>".print_r($matches, true)."</pre>";
$string = substr($matches[0], strlen($caught)); //Chop off the start
preg_match_all('#<div[^>]*>#is', $caught, $matches);
$tot_nests = count($matches[0]); //No. of nests
for($x=0; $x<$tot_nests; $x++) {
preg_match('#.*?</div>#is', $string, $matches); //Find up until </div>
$appender = $matches[0];
$string = substr($string, strlen($appender)); //Chop off start of string
$caught .= $appender; //Append the next segment
}
return array($caught, $string); //[0] = the block, [1] = the remainder of the string
}
$one = getNested($string); //$one[0] will have the first editor block $one[1] will have the rest of the string.
$two = getNested($one[1]); //$two[0] will have the second block...
$three = getNested($two[1]); //...
$four = getNested($three[1]);// will return an empty array as there are only 3 blocks.
?>
Array
(
[0] => Array
(
[editId] => newsBox1
[newContent] => <span class="newsTitle">THE TORONTO TOY AUCTION </span>November 19, 2005 <br/>
Preview 9am Sale 10am<br/>
International Centre, <br/>
Hall 4<br/>
6900 Airport Rd, Mississauga
)
[1] => Array
(
[editId] => newsBox2
[newContent] => <span class="newsTitle">THE TORONTO TOY & DOLL COLLECTORS' SHOW</span>November 20, 2005, <br/>
10am-4pm<br/>
The International Centre, <br/>
Hall 4<br/>
6900 Airport Rd. Mississauga, ON , Canada
)
[2] => Array
(
[editId] => newsBox3
[newContent] => <span class="newsTitle">THE TORONTO CHRISTMAS TRAIN SHOW</span>OCTOBER 29, 2005<br/>
11am - 5pm<br/>
OCTOBER 30, 2005<br/>
10am - 4pm<br/>
The International Centre, <br/>
Hall 5<br/>
6900 Airport Rd., Mississauga, ON
)
)
Just one function to grab the blocks and spit them back, as versatile as I can make it so I won't need to make any (or very few) modifications for use on other sites...
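One possible shape for such a function, offered only as a sketch: instead of chaining regexes, scan every opening and closing div tag in order and keep a depth counter, so nested divs inside an editor block are handled at any depth. The function name getEditorBlocks and the sample $html are hypothetical; the editId/newContent keys simply mirror the array shown above. It assumes well-formed markup and double-quoted class and id attributes.

```php
<?php
// Hypothetical helper: returns one array('editId' => ..., 'newContent' => ...)
// per <div class="editor" id="..."> block found in $html.
function getEditorBlocks($html) {
    $blocks = array();
    // Collect every opening or closing div tag together with its byte offset.
    preg_match_all('#<div\b[^>]*>|</div\s*>#i', $html, $tags, PREG_OFFSET_CAPTURE);
    $depth = 0;  // 0 = outside an editor block, 1 = directly inside one
    $start = 0;  // offset just past the editor block's opening tag
    $id = '';
    foreach ($tags[0] as $tag) {
        list($text, $pos) = $tag;
        $isClose = ($text[1] === '/');
        if ($depth === 0) {
            // Outside a block: only an opening editor div matters.
            if (!$isClose && preg_match('#\sclass="editor"#i', $text)) {
                $id = preg_match('#\sid="([^"]*)"#i', $text, $m) ? $m[1] : '';
                $start = $pos + strlen($text);
                $depth = 1;
            }
        } elseif ($isClose) {
            // Closing tag: when depth returns to 0 the editor block is complete.
            if (--$depth === 0) {
                $blocks[] = array(
                    'editId' => $id,
                    'newContent' => trim(substr($html, $start, $pos - $start)),
                );
            }
        } else {
            $depth++; // a nested div opened inside the editor block
        }
    }
    return $blocks;
}

// Made-up sample: one editor block with a nested div, one without.
$html = '<div class="newsItem"><div class="editor" id="newsBox1">'
      . '<div class="newsTitle">A</div>one</div></div>'
      . '<div class="editor" id="newsBox2">two</div>';
$blocks = getEditorBlocks($html);
print_r($blocks);
```

Only the class name is site-specific here, so it could be turned into a parameter if other sites mark their editable regions differently.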
I know this is an old post, but I just stumbled across it and thought I'd post some code. This is what I use to handle nested quotes in BBCode; it could be applied to your code.