
HTML pattern matching using preg_match_all()

Posted: Fri Jun 15, 2007 3:46 am
by trimbak
The bold text is what needs to be extracted, i.e. the text that appears between the [b] and [/b] markers.


Code: Select all

<td width="33%" valign="top">
<h2 class="item">[b]Tumbling Gnomes[/b]</h2>
<br />
[b]Jolly somersaulting gnomes that will delight all ages. Will somersault down any non-slippery slope. For 3 years and up.[/b]
<br />(this br tag is optional, i.e. it may or may not be present)
[b]100% wool felt. Various colours.[/b]
<br />(this br tag is optional, i.e. it may or may not be present)
<em>[b]10cm tall[/b]</em>(this pair of em tags is optional; if present, scrape the text between them)
&nbsp;<b>[b]&pound;1.95 each[/b]</b>(this pair of b tags is optional; if present, scrape the text between them)
</td>

The td cells below give a better idea of the variations that can occur:

<td width="33%" valign="top">
<h2 class="item">[b]Mini Roller Ball Game[/b]</h2>
<br />
[b]Roll the ball down the chute and try to land in the winning ring! Comes in a tin.[/b]
<em>[b]Tin 7 x 5cm[/b]</em>
&nbsp;<b>[b]&pound;4.50[/b]</b>
</td>

<td width="33%" valign="top">
<h2 class="item">[b]Princess Sophie Cut Out Book[/b]</h2>
<br />
<b>[b]&pound;3.90[/b]</b>
</td>
I tried this, but I can't get all the data:

Code: Select all

preg_match_all("|<h2 class=\"item\">(.*?)</h2>
<br />(.*?)(?:<em>(.*?)</em>)?
&nbsp;<b>(.*?)</b>|si",$file,$t);

The expected output is:
array[1] as {

Tumbling Gnomes,

Mini Roller Ball Game,

Princess Sophie Cut Out Book

}

array[2] as {
Jolly somersaulting gnomes that will delight all ages. Will somersault down any non-slippery slope. For 3 years and up.100% wool felt. Various colours.,

Roll the ball down the chute and try to land in the winning ring! Comes in a tin.,

blank (as no content in last td)

}

array[3] as {

10cm tall,

Tin 7 x 5cm,

blank (as no content in last td)

}

and

array[4] as {

&pound;1.95 each,

&pound;4.50,

&pound;3.90

}


How can I make the (<em>(.*?)</em>)? group optional? It may or may not appear in a given cell; if it does appear, the text between the tags should be extracted, for example:
<em>Tin 7 x 5cm</em>


I'm not strong in regex.

Kindly help with this. Thanks in advance.

    Posted: Fri Jun 15, 2007 9:40 am
    by GeertDD
    To get you started, this regex returns all the titles (<h2>):

    Code: Select all

    #<h2(?:.*?)>\[b\](.*?)\[/b\]</h2>#i
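
Going one step further, here is a minimal sketch of one possible way to collect all four fields, offered under a few assumptions rather than as a definitive answer. It assumes that the [b] and [/b] markers only highlight the wanted text in the post and are not in the real page, that every product sits in its own <td> cell, and that PHP 5.3 or later is available. The helper names extract_items() and first_match() are hypothetical. Instead of one large pattern, where literal newlines and &nbsp; must match the file exactly, each cell is parsed on its own, so a missing <em> or <b> element simply yields an empty string.

```php
<?php
// Hypothetical helper: parse each <td> cell separately so that optional
// elements (<em>, <b>, extra <br />) do not break the match.
function extract_items($html)
{
    // Assumption: [b] and [/b] are highlight markers from the post only;
    // removing them is harmless if they are absent.
    $html = str_replace(array('[b]', '[/b]'), '', $html);

    $out = array(
        'title' => array(), 'description' => array(),
        'size'  => array(), 'price'       => array(),
    );

    // One match per cell; the s modifier lets . span line breaks.
    preg_match_all('#<td\b[^>]*>(.*?)</td>#si', $html, $cells);
    foreach ($cells[1] as $cell) {
        $out['title'][] = first_match('#<h2\b[^>]*>(.*?)</h2>#si', $cell);
        $out['size'][]  = first_match('#<em>(.*?)</em>#si', $cell);
        $out['price'][] = first_match('#<b>(.*?)</b>#si', $cell);

        // Description: whatever text is left once h2, em and b are removed.
        $rest = preg_replace('#<(h2|em|b)\b[^>]*>.*?</\1>#si', '', $cell);
        $rest = str_replace('&nbsp;', ' ', strip_tags($rest));
        $out['description'][] = trim(preg_replace('/\s+/', ' ', $rest));
    }
    return $out;
}

// Returns the first capture group of $pattern in $text, or '' if no match.
function first_match($pattern, $text)
{
    return preg_match($pattern, $text, $m) ? trim($m[1]) : '';
}

// Usage with two of the cells from the question (markers already removed).
$sample = '<td width="33%" valign="top">
<h2 class="item">Mini Roller Ball Game</h2>
<br />
Roll the ball down the chute and try to land in the winning ring! Comes in a tin.
<em>Tin 7 x 5cm</em>
&nbsp;<b>&pound;4.50</b>
</td>
<td width="33%" valign="top">
<h2 class="item">Princess Sophie Cut Out Book</h2>
<br />
<b>&pound;3.90</b>
</td>';

print_r(extract_items($sample));
```

Note that description lines separated by <br /> come back joined with a single space, and entities such as &pound; are left undecoded. For markup less regular than these cells, a DOM parser would likely be more robust than regular expressions.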