HTML pattern matching using preg_match_all()
Posted: Fri Jun 15, 2007 3:46 am
The bold text is to be extracted, i.e. the text that appears between [b] and [/b].
I tried this but can't get all the data.
Code:
<td width="33%" valign="top">
<h2 class="item">[b]Tumbling Gnomes[/b]</h2>
<br />
[b]Jolly somersaulting gnomes that will delight all ages. Will somersault down any non-slippery slope. For 3 years and up.[/b]
<br />(this br tag is optional, i.e. it may or may not be there)
[b]100% wool felt. Various colours.[/b]
<br />(this br tag is optional, i.e. it may or may not be there)
<em>[b]10cm tall[/b]</em>(this pair of em tags is optional; if present, scrape the text between them)
<b>[b]£1.95 each[/b]</b>(this pair of b tags is optional; if present, scrape the text between them)
</td>
Look at the td cells below to get a better idea of how they can vary:
<td width="33%" valign="top">
<h2 class="item">[b]Mini Roller Ball Game[/b]</h2>
<br />
[b]Roll the ball down the chute and try to land in the winning ring! Comes in a tin.[/b]
<em>[b]Tin 7 x 5cm[/b]</em>
<b>[b]£4.50[/b]</b>
</td>
<td width="33%" valign="top">
<h2 class="item">[b]Princess Sophie Cut Out Book[/b]</h2>
<br />
<b>[b]£3.90[/b]</b>
</td>
Code:
preg_match_all("|<h2 class=\"item\">(.*?)</h2>
<br />(.*?)(?:<em>(.*?)</em>)?
<b>(.*?)</b>|si",$file,$t);
Expected output is
array[1] as {
Tumbling Gnomes,
Mini Roller Ball Game,
Princess Sophie Cut Out Book
}
array[2] as {
Jolly somersaulting gnomes that will delight all ages. Will somersault down any non-slippery slope. For 3 years and up.100% wool felt. Various colours.,
Roll the ball down the chute and try to land in the winning ring! Comes in a tin.,
blank (as no content in last td)
}
array[3] as {
10cm tall,
Tin 7 x 5cm,
blank (as no content in last td)
}
and
array[4] as {
£1.95 each,
£4.50,
£3.90
}
How can I make the (?:<em>(.*?)</em>)? part optional? The em tags may or may not appear in every case (if they appear, the text between the tags should be extracted), for example:
<em>Tin 7 x 5cm</em>
I'm not strong in regex.
Kindly help with this. Thanks in advance.
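For reference, here is a minimal sketch of one way the optional em group might be handled. It rests on several assumptions: the [b] markers above are only highlighting and are not in the real file, the whitespace between tags can vary (so `\s*` replaces the literal line breaks), and the price `<b>` tag is always present. The inline sample assigned to `$file` and the `$pattern` variable are illustrative only, and leftover `<br />` tags are stripped from the description after matching.

```php
<?php
// Sample markup modelled on the three cells above, without the [b] markers (assumption).
$file = <<<'HTML'
<td width="33%" valign="top">
<h2 class="item">Tumbling Gnomes</h2>
<br />
Jolly somersaulting gnomes that will delight all ages. Will somersault down any non-slippery slope. For 3 years and up.
<br />
100% wool felt. Various colours.
<br />
<em>10cm tall</em>
<b>£1.95 each</b>
</td>
<td width="33%" valign="top">
<h2 class="item">Mini Roller Ball Game</h2>
<br />
Roll the ball down the chute and try to land in the winning ring! Comes in a tin.
<em>Tin 7 x 5cm</em>
<b>£4.50</b>
</td>
<td width="33%" valign="top">
<h2 class="item">Princess Sophie Cut Out Book</h2>
<br />
<b>£3.90</b>
</td>
HTML;

// \s* between the pieces tolerates any mix of line breaks and indentation.
$pattern = '~<h2 class="item">(.*?)</h2>\s*'   // group 1: title
         . '<br />\s*'
         . '((?:(?!<em>|<b>).)*?)\s*'           // group 2: description, never runs past <em> or <b>
         . '(?:<em>(.*?)</em>\s*)?'             // group 3: size, the whole em pair is optional
         . '<b>(.*?)</b>~si';                   // group 4: price

preg_match_all($pattern, $file, $t);

// The description may still contain <br /> tags; remove them with their surrounding whitespace.
$t[2] = array_map(function ($d) {
    return trim(preg_replace('~\s*<br\s*/?>\s*~i', '', $d));
}, $t[2]);

print_r($t[3]); // sizes: "10cm tall", "Tin 7 x 5cm", and an empty string for the last cell
```

With the default PREG_PATTERN_ORDER, a group that did not take part in a match comes back as an empty string, so `$t[2]` and `$t[3]` stay aligned with `$t[1]` and `$t[4]` even for the cell that has no description or size. If the markup is less regular than these samples, an HTML parser such as DOMDocument may be more robust than a regex.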