Grabbing text between tags


redmonkey
Forum Regular
Posts: 836
Joined: Thu Dec 18, 2003 3:58 pm

Post by redmonkey »

Can you post the unmodified contents of $string? (perhaps within code tags to retain any formatting).
JayBird
Admin
Posts: 4524
Joined: Wed Aug 13, 2003 7:02 am
Location: York, UK

Post by JayBird »

This is what is contained in $string:

This is all 1 line

Code: Select all

ext="#000000" leftmargin="0" topmargin="0" rightmargin="0" bottommargin="0" marginwidth="0" marginheight="0"> <center> <?php print "it works"; ?> <table width="100%" border="0" cellspacing="0" cellpadding="0"> <tr valign="top"> <td rowspan="2" width="65"><img src="logo.gif" width="65" height="52"></td> <td align="center"> <table width="100%" border="0" cellspacing="0" cellpadding="0"> <tr> <td background="head_back.gif"><img src="TreeBlank.gif" width="45" height="45"></td> <td width="100%" align="center" valign="middle" background="head_back.gif" nowrap> <span class="ReportTitle">Report for ipd-Eurochem: </span> <span class="CategoryTitle">General Statistics</span> </td> <td width="112"><a href="http://www.weblogexpert.com/" target="_blank"><img src="powered.gif" width="112" height="45" border="0"></a></td> </tr> </table> </td> </tr> <tr> <td> <table width="100%" border="0" cellspacing="0" cellpadding="0" height="7"> <tr><td background="top_line.gif"></td></tr> </table> </td> </tr> </table> <table width="90%" border=0 cellpadding=1 cellspacing=1> <tr> <td valign="top" align="left">Time range: 13/05/2004 09:31:44 - 20/05/2004 21:03:29</td> <td valign="top" align="right">Generated on Wed Apr 21, 2004 - 10:38:15</td> </tr> </table> <table width="100%" border="0" cellspacing="0" cellpadding="0"> <tr><td height="7"></td></tr> <tr><td height="7" background="top_line.gif"></td></tr> </table> <br> <a name="Summary"></a> <table cellpadding="0" border="0" cellspacing="0" width="90%"> <tr> <td width="10"><img src="section_left.gif" width="10" height="20" border="0"></td> <td class="SectionTitle" nowrap>Summary</td> <td width="10"><img src="section_right.gif" width="10" height="20" border="0"></td> </tr> </table> <p></p> <span class="TableTitle">Summary</span><br> <table border=0 cellspacing=0 cellpadding=0 height=6><tr><td></td></tr></table> <table border=0 bgcolor="#000000" cellspacing=0 cellpadding=0 width="90%"> <tr> <td> <table border=0 cellspacing=1 cellpadding=2 
width="100%"> <tr class="TableSolidRow"> <td colspan="2" class="TableCell">Hits</td> </tr> <tr class="TableRow1"> <td width="100%" class="TableCell">Total Hits</td> <td width="0%" class="TableCell">3,304</td> </tr> <tr class="TableRow2"> <td class="TableCell">Average Hits per Day</td> <td class="TableCell">413</td> </tr> <tr class="TableRow1"> <td class="TableCell">Average Hits per Visitor</td> <td class="TableCell">34.42</td> </tr> <tr class="TableRow2"> <td class="TableCell">Cached Requests</td> <td class="TableCell">517</td> </tr> <tr class="TableRow1"> <td class="TableCell">Failed Requests</td> <td class="TableCell">0</td> </tr> <tr class="TableSolidRow"> <td colspan="2" class="TableCell">Page Views</td> </tr> <tr class="TableRow1"> <td class="TableCell">Total Page Views</td> <td class="TableCell">60</td> </tr> <tr class="TableRow2"> <td class="TableCell">Average Page Views per Day</td> <td class="TableCell">7</td> </tr> <tr class="TableRow1"> <td class="TableCell">Average Page Views per Visitor</td> <td class="TableCell">0.63</td> </tr> <tr class="TableSolidRow"> <td colspan="2" class="TableCell">Visitors</td> </tr> <tr class="TableRow1"> <td class="TableCell">Total Visitors</td> <td class="TableCell">96</td> </tr> <tr class="TableRow2"> <td class="TableCell">Average Visitors per Day</td> <td class="TableCell">12</td> </tr> <tr class="TableRow1"> <td class="TableCell">Total Unique IPs</td> <td class="TableCell">85</td> </tr> <tr class="TableSolidRow"> <td colspan="2" class="TableCell">Bandwidth</td> </tr> <tr class="TableRow1"> <td class="TableCell">Total Bandwidth</td> <td class="TableCell">5.56&nbsp;MB</td> </tr> <tr class="TableRow2"> <td class="TableCell">Average Bandwidth per Day</td> <td class="TableCell">711.23&nbsp;KB</td> </tr> <tr class="TableRow1"> <td class="TableCell">Average Bandwidth per Hit</td> <td class="TableCell">1.72&nbsp;KB</td> </tr> <tr class="TableRow2"> <td class="TableCell">Average Bandwidth per Visitor</td> <td 
class="TableCell">59.27&nbsp;KB</td> </tr> </table> </td> </tr> </table> <p> <br> <p>&nbsp</p> </center> </body> </html>
Mark

Post by redmonkey »

Bech100 wrote: This is all 1 line
Christ!

Try this...

Code: Select all

if (preg_match('/<td>\s+(<table b.*<\/table>)\s+<\/td>/is', $string, $matches))
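
For anyone following along, here is a minimal sketch of how that pattern might be used end to end. The sample $string is a hypothetical, cut-down, single-line stand-in for the report markup above, not the real file, and the else branch is an addition for illustration.

```php
<?php
// Hypothetical sample input: a trimmed, single-line version of the report markup.
$string = '<table border=0 bgcolor="#000000"> <tr> <td> '
        . '<table border=0 cellspacing=1> <tr><td>Total Hits</td><td>3,304</td></tr> </table>'
        . ' </td> </tr> </table>';

// i = case-insensitive, s = let "." match newlines as well.
$pattern = '/<td>\s+(<table b.*<\/table>)\s+<\/td>/is';

if (preg_match($pattern, $string, $matches)) {
    // $matches[1] holds the inner table captured by the parentheses.
    echo $matches[1] . "\x0a";
} else {
    echo "No match" . "\x0a";
}
```

The greedy .* first runs to the end of the string and then backtracks until it finds a </table> that is followed by whitespace and </td>, which is why the inner table is captured rather than the outer one.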

Post by JayBird »

Gimme a sec, I'll try that. You can answer this in the meantime:

What is the "\x0a" in this line for?

Code: Select all

echo $matches[1] . "\x0a";

Post by redmonkey »

Is there a reason why that string starts with 'ext="#000000"' and doesn't seem to contain the whole HTML file?

\x0a is actually just the same as \n (it is the hex escape for the line feed character in a double-quoted string). I just use it to output a new line after the HTML code in case you want to add anything else after it. It makes for better formatting of the source, so everything is not on one line.
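
A quick way to check the equivalence yourself (a small sketch; the single-quote case is included only as a contrast, it is not something from the code above):

```php
<?php
// In double-quoted strings, "\x0a" (hex escape) and "\n" are the same single byte.
var_dump("\x0a" === "\n");   // bool(true)
var_dump(ord("\x0a"));       // int(10), the line feed character

// In single-quoted strings the escape is NOT interpreted:
// '\x0a' is four literal characters (backslash, x, 0, a).
var_dump(strlen('\x0a'));    // int(4)
```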

Post by JayBird »

redmonkey wrote: Is there a reason why that string starts with 'ext="#000000"' and doesn't seem to contain the whole HTML file?
I think I just missed a bit whilst copying and pasting.

So, what changed to make the new preg_match pattern work?

Mark

Post by redmonkey »

Bech100 wrote: So, what changed to make the new preg_match pattern work?
I take it it works OK then?

The main difference is that the original source you posted seemed to be spread over multiple lines, whereas $string is actually all on one line. My original regex was looking for a specific piece of the string at the start of a line, so it would not find anything in the one-line version.

Also, the original source you had did not have any spaces between the <td> and <table> tags, but your one-line source does.
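
To illustrate the whitespace point, here is a rough sketch. Both input strings are made-up miniatures (not the real source), and the \s* variant is only a suggestion for a pattern that copes with either layout; it is not the original regex from earlier in the thread.

```php
<?php
// Hypothetical miniature inputs, for illustration only.
$noSpaces   = "<td><table border=0>\n<tr><td>1</td></tr>\n</table></td>";
$withSpaces = '<td> <table border=0> <tr><td>1</td></tr> </table> </td>';

// \s+ demands at least one whitespace character between the tags.
$strict   = '/<td>\s+(<table b.*<\/table>)\s+<\/td>/is';
// \s* accepts zero or more, so it matches both layouts.
$tolerant = '/<td>\s*(<table b.*<\/table>)\s*<\/td>/is';

var_dump(preg_match($strict, $noSpaces));     // int(0)
var_dump(preg_match($strict, $withSpaces));   // int(1)
var_dump(preg_match($tolerant, $noSpaces));   // int(1)
var_dump(preg_match($tolerant, $withSpaces)); // int(1)
```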

Post by JayBird »

Ah, I see.

Thanks a lot, buddy.

I have got a load of other files I need to parse, so working from your example, hopefully I will start to learn regex in more detail.

You know what, I think you deserve another

Image

Mark

Post by redmonkey »

LOL two in one thread! Thanks.

For what it's worth, I do a lot of file analysis with regex, and I find it better to start with quite restrictive/specific regex patterns and then relax the pattern if/as needed. .* comes in handy, but many people overuse it and pull in all sorts of unwanted stuff. Using ^ and $ can also help a lot in narrowing down your search.

If you have any further questions/difficulties, feel free to ask (either in a thread or PM).
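
As a small sketch of the "start specific, then relax" advice: the input below is a hypothetical two-cell fragment in the style of the report rows, and the variable names are just for illustration.

```php
<?php
// Hypothetical fragment: two adjacent cells on one line.
$html = '<td class="TableCell">Total Hits</td> <td class="TableCell">3,304</td>';

// Greedy .* runs on to the LAST </td>, dragging the second cell in with it.
preg_match('/<td class="TableCell">(.*)<\/td>/', $html, $greedy);
var_dump($greedy[1]);    // string(44) "Total Hits</td> <td class="TableCell">3,304"

// [^<]* cannot cross a tag boundary, so it stops at the first cell.
preg_match('/<td class="TableCell">([^<]*)<\/td>/', $html, $specific);
var_dump($specific[1]);  // string(10) "Total Hits"

// ^ and $ pin the pattern to the whole string: only a pure number passes.
var_dump(preg_match('/^\d[\d,]*$/', '3,304'));       // int(1)
var_dump(preg_match('/^\d[\d,]*$/', 'Total Hits'));  // int(0)
```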

Post by JayBird »

redmonkey wrote: If you have any further questions/difficulties, feel free to ask (either in a thread or PM).
Cheers, I may just take you up on that. I'll have a bash myself, but WHEN I get stuck, I'll holler at ya :)

Mark