[Solved] How do I parse out HTML body text?

chinagirl
Forum Newbie
Posts: 6
Joined: Wed Mar 31, 2004 10:43 am

Post by chinagirl »

I have a simple enough html file (myfile.html). It looks something like this:
<html>
<head>
<meta...>
<link...>
<title>...</title>
<style>...</style>
</head>
<body>
<h>ABC</h>
<p>XYZ</P>
</body>
</html>

I read it in my PHP code like this:
<?php
$file = 'myfile.html';
$fp = fopen($file, 'r');                  // open the file for reading
$contents = fread($fp, filesize($file));  // read the whole file into a string
fclose($fp);                              // release the file handle
?>

But instead of reading the entire file, I only want the portion inside <body>...</body>. Furthermore, I want to separate the text in <h>...</h> from the text in <p>...</p>.

Can anyone provide an example of how to do this? Thanks much.
kettle_drum
DevNet Resident
Posts: 1150
Joined: Sun Jul 20, 2003 9:25 pm
Location: West Yorkshire, England

Post by kettle_drum »

Just read the whole file, and then keep on parsing it. Say explode('<body>', $contents); or something, and then keep parsing until you have what you want.
Illusionist
Forum Regular
Posts: 903
Joined: Mon Jan 12, 2004 9:32 pm

Post by Illusionist »

Exploding on the <body> tag will do nothing but split the string into 2 parts. Not very helpful. It would be better to use regular expressions. Or just use substr(), strpos() and other string functions to parse through the file and get what you want.

I would recommend researching regular expressions though, as they help a lot!
If I get time later, I'll see if I can get some regexps working for you.
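
For anyone following along, here is a minimal sketch of both suggestions above: strpos()/substr() to cut out the body, then a regular expression to pull the text out of each tag. The helper names extract_body() and extract_tag_text() and the inline sample string are made up for illustration, and the sketch assumes PHP 5 or later (for stripos()) and simple, non-nested tags like the ones in the question.

```php
<?php
// Hypothetical sample standing in for the contents of myfile.html.
$contents = "<html><head><title>t</title></head>\n"
          . "<body>\n<h>ABC</h>\n<p>XYZ</P>\n</body></html>";

// Return the text between <body ...> and </body>, or null if either is missing.
function extract_body($html)
{
    $open = stripos($html, '<body');           // case-insensitive search
    if ($open === false) {
        return null;
    }
    $start = strpos($html, '>', $open);        // end of the opening <body ...> tag
    if ($start === false) {
        return null;
    }
    $end = stripos($html, '</body>', $start);
    if ($end === false) {
        return null;
    }
    return substr($html, $start + 1, $end - $start - 1);
}

// Collect the inner text of every <tag>...</tag> pair.
// Flags: i = case-insensitive, s = let "." match newlines; (.*?) is non-greedy.
function extract_tag_text($html, $tag)
{
    $t = preg_quote($tag, '#');
    $pattern = '#<' . $t . '\b[^>]*>(.*?)</' . $t . '\s*>#is';
    preg_match_all($pattern, $html, $matches);
    return array_map('trim', $matches[1]);
}

$body       = extract_body($contents);
$headings   = extract_tag_text($body, 'h');   // text inside <h>...</h>
$paragraphs = extract_tag_text($body, 'p');   // text inside <p>...</p>

print_r($headings);
print_r($paragraphs);
?>
```

This kind of pattern breaks down on nested or malformed markup, which is where a real parser becomes the better choice.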
chinagirl
Forum Newbie
Posts: 6
Joined: Wed Mar 31, 2004 10:43 am

thanks for the tip

Post by chinagirl »

explode did not work well. Neither did any single expression. I used a combination of fgets, stristr and eregi, and it kind of worked, but it is still not dynamic enough for me. I guess I will do some more research. Thank you for your reply.
patrikG
DevNet Master
Posts: 4235
Joined: Thu Aug 15, 2002 5:53 am
Location: Sussex, UK

Post by patrikG »

Sounds very much as if you'd want to be parsing HTML as an instance of XML.

Have a look at http://sourceforge.net/projects/php-html/
From the project page: "Object oriented PHP based HTML parser. The HtmlParser class allows you to iterate through HTML nodes and get their attributes, names and values. It also comes with an example class for converting HTML to formatted ASCII text."
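
The HtmlParser class's own API is not shown in this thread, so the sketch below does not use it. It swaps in PHP's bundled DOM extension (DOMDocument, available from PHP 5) to illustrate the same idea of walking HTML as a tree of nodes. The helper name tag_texts() and the inline sample string are assumptions made for the example.

```php
<?php
// Hypothetical sample standing in for the contents of myfile.html.
$contents = "<html><head><title>t</title></head>\n"
          . "<body>\n<h>ABC</h>\n<p>XYZ</P>\n</body></html>";

// <h> is not a standard HTML element, so libxml reports a parse warning.
// Collect such warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($contents);
libxml_clear_errors();

// Return the trimmed text of every <$tag> element found inside <body>.
function tag_texts(DOMDocument $doc, $tag)
{
    $texts = array();
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $texts;                         // no <body> in the document
    }
    foreach ($body->getElementsByTagName($tag) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

$headings   = tag_texts($doc, 'h');
$paragraphs = tag_texts($doc, 'p');

print_r($headings);
print_r($paragraphs);
?>
```

Because the parser builds a tree, nested tags and attributes are handled without hand-written patterns.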
chinagirl
Forum Newbie
Posts: 6
Joined: Wed Mar 31, 2004 10:43 am

Post by chinagirl »

That parser worked. Thanks Patrick.