I have a fairly simple HTML file (myfile.html). It looks something like this:
<html>
<head>
<meta...>
<link...>
<title>...</title>
<style>...</style>
</head>
<body>
<h>ABC</h>
<p>XYZ</P>
</body>
</html>
I read it in my PHP code like this:
<?php
// Read the whole of myfile.html into $contents.
$file = 'myfile.html';
$fp = fopen($file, 'r');
$contents = fread($fp, filesize($file));
fclose($fp);   // release the file handle
?>
But instead of reading the entire file, I only want the part inside <body>...</body>. Furthermore, I want to extract the text in <h>...</h> separately from the text in <p>...</p>.
Can anyone provide an example of how to do this? Thanks much.
[Solved] How do I parse out HTML body text?
kettle_drum
Illusionist
Exploding on the <body> tag will only split the string into two parts, which isn't very helpful on its own. It would be better to use regular expressions, or just use substr(), strpos() and the other string functions to work through the file and get what you want.
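A minimal sketch of the string-function approach, using only stripos(), strpos() and substr(). The helper name extractBody() is hypothetical; it assumes the file has a single <body> element and that the text "<body" does not appear earlier (for example in a comment or script inside <head>).

```php
<?php
// Sketch: return the inner HTML of the first <body> element using only string functions.
// Assumes one <body> element and no earlier occurrence of the text "<body".
function extractBody(string $html): ?string
{
    $open = stripos($html, '<body');            // case-insensitive, so <BODY> is found too
    if ($open === false) {
        return null;
    }
    $start = strpos($html, '>', $open);          // end of the opening tag, skipping any attributes
    if ($start === false) {
        return null;
    }
    $start++;                                    // first character after '>'
    $end = stripos($html, '</body>', $start);    // the closing tag
    if ($end === false) {
        return null;
    }
    return substr($html, $start, $end - $start);
}

$html = "<html><head><title>T</title></head><body>\n<h>ABC</h>\n<p>XYZ</P>\n</body></html>";
$body = extractBody($html);
echo $body;
```

With the code from the question, extractBody($contents) should return just the markup between <body> and </body>; the <h> and <p> text still needs a second pass.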
I would recommend reading up on regular expressions, though, as they help a lot!
If I get time later, I'll see if I can get some regexps working for you.
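For the second half of the question, here is a regular-expression sketch using preg_match_all(). The function name extractHeadingsAndParagraphs() is hypothetical, and the pattern assumes <h> and <p> elements are not nested inside each other, which holds for a simple file like the one in the question but not for arbitrary HTML.

```php
<?php
// Sketch: collect the text of every <h>...</h> and <p>...</p> element with PCRE.
// Assumes the elements are not nested inside each other (regexes cannot parse arbitrary HTML).
function extractHeadingsAndParagraphs(string $html): array
{
    $result = ['h' => [], 'p' => []];
    // <(h|p)\b  tag name h or p, but not the start of <head>, <html>, <pre>, ...
    // [^>]*     any attributes in the opening tag
    // (.*?)     the element's content, matched lazily
    // </\1>     closing tag with the same name; /i makes this comparison case-insensitive
    // /s        lets . match newlines, so multi-line paragraphs are captured too
    preg_match_all('#<(h|p)\b[^>]*>(.*?)</\1>#is', $html, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $tag = strtolower($m[1]);
        $result[$tag][] = trim(strip_tags($m[2]));  // drop inline markup such as <b>
    }
    return $result;
}

$html = "<html><head><title>T</title></head><body>\n<h>ABC</h>\n<p>XYZ</P>\n</body></html>";
$parts = extractHeadingsAndParagraphs($html);
print_r($parts);
```

The i modifier is what lets <p>XYZ</P> from the question match: with caseless matching in force, the backreference \1 is also compared without regard to case.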
Thanks for the tip.
explode() did not work well, and neither did any single expression. I used a combination of fgets(), stristr() and eregi(); it kind of worked, but it still isn't flexible enough for me. I guess I will do some more research. Thank you for your reply.
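If you stay with the line-by-line approach, note that eregi() was deprecated in PHP 5.3 and removed in PHP 7.0; preg_match() with the i modifier is the usual replacement. The sketch below (scanLines() is a hypothetical name) reads with fgets() and uses stristr() to find the body boundaries. Like any line-based scan, it only finds elements whose opening and closing tags sit on the same line. A php://memory stream stands in for myfile.html so the example runs on its own.

```php
<?php
// Sketch: scan a file line by line with fgets(), using preg_match() with the
// /i modifier where eregi() was used before (eregi() does not exist in PHP 7+).
// Finds at most one <h> or <p> per line, and only if it opens and closes on that line.
function scanLines($fp): array
{
    $found = [];
    $inBody = false;
    while (($line = fgets($fp)) !== false) {
        if (stristr($line, '<body') !== false) {
            $inBody = true;                         // start collecting once <body> is seen
        }
        if (!$inBody) {
            continue;
        }
        if (preg_match('#<(h|p)\b[^>]*>(.*?)</\1>#i', $line, $m)) {
            $found[] = [strtolower($m[1]), $m[2]];  // [tag name, text]
        }
        if (stristr($line, '</body>') !== false) {
            break;                                  // ignore everything after </body>
        }
    }
    return $found;
}

// php://memory stands in for fopen('myfile.html', 'r') so the sketch runs anywhere.
$fp = fopen('php://memory', 'r+');
fwrite($fp, "<html>\n<body>\n<h>ABC</h>\n<p>XYZ</P>\n</body>\n</html>\n");
rewind($fp);
$rows = scanLines($fp);
fclose($fp);
print_r($rows);
```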
It sounds very much as if you want to parse the HTML as XML.
Have a look at http://sourceforge.net/projects/php-html/
http://sourceforge.net/projects/php-html/ wrote: Object-oriented, PHP-based HTML parser. The HtmlParser class allows you to iterate through HTML nodes and get their attributes, names and values. It also comes with an example class for converting HTML to formatted ASCII text.
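Along the same lines, PHP's built-in DOM extension can turn the page into a node tree without a third-party class. A strict XML parser would reject the question's markup, because XML tag names are case-sensitive and </P> does not close <p>; DOMDocument::loadHTML() uses libxml2's HTML parser instead, which accepts it. libxml2 does warn that <h> is not a known HTML tag, so the sketch collects those warnings quietly. parseBody() is a hypothetical name, and it only looks at the direct children of <body>.

```php
<?php
// Sketch: let ext/dom (libxml2's HTML parser) build a tree, then walk <body>'s children.
function parseBody(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);   // collect warnings such as "Tag h invalid"
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);           // restore the caller's setting

    $result = ['h' => [], 'p' => []];
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $result;
    }
    // Direct children only; $body->getElementsByTagName('p') would search the whole subtree.
    foreach ($body->childNodes as $node) {
        if ($node instanceof DOMElement && isset($result[$node->nodeName])) {
            $result[$node->nodeName][] = trim($node->textContent);
        }
    }
    return $result;
}

$html = "<html><head><title>T</title></head><body>\n<h>ABC</h>\n<p>XYZ</P>\n</body></html>";
$sections = parseBody($html);
print_r($sections);
```

Compared with the regex versions, the parser copes with attributes, nesting and odd letter case for you, at the cost of needing the DOM extension (enabled in most PHP builds).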