Parsing HTML Page to database
Posted: Thu Jan 04, 2007 10:14 pm
Okay, this is the beginning of a project I'm working on. The goal is to take all the data stored in the following HTML page:
http://www.cryosphere.f2s.com/Freelancer/example.html
(That's just a demo; my actual page has nearly 900 entries.)
and put all that data into a database. I'm just at the beginning and already having trouble, and I cannot figure out what is wrong.
First I'm trying to parse the HTML file to grab the info I want. Using loadHTMLFile(), I created the following script from the example.
The original example works fantastically. This should be simple, I know, but my version won't work any way I try it: all I get with my example is a blank page, even though it's the same information. I'm so confused; apparently I can't do half of what I thought I could. My script (test.php), the page it reads (ex2.html), and the original example are listed here:
test.php
Code:
<?php
$doc = new DOMDocument();
$doc->loadHTML("ex2.html");
$tags = $doc->getElementsByTagName('a');
foreach ($tags as $tag) {
    echo $tag->getAttribute('href').' | '.$tag->nodeValue."\n";
}
?>
ex2.html
Code:
<html>
<head>
<title>My Page</title>
</head>
<body>
<p><a href="/mypage1">Hello World!</a></p>
<p><a href="/mypage2">Another Hello World!</a></p>
</body>
</html>
The original example:
Code:
<?php
$myhtml = <<<EOF
<html>
<head>
<title>My Page</title>
</head>
<body>
<p><a href="/mypage1">Hello World!</a></p>
<p><a href="/mypage2">Another Hello World!</a></p>
</body>
</html>
EOF;
$doc = new DOMDocument();
$doc->loadHTML($myhtml);
$tags = $doc->getElementsByTagName('a');
foreach ($tags as $tag) {
    echo $tag->getAttribute('href').' | '.$tag->nodeValue."\n";
}
?>
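One possible cause, judging only from the two scripts above: test.php calls loadHTML(), which parses its argument as HTML source, rather than loadHTMLFile(), which reads the markup from a path. If that is the problem, loadHTML("ex2.html") would parse the literal text "ex2.html", find no <a> elements, and print nothing. Here is a minimal sketch comparing the two calls. The helper name listLinks and the temporary stand-in file are assumptions added so the sketch runs on its own; they are not part of the original scripts.

```php
<?php
// Hypothetical helper (not from the original post): collect "href | text"
// lines for every <a> element in an already-loaded document.
function listLinks(DOMDocument $doc): array
{
    $lines = [];
    foreach ($doc->getElementsByTagName('a') as $tag) {
        $lines[] = $tag->getAttribute('href') . ' | ' . $tag->nodeValue;
    }
    return $lines;
}

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

// Write a stand-in for ex2.html to a temp file so the sketch is self-contained.
$html = '<html><head><title>My Page</title></head><body>'
      . '<p><a href="/mypage1">Hello World!</a></p>'
      . '<p><a href="/mypage2">Another Hello World!</a></p>'
      . '</body></html>';
$path = tempnam(sys_get_temp_dir(), 'ex2');
file_put_contents($path, $html);

// loadHTML() treats its argument as HTML source, so the filename is parsed as text.
$asString = new DOMDocument();
$asString->loadHTML('ex2.html');
$fromString = listLinks($asString);

// loadHTMLFile() opens the path and parses the file's contents.
$asFile = new DOMDocument();
$asFile->loadHTMLFile($path);
$fromFile = listLinks($asFile);

unlink($path);

echo count($fromString), " link(s) found via loadHTML\n";
echo implode("\n", $fromFile), "\n";
```

If this is indeed the cause, changing the one call in test.php to $doc->loadHTMLFile("ex2.html") should be enough, provided ex2.html sits in the directory the script runs from.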