Parsing HTML Page to database



mckooter
Forum Commoner
Posts: 26
Joined: Fri Jul 28, 2006 10:02 pm


Post by mckooter »

Okay, this is the beginning of a project I'm working on. The goal is to take all the data stored in the following HTML page:
http://www.cryosphere.f2s.com/Freelancer/example.html
(that's just a demo; my actual page has nearly 900 entries)

and put all that data into a database. I'm just at the beginning and already having trouble, and I can't figure out why.

First I'm trying to parse the HTML file to grab the info I want. Using loadHTMLFile(), I created the following script from the example:

test.php

Code:

<?php
$doc = new DOMDocument();
$doc->loadHTML("ex2.html");

$tags = $doc->getElementsByTagName('a');

foreach ($tags as $tag) {
       echo $tag->getAttribute('href').' | '.$tag->nodeValue."\n";
}
?>

ex2.html

Code:

<html>
<head>
<title>My Page</title>
</head>
<body>
<p><a href="/mypage1">Hello World!</a></p>
<p><a href="/mypage2">Another Hello World!</a></p>
</body>
</html>


The original example,

Code:

<?php
$myhtml = <<<EOF
<html>
<head>
<title>My Page</title>
</head>
<body>
<p><a href="/mypage1">Hello World!</a></p>
<p><a href="/mypage2">Another Hello World!</a></p>
</body>
</html>
EOF;
$doc = new DOMDocument();
$doc->loadHTML($myhtml);

$tags = $doc->getElementsByTagName('a');

foreach ($tags as $tag) {
       echo $tag->getAttribute('href').' | '.$tag->nodeValue."\n";
}
?>
works fantastically. I know this is simple, but it won't work any way I try it; all I get with my version is a blank page, even though it's the same information... I'm so confused. Apparently I can't do half of what I thought I could.
volka
DevNet Evangelist
Posts: 8391
Joined: Tue May 07, 2002 9:48 am
Location: Berlin, ger

Post by volka »

mckooter wrote: $doc->loadHTML($myhtml);
Here you pass the HTML contents to the method loadHTML(). And here
mckooter wrote: $doc->loadHTML("ex2.html");
it's the name of a file. How would PHP know the difference?
You need another method, described at http://de3.php.net/dom
mckooter
Forum Commoner
Posts: 26
Joined: Fri Jul 28, 2006 10:02 pm

Post by mckooter »

Ahhh.

loadHTMLFile() is what I wanted. I swear I'll get it. I spent all day looking for a way to do this; I'm not sure I'm going about it the most effective way, but I think I can get it to work.
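For reference, here is a minimal sketch of test.php reworked around loadHTMLFile(). To stay self-contained it first writes the ex2.html markup to a temporary file (that step exists only for the demo; in the real script you would pass "ex2.html" directly). The libxml_use_internal_errors(true) call is an added assumption, there to quiet the parser warnings a large real-world page is likely to trigger.

```php
<?php
// Demo only: write the ex2.html markup to a temp file so the sketch runs anywhere.
$html = <<<EOF
<html>
<head>
<title>My Page</title>
</head>
<body>
<p><a href="/mypage1">Hello World!</a></p>
<p><a href="/mypage2">Another Hello World!</a></p>
</body>
</html>
EOF;
$file = tempnam(sys_get_temp_dir(), 'ex2');
file_put_contents($file, $html);

// Collect libxml warnings internally instead of printing them (messy HTML is common)
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// loadHTMLFile() takes a file name; loadHTML() takes the markup itself
if (!$doc->loadHTMLFile($file)) {
    exit("Could not load $file\n");
}
unlink($file);

// Same loop as test.php, but gathering "href | text" lines into an array
$lines = [];
foreach ($doc->getElementsByTagName('a') as $tag) {
    $lines[] = $tag->getAttribute('href') . ' | ' . $tag->nodeValue;
}
echo implode("\n", $lines), "\n";
```

Collecting the rows in an array rather than echoing them straight away also makes the next step, inserting each row into the database, easier to bolt on.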
mckooter
Forum Commoner
Posts: 26
Joined: Fri Jul 28, 2006 10:02 pm

Post by mckooter »

Sorry to double post, but I don't think I need to start a new topic. I just want to see if there is a much easier way of doing what I'm doing, or rather what seems to be the only way I can do it.

The data I'm trying to import has a date format of

Code:

21:52:13 - 29 Dec 06
as an example. I'm trying to store this in a database, so I found a class to convert the data to the type I will need.

The portion I have done so far is:

Code:

<?php

$date = "21:52:13 - 29 Dec 06";

// ereg_replace() was removed in PHP 7; str_replace() strips the literal "-" the same way
$newdate = str_replace("-", "", $date);
echo $newdate;

?>
Removing the "-" is simple enough. Now I want to convert "Dec" to 12, but I feel that 12 consecutive replace calls would be ridiculous and laughable, so before I make a mess of the code I figured I would check whether there is an easier way to convert it.

Also, please don't laugh if my way above is far too long a path to a simple goal. All the documentation I have read about dates covers converting dates received from the database/PHP into readable formats; I am doing the opposite.
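Instead of twelve replacements, PHP's built-in DateTime::createFromFormat() (available from PHP 5.3, so newer than this thread) can read the string in one step when given a format that mirrors its layout. This is a swapped-in technique, not the class mentioned above, and the 'Y-m-d H:i:s' output layout assumes a MySQL-style DATETIME column; a minimal sketch:

```php
<?php
$date = "21:52:13 - 29 Dec 06";

// Format mirrors the input: H:i:s = 24-hour time, " - " = literal separator,
// d = two-digit day, M = short month name (Jan..Dec), y = two-digit year
$dt = DateTime::createFromFormat('H:i:s - d M y', $date);
if ($dt === false) {
    exit("Unrecognised date: $date\n");
}

// 'Y-m-d H:i:s' is the text layout a MySQL DATETIME column accepts
$newdate = $dt->format('Y-m-d H:i:s');
echo $newdate, "\n"; // 2006-12-29 21:52:13
```

The month-name lookup happens inside the parser (the M specifier), so there is no hand-written Jan..Dec table to maintain.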
nickvd
DevNet Resident
Posts: 1027
Joined: Thu Mar 10, 2005 5:27 pm
Location: Southern Ontario

Post by nickvd »

Wouldn't using regex be quicker and easier?
Kieran Huggins
DevNet Master
Posts: 3635
Joined: Wed Dec 06, 2006 4:14 pm
Location: Toronto, Canada

Post by Kieran Huggins »

Code:

$date = "21:52:13 - 29 Dec 06"; // the sample value from above
echo date('m/d/Y, H:i:s', strtotime(preg_replace('/(.*) - (.*)/', '$2, $1', $date)));
mckooter
Forum Commoner
Posts: 26
Joined: Fri Jul 28, 2006 10:02 pm

Post by mckooter »

Thanks to both of you; I will look at both suggestions. Regex seems to be the much easier way to accomplish a simple task. I am still learning: so far I can find the longest way to an easy goal, but at least I can reach that goal. The community here and elsewhere helps show me the much quicker path to the goal I want, and I thank you for that.


PS: You should see some of the scripts I've written. Coding-wise they are terrifying, just horribly scripted, but for a newcomer they did what I wanted.
nickvd
DevNet Resident
Posts: 1027
Joined: Thu Mar 10, 2005 5:27 pm
Location: Southern Ontario

Post by nickvd »

I was actually referring to the scraping of the html page, but using it for the timestamp works fine too :)
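A rough sketch of the regex approach to scraping, using preg_match_all() on the two ex2.html links. It assumes every link is written exactly as <a href="...">text</a> with double quotes and no extra attributes or nested tags; anything looser will slip past this pattern, which is why DOMDocument is usually the safer choice for a 900-entry page.

```php
<?php
// The two links from ex2.html, inlined so the sketch is self-contained
$html = '<p><a href="/mypage1">Hello World!</a></p>'
      . '<p><a href="/mypage2">Another Hello World!</a></p>';

// Group 1 = href value, group 2 = link text; the lazy .*? stops at the first </a>.
// PREG_SET_ORDER gives one array per match: [full match, href, text]
preg_match_all('#<a href="([^"]*)">(.*?)</a>#i', $html, $matches, PREG_SET_ORDER);

$lines = [];
foreach ($matches as $m) {
    $lines[] = $m[1] . ' | ' . $m[2];
}
echo implode("\n", $lines), "\n";
```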