Parsing HTML file to Database



locorecto
Forum Newbie
Posts: 4
Joined: Sat Oct 10, 2009 5:06 pm

Parsing HTML file to Database

Post by locorecto »

Hello guys, thanks for reading my question. I have a set of HTML files containing information that I need to insert into a MySQL database. The data consists of dates and the historical facts that happened on those dates.

Code:

<p>
    <strong>June 26, 1721</strong> Following the recommendation of
    Rev. Cotton Mather, Dr. Zabdiel Boylston of Boston completes the first
    inoculation against smallpox in the U.S., injecting his own son and two of his
    slaves. 
</p>
<p>
    <strong>1736</strong> In New York, the city almshouse, located on Broadway
    near Park Row, opens an infirmary with six beds. This infirmary grows into
    Bellevue Hospital.
</p>
<p>
    <strong>May 11, 1751</strong> Benjamin Franklin and Dr. Thomas
    Bond receive a charter from the Pennsylvania legislature to open the first
    hospital in the American colonies for the sick poor and the insane. 
</p>
<p>
    <strong>1770</strong> Kings College awards the first M.D. degree in the
    colonies to Robert Tucker.
</p>
<p>
    <strong>June 13, 1771</strong> New York Hospital, the second in
    the colonies after the Pennsylvania Hospital, receives a royal charter from
    King 
</p>
<p>
    George III under the name Society of the Hospital in the City of
    New York in America, later changed to Society of New York Hospital. 
</p>
<p>
    <strong>Oct. 12, 1773</strong> The Public Hospital for Persons of
    Insane and Disordered Minds is established in Williamsburg, Virginia. It was
    the first building in North America devoted solely to the treatment of the
    mentally ill. 
</p>
<p>
    <strong>1791</strong> The Society of New York Hospital opens at a site on Broadway
    between Duane and Worth Streets.
</p>
I have used the following PHP script to insert the corresponding data into the DB:

Code:

$dom_doc = new DOMDocument();
$html_file = file_get_contents('HIA.htm');

$dom_doc->loadHTML($html_file);

$tags_p = $dom_doc->getElementsByTagName('p');

foreach ($tags_p as $tag) {
    $tag_value = $tag->nodeValue;
    $date = $tag->getAttribute('strong');
    // Escape both values so quotes in the text cannot break the query.
    $query = "INSERT INTO Milestones(Date, Text) VALUES('"
        . mysql_real_escape_string($date) . "', '"
        . mysql_real_escape_string($tag_value) . "')";
    mysql_query($query) or die('Value ' . $tag_value . ' and Date ' . $date . ' could not be inserted. ' . mysql_error());
}
echo "Done";
Here is the problem: when I look in the DB, the Date fields are empty. Also, the Text fields include the date at the beginning of the text, as follows:
June 26, 1721 Following the recommendation of Rev. Cotton Mather, Dr. Zabdiel Boylston of Boston completes the first inoculation against smallpox in the U.S., injecting his own son and two of his slaves.
I would like the Date field to hold the date formatted as m/d/Y, and the Text field to hold only the text of the historical fact, without the date.

I appreciate your help in advance.
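For the m/d/Y formatting asked for above, one option is PHP's DateTime::createFromFormat. The sketch below is only a starting point: it assumes the date strings come in the three shapes seen in the sample ("June 26, 1721", "Oct. 12, 1773", and year-only "1736"), and the helper name to_mdy is hypothetical. Year-only entries have no month or day, so it returns null for them rather than inventing a day. Also note that if the Date column is a MySQL DATE type, MySQL expects 'YYYY-MM-DD', so you may prefer to store format('Y-m-d') and apply m/d/Y only when displaying.

```php
<?php
// Hypothetical helper: convert date strings shaped like the sample's
// ("June 26, 1721", "Oct. 12, 1773") to m/d/Y. Returns null when no
// format matches, e.g. for year-only entries such as "1736".
function to_mdy(string $raw): ?string
{
    // Collapse whitespace runs (the <strong> text may contain line breaks).
    $raw = trim(preg_replace('/\s+/', ' ', $raw));

    // '!' resets unparsed fields instead of filling them from the current time.
    // 'F j, Y' matches "June 26, 1721"; 'M. j, Y' matches "Oct. 12, 1773".
    foreach (['!F j, Y', '!M. j, Y'] as $format) {
        $dt = DateTime::createFromFormat($format, $raw);
        if ($dt !== false) {
            return $dt->format('m/d/Y');
        }
    }
    return null; // no month/day available to format
}
```

With this, to_mdy('June 26, 1721') gives "06/26/1721". Other abbreviations (for example "Sept.") would need extra formats added to the list.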
2-d
Forum Newbie
Posts: 6
Joined: Fri Sep 02, 2011 9:50 pm

Re: Parsing HTML file to Database

Post by 2-d »

Hey there,

The date is empty because you are trying to read an attribute. An HTML attribute looks like this:

Code:

<div id="asd"></div>
Here 'id' is the attribute name and "asd" is its value. Your <p> tags have no attributes at all: <strong> is a child element nested inside each <p>, so getAttribute('strong') returns an empty string. What you want instead is to call getElementsByTagName('strong') on each <p> and take the first element it finds:

Code:

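$dateNode = $tag->getElementsByTagName('strong');
// Guard against <p> elements with no <strong>, where item(0) would be null.
$date = $dateNode->length > 0 ? trim($dateNode->item(0)->nodeValue) : '';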
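To see the attribute-versus-element difference in isolation, here is a minimal demo on a single paragraph taken from the sample data (shortened). It uses only DOMDocument::loadHTML, DOMElement::getAttribute and getElementsByTagName; the variable names are just for illustration.

```php
<?php
// Minimal demo: <strong> is a child element of <p>, not an attribute of it.
$doc = new DOMDocument();
$doc->loadHTML('<p><strong>1770</strong> Kings College awards the first M.D. degree.</p>');

$p = $doc->getElementsByTagName('p')->item(0);

// getAttribute() looks for <p strong="...">; there is none, so it returns ''.
$fromAttribute = $p->getAttribute('strong');

// getElementsByTagName() searches the <p>'s descendants for <strong> elements.
$strongTags = $p->getElementsByTagName('strong');
$fromElement = $strongTags->length > 0 ? $strongTags->item(0)->nodeValue : '';

var_dump($fromAttribute); // string(0) ""
var_dump($fromElement);   // string(4) "1770"
```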
Now that you have the date, you can strip it out of $tag_value so the text no longer starts with it:

Code:

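$tag_value = $tag->nodeValue;
// strpos() returns 0 when the date is at the very start, and 0 is falsy,
// so compare against false explicitly.
if ($date !== '' && strpos($tag_value, $date) !== false) {
    $tag_value = trim(str_replace($date, "", $tag_value));
}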
Once that's done, you should be ready for the database insert.
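A different technique from the str_replace approach above, offered only as an option: detach the <strong> node from the DOM before reading the paragraph's text, so the date cannot end up in Text no matter where it appears. The sketch below also assumes that a <p> without a <strong>, such as the "George III ..." paragraph in the sample, is the tail of the previous entry and should be appended to it. The function names extract_milestones and normalize_space are hypothetical.

```php
<?php
// Collapse runs of whitespace (line breaks, indentation) into single spaces.
function normalize_space(string $s): string
{
    return trim(preg_replace('/\s+/', ' ', $s));
}

// Hypothetical helper: returns [['date' => ..., 'text' => ...], ...],
// one entry per <p> that contains a <strong> date.
function extract_milestones(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings instead of printing them (real HTML is often messy).
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = [];
    foreach ($doc->getElementsByTagName('p') as $p) {
        $strongTags = $p->getElementsByTagName('strong');
        if ($strongTags->length > 0) {
            $strong = $strongTags->item(0);
            $date = normalize_space($strong->nodeValue);
            // Detach the date node so $p->nodeValue no longer contains it.
            $strong->parentNode->removeChild($strong);
            $rows[] = ['date' => $date, 'text' => normalize_space($p->nodeValue)];
            continue;
        }
        // Assumption: a <p> without <strong> continues the previous entry.
        $text = normalize_space($p->nodeValue);
        if ($rows !== [] && $text !== '') {
            $rows[count($rows) - 1]['text'] .= ' ' . $text;
        }
    }
    return $rows;
}
```

Called as extract_milestones(file_get_contents('HIA.htm')), this yields one row per dated fact, with the "King" / "George III" split joined back into a single Text value.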

Best of luck
locorecto
Forum Newbie
Posts: 4
Joined: Sat Oct 10, 2009 5:06 pm

Re: Parsing HTML file to Database

Post by locorecto »

Thanks a lot for your help. I am going to try your suggestion and keep you posted. Thanks again.
locorecto
Forum Newbie
Posts: 4
Joined: Sat Oct 10, 2009 5:06 pm

Re: Parsing HTML file to Database

Post by locorecto »

That worked great. Thanks for the tip.
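For the insert step itself, the original script builds the SQL by string interpolation with the mysql_* functions, which were removed in PHP 7.0. A sketch of the same INSERT using PDO with a prepared statement follows; the table and column names come from the original post, while the function name insert_milestones and the row shape (arrays with 'date' and 'text' keys) are assumptions for illustration. Placeholders mean a quote in the text (e.g. "King's") cannot break the query.

```php
<?php
// Hypothetical helper: insert rows shaped like ['date' => ..., 'text' => ...]
// into Milestones using one prepared statement inside a transaction.
function insert_milestones(PDO $pdo, array $rows): int
{
    $stmt = $pdo->prepare('INSERT INTO Milestones (Date, Text) VALUES (:date, :text)');
    $count = 0;
    $pdo->beginTransaction();
    try {
        foreach ($rows as $row) {
            // Values are sent as bound parameters, not spliced into the SQL.
            $stmt->execute([':date' => $row['date'], ':text' => $row['text']]);
            $count++;
        }
        $pdo->commit();
    } catch (Throwable $e) {
        // Undo the partial batch, then let the caller see the error.
        $pdo->rollBack();
        throw $e;
    }
    return $count;
}
```

Create the connection with new PDO($dsn, $user, $password) and set PDO::ATTR_ERRMODE to PDO::ERRMODE_EXCEPTION (the default since PHP 8.0) so failures throw instead of needing or die().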