Extracting Data from HTML files

MythX
Forum Commoner
Posts: 28
Joined: Mon Jan 11, 2010 8:28 pm

Post by MythX »

Hi, I have several thousand HTML files that I need to extract into a database. I'm fairly new to PHP, but I figure it's my best chance of getting at this data. The information I'm trying to extract is the title and several fields within a couple of tables. Basically, each HTML file should become one row in the database. Extracting to a CSV file would be fine as well.

Any ideas on how to get something like this going? I'm figuring it's probably simple for a pro.

Thanks in advance.
SidewinderX
Forum Contributor
Posts: 407
Joined: Fri Jul 16, 2004 9:04 pm
Location: NY

Re: Extracting Data from HTML files

Post by SidewinderX »

Are you saying you want to import HTML files into your database? I'm a little unclear as to what you are trying to do.

However, you can read each HTML file with either fopen/fread or file_get_contents. Once the file's contents are in a string, you can pull out the parts you need with preg_match and then insert them into your database.
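
A minimal sketch of that read-then-match flow, not a definitive implementation. The helper name extractTitle, the ./pages folder, and the output.csv file are assumptions made up for illustration; the pattern only handles a single, ordinary <title> element, and the rows go to a CSV file rather than a database to keep the example short.

```php
<?php
// Hypothetical helper: returns the text of the <title> element, or null if none is found.
function extractTitle(string $html): ?string
{
    // 'i' makes the match case-insensitive; 's' lets '.' span line breaks inside the title.
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        return trim(html_entity_decode($m[1], ENT_QUOTES, 'UTF-8'));
    }
    return null;
}

// Assumed layout: the HTML files sit in ./pages and one CSV row is written per file.
$files = glob(__DIR__ . '/pages/*.html') ?: [];
if ($files) {
    $out = fopen(__DIR__ . '/output.csv', 'w');
    foreach ($files as $file) {
        $html = file_get_contents($file);
        if ($html === false) {
            continue; // skip files that cannot be read
        }
        // Delimiter, enclosure, and escape are passed explicitly so behaviour is the same across PHP versions.
        fputcsv($out, [basename($file), extractTitle($html) ?? ''], ',', '"', '\\');
    }
    fclose($out);
}
```

If a database is the real target, the fputcsv line is where an INSERT (for example through a PDO prepared statement) would go instead.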
MythX
Forum Commoner
Posts: 28
Joined: Mon Jan 11, 2010 8:28 pm

Re: Extracting Data from HTML files

Post by MythX »

SidewinderX wrote: Are you saying you want to import HTML files into your database? I'm a little unclear as to what you are trying to do.

However, you can read each HTML file with either fopen/fread or file_get_contents. Once the file's contents are in a string, you can pull out the parts you need with preg_match and then insert them into your database.
Yes. Actually, only a portion of each HTML file needs to be extracted: the title and some information from a few tables. I'll look up the functions you mentioned and see if I can make something work. Thanks for your help.
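
For the table fields, regular expressions tend to get brittle once the markup nests or varies between files. One alternative to preg_match, not something suggested in this thread, is PHP's bundled DOM extension. A rough sketch, assuming the wanted values live in plain <td> cells; the helper name extractTableCells is hypothetical, and which cell positions map to which database columns depends on the actual files.

```php
<?php
// Hypothetical helper: returns the trimmed text of every <td> cell, in document order.
function extractTableCells(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings quietly instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $cells = [];
    foreach ($doc->getElementsByTagName('td') as $td) {
        $cells[] = trim($td->textContent);
    }
    return $cells;
}
```

Each file would then yield an array of cell values, and picking the right indexes out of that array gives the fields for one database row or CSV line.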