Extracting Data from HTML files

Posted: Mon Jan 11, 2010 8:42 pm
by MythX
Hi, I have several thousand HTML files that I need to extract into a database. I'm fairly new to PHP, but I figure it will be my best chance of getting at this data. The information I'm trying to extract is the title field and several fields within a couple of tables. Basically, each HTML file should be a row in the database. Extraction to a CSV file would be fine as well.

Any ideas on how to get something like this going? I'm figuring it's probably simple for a pro.

Thanks in advance.

Re: Extracting Data from HTML files

Posted: Mon Jan 11, 2010 8:51 pm
by SidewinderX
Are you saying you want to import HTML files into your database? I'm a little unclear as to what you are trying to do. However, here is the general approach:

You can get the contents of html files either using fopen/fread or file_get_contents. Once you read the html file into a buffer, you can parse the content using preg_match. From there you can insert it into your database.
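
The steps above could be sketched roughly as follows. This is only a minimal illustration, not a finished importer: the function names extract_row and html_dir_to_csv are made up for this example, it assumes the files end in .html and sit in a single directory, and it assumes the fields of interest are the <title> text and the text of every <td> cell. It writes a CSV rather than inserting into a database, since the original post says a CSV would also be fine.

```php
<?php
// Hypothetical helper: returns [title, cell1, cell2, ...] for one HTML string.
function extract_row(string $html): array
{
    $title = '';
    // The /s flag lets "." span newlines; /i makes the tag match case-insensitive.
    if (preg_match('/<title[^>]*>(.*?)<\/title>/si', $html, $m)) {
        $title = trim(html_entity_decode(strip_tags($m[1])));
    }

    // Collect the text of every <td> cell (assumption: tables are not nested).
    $cells = [];
    if (preg_match_all('/<td[^>]*>(.*?)<\/td>/si', $html, $matches)) {
        foreach ($matches[1] as $cell) {
            $cells[] = trim(html_entity_decode(strip_tags($cell)));
        }
    }

    return array_merge([$title], $cells);
}

// Hypothetical driver: writes one CSV row per *.html file found in $dir
// and returns the number of files processed.
function html_dir_to_csv(string $dir, string $csvPath): int
{
    $out = fopen($csvPath, 'w');
    if ($out === false) {
        throw new RuntimeException("Cannot open $csvPath for writing");
    }

    $count = 0;
    $files = glob(rtrim($dir, '/') . '/*.html') ?: [];
    foreach ($files as $file) {
        $html = file_get_contents($file);
        if ($html === false) {
            continue; // skip unreadable files
        }
        fputcsv($out, extract_row($html), ',', '"', '\\');
        $count++;
    }

    fclose($out);
    return $count;
}
```

Calling html_dir_to_csv('/path/to/html', 'out.csv') (both paths are placeholders) would then produce one row per file. Regular expressions like these tend to break on nested tables or unusual markup; if that turns out to be a problem, PHP's built-in DOMDocument class (loadHTML) is a sturdier way to walk the tables.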

Re: Extracting Data from HTML files

Posted: Mon Jan 11, 2010 9:02 pm
by MythX
SidewinderX wrote: Are you saying you want to import HTML files into your database? I'm a little unclear as to what you are trying to do. However, here is the general approach:

You can get the contents of html files either using fopen/fread or file_get_contents. Once you read the html file into a buffer, you can parse the content using preg_match. From there you can insert it into your database.
Yes, actually. Just a portion of each HTML file needs to be extracted: the title and some info in some tables. I'll look up those functions you mentioned and see if I can make something work. Thanks for your help.