Hi, I have several thousand HTML files whose data I need to extract into a database. I'm fairly new to PHP, but I figure it's my best chance of getting at this data. The information I'm trying to extract is the title and several fields within a couple of tables. Basically, each HTML file should become one row in the database. Extracting to a CSV file would be fine as well.
Any ideas on how to get something like this going? I'm figuring it's probably simple for a pro.
Thanks in advance.
Extracting Data from HTML files
SidewinderX
Re: Extracting Data from HTML files
Are you saying you want to import HTML files into your database? I'm a little unclear as to what you are trying to do. However, you can get the contents of HTML files using either fopen/fread or file_get_contents. Once you have read an HTML file into a buffer (a string), you can parse the content using preg_match. From there, you can insert it into your database.
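A minimal sketch of the read / preg_match / insert steps, assuming PHP 7.1 or later. The helper names (`extractTitle`, `readHtmlFile`, `saveTitle`) and the `pages` table with `file` and `title` columns are hypothetical placeholders; `file_get_contents`, `preg_match` and PDO are standard PHP.

```php
<?php
// Sketch only: read one HTML file, pull out the <title> with preg_match,
// and insert it into a database table.
// Hypothetical: the helper names and the `pages` (file, title) table.

function extractTitle(string $html): ?string
{
    // /i: match <TITLE> or <title>; /s: let "." also match newlines.
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        // Decode entities such as &amp; and trim surrounding whitespace.
        return trim(html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8'));
    }
    return null; // no <title> element found
}

function readHtmlFile(string $path): string
{
    // file_get_contents reads the whole file into one string;
    // fopen/fread would work too but needs more bookkeeping.
    $html = file_get_contents($path);
    if ($html === false) {
        throw new RuntimeException("Could not read $path");
    }
    return $html;
}

function saveTitle(PDO $db, string $file, ?string $title): void
{
    // A prepared statement keeps apostrophes and other special characters
    // in the extracted text from breaking the SQL (or injecting into it).
    $stmt = $db->prepare('INSERT INTO pages (file, title) VALUES (:file, :title)');
    $stmt->execute([':file' => $file, ':title' => $title]);
}
```

Used together, `saveTitle($db, $path, extractTitle(readHtmlFile($path)))` would handle one file, where `$db` is a `PDO` connection (for example `new PDO('sqlite:pages.db')`) and the `pages` table already exists. A regex like this is reasonable for a single predictable tag such as `<title>`, but it gets fragile for table contents.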
Re: Extracting Data from HTML files
Yes, actually, only a portion of each HTML file needs to be extracted: the title and some information in a few tables. I'll look up the functions you mentioned and see if I can make something work. Thanks for your help.
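For the title plus table fields, with one row per file, a sketch is below. It swaps in PHP's DOM extension (`DOMDocument` and `DOMXPath`) in place of `preg_match`, since nested table markup is hard to match reliably with regular expressions, and writes one CSV row per file with `fputcsv`. The helper names, the one-folder-of-`*.html` layout and the XPath query are assumptions to adapt to the actual pages.

```php
<?php
// Sketch only: one CSV row per HTML file = [file name, title, cell1, cell2, ...].
// Hypothetical: helper names, directory layout, and the XPath query passed in.

function extractRow(string $html, string $cellQuery): array
{
    $doc = new DOMDocument();
    // Real-world HTML is often malformed; collect libxml warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    $titleNode = $xpath->query('//title')->item(0);
    $row = [$titleNode ? trim($titleNode->textContent) : ''];

    // One CSV column per matched cell, in document order.
    foreach ($xpath->query($cellQuery) as $cell) {
        $row[] = trim($cell->textContent);
    }
    return $row;
}

function htmlDirToCsv(string $dir, string $csvPath, string $cellQuery): int
{
    $out = fopen($csvPath, 'w');
    if ($out === false) {
        throw new RuntimeException("Could not open $csvPath for writing");
    }
    $count = 0;
    // glob() can return false on error, so fall back to an empty list.
    foreach (glob(rtrim($dir, '/') . '/*.html') ?: [] as $file) {
        $html = file_get_contents($file);
        if ($html === false || $html === '') {
            continue; // skip unreadable or empty files (loadHTML rejects '')
        }
        $fields = array_merge([basename($file)], extractRow($html, $cellQuery));
        // Empty escape character: plain double-quote escaping (PHP 7.4+).
        fputcsv($out, $fields, ',', '"', '');
        $count++;
    }
    fclose($out);
    return $count;
}
```

For example, `htmlDirToCsv('pages', 'out.csv', '(//table)[1]//td')` would write the cells of the first table in every file of a hypothetical `pages/` directory. Note that `DOMDocument::loadHTML` assumes ISO-8859-1 unless the page declares its charset, so non-ASCII text may need extra handling.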