
Extracting data from an HTML table

Posted: Wed Jan 04, 2017 11:28 am
by Willy70EB
Dear all,
I would like to ask a question that, unfortunately, I cannot find a solution to. Let me briefly explain my problem.

I would like to populate a database with the data contained in a table inside an HTML file and, if possible,
repeat this for every HTML file. I have about 2000 files to process.

I did extensive research online and found some solutions based on regular expressions and others using a
DOM parser extension, but none of them worked properly.

Unfortunately, my situation is a little complicated: the HTML file that contains the table also holds other
information I do not need, as well as other HTML tags I have to strip out, and the table structure
is not the same in every file. I have at least 7-8 kinds of tables, and none of them uses
<th> header tags. A sample structure is this:

<table>
  <tr>
    <td>TABLE 1</td>
  </tr>
  <tr>
    <td>Column1</td>
    <td>Column2</td>
    <td>Column3</td>
    <td>Column4</td>
    <td>Column5</td>
    <td>Column6</td>
    <td>Column7</td>
  </tr>
  <tr>
    <td>1</td>
    <td>USER 1</td>
    <td>M</td>
    <td>ROME</td>
    <td>RM</td>
    <td>11111111</td>
    <td>22222222</td>
  </tr>
  ........
</table>
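One possible way to pull the rows out of such a file, sketched in Python with only the standard library's `html.parser` (the thread does not settle on a language, so Python is an assumption here; `TableRowParser` and `extract_tables` are hypothetical names). It keeps only the text inside `<td>`/`<th>` cells, grouped by `<tr>`, so the surrounding content and unrelated tags are ignored. It assumes the tables are not nested inside one another.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect cell text from every <table>, one list of strings per <tr>.

    Text outside <td>/<th> cells (menus, paragraphs, other markup) is ignored.
    Assumes tables are not nested inside one another.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per <table>: a list of rows
        self._row = None   # cells of the <tr> currently being read
        self._cell = None  # text fragments of the cell currently being read

    def _close_cell(self):
        if self._cell is not None and self._row is not None:
            # Collapse line breaks and repeated spaces: "\nTABLE 1 " -> "TABLE 1"
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None and self.tables:
            self.tables[-1].append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lowercase, so <Tr> and <TR> match "tr".
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._close_row()   # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_tables(html):
    """Return a list of tables, each a list of rows, each a list of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.tables
```

Because the tag names arrive lowercased and a missing `</td>` or `</tr>` is tolerated, mixed spellings like `<Table>`/`<Tr>`/`<Td>` should not matter. Each file comes out as plain lists of strings, so the differences between the 7-8 layouts can be handled afterwards in ordinary code rather than inside one regular expression.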

That's just an example: in some files there are not 7 columns but a different number, with
different names.
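Since the number and names of the columns change from file to file, one option is to read the column names from the table itself instead of hard-coding them. The Python sketch below (`rows_to_records` is a hypothetical helper) assumes, as in the sample, that an optional single-cell first row is a caption such as `TABLE 1`, the next row holds the column names, and every later row is data.

```python
def rows_to_records(rows):
    """Split one table (a list of rows of cell strings) into
    (title, column_names, records), where each record is a dict.

    Assumed layout: an optional single-cell caption row, then the row of
    column names, then the data rows.
    """
    rows = [row for row in rows if any(row)]  # skip rows whose cells are all empty
    title = None
    if rows and len(rows[0]) == 1:
        title, rows = rows[0][0], rows[1:]
    if not rows:
        return title, [], []
    columns = rows[0]
    records = []
    for row in rows[1:]:
        # Pad short rows with None and drop surplus cells, so every record
        # has exactly one value per column name.
        values = (list(row) + [None] * len(columns))[: len(columns)]
        records.append(dict(zip(columns, values)))
    return title, columns, records
```

Two caveats with this rule: a layout whose header really has only one column would be misread as a caption, and duplicate column names would overwrite each other in the dictionaries, so the rule may need adjusting for some of the layouts.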

Do you think this is feasible with PHP, or with other tools that can extract the data
and load it into an SQL table?
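On the SQL side, because the layouts differ, one possible design (an assumption, not the only option) is a single "long" table with one row per cell, so a new layout needs no schema change. A minimal sketch using Python's built-in `sqlite3` module as a stand-in for whatever database is actually used; the `cells` table and the `store_cells` function are hypothetical.

```python
import sqlite3


def store_cells(conn, source, columns, data_rows):
    """Store each cell as (source file, row number, column name, value).

    One "long" table accepts any number of columns, so all table layouts can
    share it. zip() drops surplus cells; missing cells are simply not stored.
    Committing is left to the caller (e.g. `with conn:`).
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cells ("
        " source TEXT, row_no INTEGER, column_name TEXT, value TEXT)"
    )
    conn.executemany(
        "INSERT INTO cells VALUES (?, ?, ?, ?)",  # placeholders, no string building
        [
            (source, row_no, col, value)
            for row_no, row in enumerate(data_rows, start=1)
            for col, value in zip(columns, row)
        ],
    )
```

The trade-off is that reading the data back as ordinary columns needs a pivot query, for example `MAX(CASE WHEN column_name = 'Column2' THEN value END)` grouped by `source` and `row_no`; creating one table per layout avoids that at the cost of managing several schemas.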

My little project is obviously not commercial; it is non-profit and only for study.

Thank you all for your attention.
Greetings
Willy

Re: Extracting data from an HTML table

Posted: Wed Jan 04, 2017 1:44 pm
by Celauran
So the first row can be mapped to the column names and the remaining rows contain the data? Could you not use an XPath query to first build up column names and then create an array of data which you could insert into your database? What have you tried and where did it fall short?
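Celauran's suggestion can be sketched in Python with the limited XPath subset of the standard library's `xml.etree.ElementTree` (in PHP, the corresponding tools would be `DOMDocument::loadHTML` together with `DOMXPath`). This sketch assumes the table fragment is well-formed XML with consistently lowercase tag names, which real HTML files often are not; a full HTML parser with complete XPath support, such as `lxml.html`, would be more forgiving. `table_via_xpath`, `cell_text` and `header_row` are hypothetical names.

```python
import xml.etree.ElementTree as ET


def cell_text(td):
    # itertext() also collects text from nested tags such as <b> or <span>.
    return " ".join("".join(td.itertext()).split())


def table_via_xpath(fragment, header_row=2):
    """Return (columns, records) for a well-formed <table> fragment.

    header_row is the 1-based position of the row holding the column names
    (2 for the sample, whose first row is the "TABLE 1" caption).
    """
    table = ET.fromstring(fragment)
    # ElementTree's XPath subset supports positional predicates such as tr[2].
    columns = [cell_text(td) for td in table.findall(f"./tr[{header_row}]/td")]
    records = []
    for tr in table.findall("./tr")[header_row:]:  # rows after the header
        values = [cell_text(td) for td in tr.findall("./td")]
        records.append(dict(zip(columns, values)))
    return columns, records
```

For layouts without a caption row, `header_row=1` would take the column names from the first row instead.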

Re: Extracting data from an HTML table

Posted: Thu Jan 05, 2017 7:12 am
by Willy70EB
Celauran wrote: So the first row can be mapped to the column names and the remaining rows contain the data?
Yes.
Celauran wrote: Could you not use an XPath query to first build up column names and then create an array of data which you could insert into your database? What have you tried and where did it fall short?
I tried regex and DOM approaches, using some small PHP scripts I downloaded from the web. If you have time, could you post a small example?
Thanks
Willy
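A small end-to-end example along those lines, in Python with only the standard library, under these stated assumptions: each file contains one data table (every `<tr>` in the file is collected), an optional single-cell caption row comes first, the next row holds the column names, and the files sit in one directory with an `.html` extension. Each distinct set of column names gets its own SQLite table, named from a hash of the header and with an extra `source_file` column, so the 7-8 layouts stay separate. All names (`CellCollector`, `load_html`, `load_directory`, the `layout_` prefix) are hypothetical.

```python
import hashlib
import sqlite3
from html.parser import HTMLParser
from pathlib import Path


class CellCollector(HTMLParser):
    """Collect the text of <td>/<th> cells, one list per <tr>.

    Everything outside table cells is ignored; tag names arrive lowercased,
    so <Tr>/<Td> spellings are handled too.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the current <tr>
        self._cell = None  # text fragments of the current <td>/<th>

    def _end_cell(self):
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.handle_endtag("tr")  # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._end_cell()          # tolerate a missing </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag == "tr" and self._row is not None:
            self._end_cell()
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def quote(name):
    """Quote an SQLite identifier (header text may contain spaces or quotes)."""
    return '"' + name.replace('"', '""') + '"'


def load_html(conn, source, text):
    """Parse one HTML document and insert its data rows; return the row count."""
    parser = CellCollector()
    parser.feed(text)
    parser.close()
    rows = [r for r in parser.rows if any(r)]  # drop rows with only empty cells
    if rows and len(rows[0]) == 1:             # caption row such as "TABLE 1"
        rows = rows[1:]
    if len(rows) < 2:                          # need a header plus data
        return 0
    columns = rows[0]
    # One table per column layout, e.g. layout_1a2b3c4d (hypothetical naming).
    digest = hashlib.sha1("|".join(columns).encode("utf-8")).hexdigest()[:8]
    table = quote("layout_" + digest)
    cols = ["source_file"] + columns
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} "
                 f"({', '.join(quote(c) + ' TEXT' for c in cols)})")
    # Pad short rows with None and drop surplus cells.
    values = [[source] + (r + [None] * len(columns))[:len(columns)]
              for r in rows[1:]]
    conn.executemany(
        f"INSERT INTO {table} ({', '.join(quote(c) for c in cols)}) "
        f"VALUES ({', '.join('?' * len(cols))})",
        values,
    )
    return len(values)


def load_directory(db_path, html_dir):
    """Load every *.html file in html_dir into the SQLite database at db_path."""
    conn = sqlite3.connect(db_path)
    total = 0
    with conn:  # one transaction: commit at the end, roll back on error
        for path in sorted(Path(html_dir).glob("*.html")):
            # UTF-8 is an assumption; change encoding= if the files use another one.
            text = path.read_text(encoding="utf-8", errors="replace")
            total += load_html(conn, path.name, text)
    conn.close()
    return total
```

A call such as `load_directory("tables.db", "html_files")` would process the whole folder in a single transaction, which keeps a run over about 2000 files reasonably fast. A header that repeats a column name would make `CREATE TABLE` fail, so such a layout would need special handling.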