Extracting data from an HTML table
Posted: Wed Jan 04, 2017 11:28 am
Dear all,
I have a problem for which, unfortunately, I cannot find a solution. I will explain it briefly.
I would like to populate a database with the data contained in a table in an HTML file and, if possible,
repeat this for each HTML file; I have about 2000 files to process.
I did extensive research on the internet and found some solutions based on regular expressions and others based on a
DOM parser extension, but neither worked properly.
Unfortunately, my situation is a little complex: the HTML file that contains the table also has other
information that I do not need and other HTML tags that I have to remove, and, on top of that,
the table structure is not the same in all files. Basically, I have at least 7-8 kinds of tables,
and none of them has header tags <TH>. A sample structure is this:
<Table>
<Tr>
<Td> TABLE 1 </Td>
</Tr>
<Tr>
<Td> Column1 </Td>
<Td> Column2 </Td>
<Td> Column3 </Td>
<Td> Column4 </Td>
<Td> Column5 </Td>
<Td> Column6 </Td>
<Td> Column7 </Td>
</Tr>
<Tr>
<Td> 1 </Td>
<Td> USER 1 </Td>
<Td> M </Td>
<Td> ROME </Td>
<Td> RM </Td>
<Td> 11111111 </Td>
<Td> 22222222 </Td>
</Tr>
........
</Table>
That's just an example: in some files there are not 7 columns but a different number, with
different names.
Do you think I have a chance with PHP, or with other tools that can extract the data
and load them into an SQL table?
My little project is not commercial; it is non-profit and only for study.
Thank you all for your attention.
Greetings
Willy
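
One possible way to approach this, sketched in Python (one of the "other tools") rather than PHP: read every <Td> cell with a real HTML parser, treat the first row that has more than one cell as the column names (an assumption, since the tables have no <TH> tags), and create the SQL columns from those names so that files with a different number of columns still load. The names TableExtractor, split_table and load_into_sqlite, the table name table_1 and the in-memory SQLite database are all made up for illustration; this is a rough sketch under those assumptions, not a tested solution for the real files.

```python
import sqlite3
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so <Td> and <TD> both arrive as "td".
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (the unwanted parts of the page) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Join the pieces and collapse runs of whitespace and newlines.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def split_table(rows):
    """Assumption: the first row with more than one cell holds the column names."""
    for i, row in enumerate(rows):
        if len(row) > 1:
            # Keep only the data rows that match the header width.
            data = [r for r in rows[i + 1:] if len(r) == len(row)]
            return row, data
    return [], []


def quote(name):
    # Quote SQL identifiers, because the names come from the file itself.
    return '"' + name.replace('"', '""') + '"'


def load_into_sqlite(conn, table_name, header, data):
    cols = ", ".join(quote(c) + " TEXT" for c in header)
    conn.execute("CREATE TABLE IF NOT EXISTS {} ({})".format(quote(table_name), cols))
    marks = ", ".join("?" for _ in header)
    conn.executemany(
        "INSERT INTO {} VALUES ({})".format(quote(table_name), marks), data
    )


# Shortened version of the sample table, with extra content around it.
SAMPLE = """
<html><body><p>Unrelated text</p>
<Table>
<Tr><Td> TABLE 1 </Td></Tr>
<Tr><Td> Column1 </Td><Td> Column2 </Td><Td> Column3 </Td></Tr>
<Tr><Td> 1 </Td><Td> USER 1 </Td><Td> M </Td></Tr>
</Table></body></html>
"""

parser = TableExtractor()
parser.feed(SAMPLE)
parser.close()
header, data = split_table(parser.rows)

conn = sqlite3.connect(":memory:")  # hypothetical target; a real file path works too
load_into_sqlite(conn, "table_1", header, data)
stored = conn.execute('SELECT * FROM "table_1"').fetchall()
```

For the 2000 files, the same steps could run in a loop over the directory, with each of the 7-8 table layouts going to its own SQL table. The files would need reasonably well-formed closing tags for this parser to see the end of each cell.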