Parse HTML - ereg?
Posted: Tue Jun 17, 2003 6:04 pm
I would like to do the following:
1. Have a form that allows the user to specify a URL to process (this part I can do already)
2. Fetch the specified URL and scan it for certain tags (in this case, SPAN tags with specific IDs) that are buried in a whole pile of other HTML
3. Take the content from inside those tags and output a pipe-separated list.
I know which IDs I am looking for on the target page, so I think I can hard-code them in the script. Each ID appears only once, so there will never be multiple matches.
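Because each ID is known in advance and appears only once, one way to do the lookup is a single pattern per ID that treats anything other than exactly one match as an error. This is a minimal sketch in Python rather than PHP, using the standard re module. It assumes the ID is written as a double-quoted attribute and that the SPANs are not nested. find_span_text is a hypothetical name.

```python
import re

def find_span_text(html: str, span_id: str) -> str:
    """Return the text inside <SPAN ID="span_id">...</SPAN>.

    Assumes the ID is double-quoted, SPANs are not nested,
    and (as stated in the post) each ID occurs exactly once.
    """
    # re.escape guards against IDs containing regex metacharacters;
    # [^>]* tolerates other attributes around the ID;
    # (.*?) is non-greedy, so the match stops at the first </SPAN>.
    pattern = re.compile(
        r'<span\b[^>]*\sid\s*=\s*"' + re.escape(span_id) + r'"[^>]*>(.*?)</span>',
        re.IGNORECASE | re.DOTALL,
    )
    matches = pattern.findall(html)
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one SPAN with ID {span_id!r}, found {len(matches)}"
        )
    return matches[0].strip()

page = '''<OTHERHTML>....
<SPAN ID="Country">Guyana</SPAN>
<OTHERHTML>....'''
print(find_span_text(page, "Country"))  # Guyana
```

Raising on zero or multiple matches makes a changed target page fail loudly instead of silently producing a wrong column.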
An example of the data:
<OTHERHTML>....
<SPAN ID="Country">Guyana</SPAN>
<OTHERHTML>....
<SPAN ID="FlagName">Soaring Cross</SPAN>
<OTHERHTML>....
<SPAN ID="FlagDate">Incorporated in June 1755</SPAN>
<OTHERHTML>....
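A regular expression can break on variations such as single-quoted or unquoted attributes, extra spacing, or entity references like &amp;. An alternative, sketched here with Python's standard html.parser module, lets a real HTML parser deal with those cases and simply records the text of each wanted SPAN. SpanTextCollector and collect_spans are hypothetical names.

```python
from html.parser import HTMLParser

class SpanTextCollector(HTMLParser):
    """Collect the text of <span> elements whose id is in `wanted`."""

    def __init__(self, wanted):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. in text
        self.wanted = set(wanted)
        self.found = {}
        self._stack = []  # one entry per open <span>: its id if wanted, else None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so SPAN/ID arrive as span/id.
        if tag == "span":
            span_id = dict(attrs).get("id")
            if span_id in self.wanted:
                self.found.setdefault(span_id, "")
                self._stack.append(span_id)
            else:
                self._stack.append(None)

    def handle_endtag(self, tag):
        if tag == "span" and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Text belongs to every wanted span that is currently open.
        for span_id in self._stack:
            if span_id is not None:
                self.found[span_id] += data

def collect_spans(html, ids):
    parser = SpanTextCollector(ids)
    parser.feed(html)
    parser.close()  # flush any buffered text
    return {k: v.strip() for k, v in parser.found.items()}

print(collect_spans('<SPAN ID="Country">Guyana</SPAN>', ["Country"]))  # {'Country': 'Guyana'}
```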
An example of what I would like to output:
Guyana|Soaring Cross|Incorporated in June 1755
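Producing that line is a join over the IDs in a fixed order. In this Python sketch, a missing ID yields an empty field and any "|" inside a value is replaced with "/" so the column count stays fixed; both behaviors are assumptions, not part of the original request. to_pipe_line is a hypothetical name.

```python
def to_pipe_line(values, ids):
    """Join values in the order of `ids` with '|'.

    Assumptions: a missing ID becomes an empty field, and a '|'
    inside a value is replaced by '/' so the field count stays fixed.
    """
    fields = []
    for span_id in ids:
        text = values.get(span_id, "")
        text = " ".join(text.split())  # collapse runs of whitespace and newlines
        fields.append(text.replace("|", "/"))
    return "|".join(fields)

values = {"Country": "Guyana", "FlagName": "Soaring Cross",
          "FlagDate": "Incorporated in June 1755"}
print(to_pipe_line(values, ["Country", "FlagName", "FlagDate"]))
# Guyana|Soaring Cross|Incorporated in June 1755
```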
I imagine I can use fopen to get the URL and then ereg to match the IDs I am looking for, but I am unsure how to get the text inside the SPAN tags into a string (or strings) so that I can print it to the screen.
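The fopen-plus-pattern plan can be sketched end to end in Python. urllib.request.urlopen plays the role of fopen, and re.search with a capture group pulls out the text between the tags; in PHP the captured text likewise comes back through the match-array argument (note that ereg was deprecated in PHP 5.3 and removed in PHP 7.0 in favor of preg_match). WANTED_IDS, fetch_html, and spans_to_pipe_line are hypothetical names, and the pattern assumes double-quoted IDs and no nested SPANs.

```python
import re
import urllib.request

# Hypothetical list: the IDs from the example page, in output order.
WANTED_IDS = ["Country", "FlagName", "FlagDate"]

def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and decode it with the charset the server declares (UTF-8 if none)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")

def spans_to_pipe_line(html: str, ids=WANTED_IDS) -> str:
    """Capture the text of each <SPAN ID="..."> and join the results with '|'."""
    fields = []
    for span_id in ids:
        m = re.search(
            r'<span\b[^>]*\sid\s*=\s*"' + re.escape(span_id) + r'"[^>]*>(.*?)</span>',
            html,
            re.IGNORECASE | re.DOTALL,
        )
        # group(1) is the captured text between the tags; empty field if the ID is absent.
        fields.append(m.group(1).strip() if m else "")
    return "|".join(fields)
```

For a page containing the example SPANs, spans_to_pipe_line(fetch_html(url)), with url taken from the form, would return Guyana|Soaring Cross|Incorporated in June 1755.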
Thanks,
acalder