All table rows in this table

Posted: Fri Jul 27, 2007 4:39 am
by shiznatix
I am getting an XLS report from this affiliate, but of course they can't give me something like a CSV; no, they give me an HTML page. Yay.

So to get this data out, I am looking to use a regex to grab everything in that table; then I can go through the table rows and pull out my data. I need a bit of help writing this expression.

Here is the beginning and end of the table:

Code:

<table cellspacing="0" border="0" id="ctl00_ContentPlaceHolder1_dgStats_ctl01" style="width:100%;border-collapse:collapse;table-layout:fixed;overflow:hidden;empty-cells:show;">
...
</tbody>

I have tried a few patterns, but to no avail. Here is what I thought should work, but it does not:

Code:

preg_match('#<table [.]+>[.]+</tbody>#mis', $info, $matches);
dump($matches);

So if anyone can help me get that information, that would be fantastic.

Posted: Fri Jul 27, 2007 6:26 am
by Gente

Code:

preg_match('#<table (.*)>(.*)</tbody>#mis', $info, $matches);
echo $matches[0];
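
For anyone wondering why the original attempt found nothing: inside a character class, `.` is a literal dot, so `[.]+` only matches a run of actual dots. Here is a minimal sketch of the difference; `$sample` is a made-up one-line stand-in, not the real report.

```php
<?php
// Hypothetical one-line stand-in for the report HTML.
$sample = '<table id="t"><tbody><tr><td>1</td></tr></tbody></table>';

// Inside a character class "." is a literal dot, so this finds nothing.
$literal = preg_match('#<table [.]+>[.]+</tbody>#mis', $sample, $m1);

// Outside a class "." matches any character (newlines too, because of "s").
$wildcard = preg_match('#<table (.*)>(.*)</tbody>#mis', $sample, $m2);

var_dump($literal);  // int(0)
var_dump($wildcard); // int(1)
```

Note that the greedy `(.*)` in the second pattern runs as far as it can before backtracking, so the first capture group here swallows more than just the table's attributes.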

Posted: Fri Jul 27, 2007 9:16 am
by superdezign
Gente wrote:

Code:

preg_match('#<table (.*)>(.*)</tbody>#mis', $info, $matches);
echo $matches[0];

I'd remove the first capture group and the space before it (unless you need that data, but it doesn't look useful), and make the patterns less greedy.

Code:

#<table[^>]*>(.*?)</tbody>#
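
As written, that pattern carries no modifiers, so `.` would not cross line breaks; it probably still wants the `s` (and `i`) flags from the earlier posts. A rough sketch of how it might be used to get at the rows follows. The `$info` sample, its shortened id and the cell values are invented for illustration, and the row and cell patterns are my own assumption about the markup.

```php
<?php
// Hypothetical stand-in for the report; the real page is not shown in full.
$info = <<<HTML
<table cellspacing="0" id="stats">
<tbody>
<tr><td>Mon</td><td>10</td></tr>
<tr><td>Tue</td><td>12</td></tr>
</tbody>
</table>
<table id="other"><tbody><tr><td>ignore</td></tr></tbody></table>
HTML;

$rows = [];
// "s" lets "." span newlines, "i" ignores tag case,
// and the lazy (.*?) stops at the first </tbody>.
if (preg_match('#<table[^>]*>(.*?)</tbody>#is', $info, $matches)) {
    // Pull each row, then each cell, out of the captured table body.
    preg_match_all('#<tr[^>]*>(.*?)</tr>#is', $matches[1], $trs);
    foreach ($trs[1] as $tr) {
        preg_match_all('#<td[^>]*>(.*?)</td>#is', $tr, $tds);
        $rows[] = array_map('trim', $tds[1]);
    }
}
print_r($rows);
```

Because the match is lazy, the second table in the sample is left alone. This will still break on nested tables or unusual markup.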
Also, I believe DOMDocument works on HTML as well. It may be of interest.
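
A minimal sketch of the DOMDocument route, assuming the `dom` extension is available. The `$info` sample and its `stats` id are placeholders; the real table id from the first post would go in the XPath query instead.

```php
<?php
// Hypothetical stand-in for the report HTML; the id is shortened for illustration.
$info = '<html><body><table id="stats"><tbody>'
      . '<tr><td>Mon</td><td>10</td></tr>'
      . '<tr><td>Tue</td><td>12</td></tr>'
      . '</tbody></table></body></html>';

$doc = new DOMDocument();
// Real-world HTML is often invalid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($info);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$rows = [];
// Walk every row of the target table, then every cell of each row.
foreach ($xpath->query('//table[@id="stats"]//tr') as $tr) {
    $cells = [];
    foreach ($xpath->query('./td', $tr) as $td) {
        $cells[] = trim($td->textContent);
    }
    $rows[] = $cells;
}
print_r($rows);
```

This skips the regex entirely, so nested tags and attribute order stop being a concern.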