All table rows in this table

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."

shiznatix
DevNet Master
Posts: 2745
Joined: Tue Dec 28, 2004 5:57 pm
Location: Tallinn, Estonia

Post by shiznatix »

I am getting an XLS report from this affiliate, but of course they can't give me something like a CSV; no, they give me an HTML page. Yay.

To get this data out, I want to use a regex to grab everything in that table, so I can then go through the table rows and pull out my data. I need a bit of help writing this expression.

Here is the beginning and end of the table:

Code:

<table cellspacing="0" border="0" id="ctl00_ContentPlaceHolder1_dgStats_ctl01" style="width:100%;border-collapse:collapse;table-layout:fixed;overflow:hidden;empty-cells:show;">
...
</tbody>

I have tried a few patterns, but to no avail. Here is what I thought should work but does not:

Code:

preg_match('#<table [.]+>[.]+</tbody>#mis', $info, $matches);
dump($matches);

So if anyone can help me get that information, that would be fantastic.
Gente
Forum Contributor
Posts: 252
Joined: Wed Jun 13, 2007 9:43 am
Location: Ukraine, Kharkov

Post by Gente »

Code:

preg_match('#<table (.*)>(.*)</tbody>#mis', $info, $matches);
echo $matches[0];
superdezign
DevNet Master
Posts: 4135
Joined: Sat Jan 20, 2007 11:06 pm

Post by superdezign »

Gente wrote:

Code:

preg_match('#<table (.*)>(.*)</tbody>#mis', $info, $matches);
echo $matches[0];

I'd remove the first capture group and the space before it (unless you need that data, but it doesn't look useful), and make the quantifiers less greedy.

Code:

#<table[^>]*>(.*?)</tbody>#is
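
To see what the greedy/lazy difference does in practice, here is a minimal sketch. The two-table sample in $info is made up for illustration (the real report isn't shown in this thread), and I've kept the i and s modifiers from the earlier posts so that . also matches newlines. It also shows why the original attempt fails: inside a character class, [.] means a literal dot, not "any character".

```php
<?php
// Hypothetical sample: two tables on one page, to show why greediness matters.
$info = <<<HTML
<table id="a">
<tbody><tr><td>1</td></tr></tbody>
</table>
<table id="b">
<tbody><tr><td>2</td></tr></tbody>
</table>
HTML;

// Greedy: (.*) runs to the LAST </tbody>, so it swallows both tables.
preg_match('#<table[^>]*>(.*)</tbody>#is', $info, $greedy);

// Lazy: (.*?) stops at the FIRST </tbody>, so only table "a" is captured.
preg_match('#<table[^>]*>(.*?)</tbody>#is', $info, $lazy);

// Original attempt: [.]+ only matches one or more literal dots,
// so the pattern cannot match real markup and preg_match() returns 0.
$original = preg_match('#<table [.]+>[.]+</tbody>#mis', $info, $m);

// With the table body isolated, the rows can be pulled out the same way.
preg_match_all('#<tr[^>]*>(.*?)</tr>#is', $lazy[1], $rows);
```

If the page only ever contains one table, both versions give the same result, but the lazy one is the safer default.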

Also, I believe DOMDocument works on HTML as well (it has a loadHTML() method). It may be of interest.
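
A rough sketch of the DOMDocument route, in case it helps. Only the table id comes from the first post; the markup and cell values in $info are placeholders I made up, so adjust the XPath to whatever the real report looks like.

```php
<?php
// Hypothetical stand-in for the affiliate's report; only the table id is from the thread.
$info = '<html><body><table id="ctl00_ContentPlaceHolder1_dgStats_ctl01"><tbody>'
      . '<tr><td>A</td><td>1</td></tr>'
      . '<tr><td>B</td><td>2</td></tr>'
      . '</tbody></table></body></html>';

// Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($info);
libxml_clear_errors();

// Find every row of the table with that id, then read the text of each cell.
$xpath = new DOMXPath($doc);
$rows = [];
foreach ($xpath->query('//table[@id="ctl00_ContentPlaceHolder1_dgStats_ctl01"]//tr') as $tr) {
    $cells = [];
    foreach ($xpath->query('./td|./th', $tr) as $cell) {
        $cells[] = trim($cell->textContent);
    }
    $rows[] = $cells;
}
```

This leaves $rows as an array of rows, each an array of cell strings, without any regex involved.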