<td>1</td><td height="20"><a class="BN" href="index.php?b_id=1990"> test </a></td><td> Test1 </td><td> test2 </td><td> test3 </td><td> test4</td><td> test5</td>
How can I extract all the data from these <td> cells?
Regex to get records from a table
-
klevis miho
- Forum Contributor
- Posts: 413
- Joined: Wed Oct 29, 2008 2:59 pm
- Location: Albania
- DigitalMind
- Forum Contributor
- Posts: 152
- Joined: Mon Sep 27, 2010 2:27 am
- Location: Ukraine, Kharkov
Re: Regex to get records from a table
<td.*?>(.*?)</td>
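For anyone wanting to try this, here is a minimal sketch of how the pattern above might be applied in PHP to the row from the question. The `%` delimiters, the `is` flags, and the `trim(strip_tags(...))` cleanup are my own assumptions, not part of the original answer.

```php
<?php
// Sample row copied from the question (assumed here to be the whole input).
$row = '<td>1</td><td height="20"><a class="BN" href="index.php?b_id=1990"> test </a></td>'
     . '<td> Test1 </td><td> test2 </td><td> test3 </td><td> test4</td><td> test5</td>';

// i: match tags case-insensitively; s: let "." span newlines inside a cell.
preg_match_all('%<td.*?>(.*?)</td>%is', $row, $matches);

// $matches[1] holds the captured cell contents. Strip inner markup
// (such as the <a> link) and trim the surrounding spaces.
$cells = array_map(function ($cell) {
    return trim(strip_tags($cell));
}, $matches[1]);

print_r($cells);
```

Group 1 holds the raw cell contents; whether to strip the inner markup depends on whether you need the link itself.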
-
klevis miho
Re: Regex to get records from a table
Thanks man
- ridgerunner
- Forum Contributor
- Posts: 214
- Joined: Sun Jul 05, 2009 10:39 pm
- Location: SLC, UT
Re: Regex to get records from a table
That will work if your tables are not nested. However, if the tables are nested, you'll need something a bit more complex. You can design a regex to match either the innermost or the outermost <td>...</td>. This subject was recently discussed with regard to tables as a whole. See: preg_replace produces mysteriously blank file
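To illustrate why nesting matters, here is a small sketch that runs the simple lazy pattern over a made-up nested fragment. The input string is hypothetical, chosen only to show the mismatch.

```php
<?php
// Hypothetical input: an outer cell containing a one-cell inner table.
$html = '<td>outer <table><tr><td>inner</td></tr></table> tail</td>';

// The lazy (.*?) stops at the FIRST </td>, which belongs to the inner cell.
preg_match_all('%<td.*?>(.*?)</td>%is', $html, $matches);

print_r($matches[0]);
```

The single match starts at the outer opening tag but ends at the inner closing tag, so it corresponds to neither the innermost nor the outermost cell.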
That said, if you are dealing with nested tables, here is a script containing two commented regexes: one to match innermost TD tags, and another to match outermost TD tags. These are a bit more complex, as they implement the "unrolling-the-loop" efficiency technique described in Jeffrey Friedl's classic work, "Mastering Regular Expressions - 3rd Edition":
<?php // File: NestedTds.php
$data = file_get_contents('NestedTablesTestData.html');

// regex to match innermost TDs which do NOT contain nested TDs
$pattern_innermost = '%
    # Use: "unroll-the-loop" technique. i.e. "(normal* (special normal*)*)"
    # from: "Mastering Regular Expressions - 3rd Edition" by Jeffrey Friedl
    <td\b[^>]*+>         # Match opening TD tag having any attributes.
    [^<]*+               # 1st (normal*) = match up to next < opening tag char.
    (?:                  # Special "<" found. Begin (special normal*)* loop.
      (?! </?td\b )      # Begin (special). If < is not start of a TD tag,
      <                  # then safe to match the non-TD-tag <. End (special).
      [^<]*+             # 2nd (normal*) = match up to next < opening tag char.
    )*+                  # End of (special normal*)* loop.
    </td>                # Match closing TD tag.
    %ix';
if (preg_match_all($pattern_innermost, $data, $matches) > 0) {
    echo("Inner pattern matched. Here are the results:\r\n");
    print_r($matches);
}

// regex to match outermost TDs which may contain nested TDs
$pattern_outermost = '%
    <td\b[^>]*+>         # Match opening TD tag.
    (?:                  # Non-capture group for alternation.
      (?R)               # Match a whole nested TD element,
    |                    # or... match a bunch of non-TD-tag characters
      [^<]*+             # 1st (normal*) = match up to next < opening tag char.
      (?:                # Special "<" found. Begin (special normal*)* loop.
        (?! </?td\b )    # Begin (special). If < is not start of a TD tag,
        <                # then safe to match the non-TD-tag <. End (special).
        [^<]*+           # 2nd (normal*) = match up to next < opening tag char.
      )*+                # End of (special normal*)* loop.
    )*+                  # loop as many as it takes until outer
    </td>                # balanced closing TD tag is matched.
    %six';
if (preg_match_all($pattern_outermost, $data, $matches) > 0) {
    print_r($matches);
}
?>
Hope this helps.
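Since the script above reads from NestedTablesTestData.html, which is not included in the thread, here is a self-contained sketch that runs compact (non-commented) forms of the same two patterns on an inline string. The sample markup and the variable names `$innermost`, `$outermost`, `$inner`, and `$outer` are assumptions made up for illustration.

```php
<?php
// Hypothetical stand-in for NestedTablesTestData.html: the first outer cell
// holds a nested table with two cells, the second outer cell is plain.
$data = '<table><tr>'
      . '<td>A <table><tr><td>B</td><td>C</td></tr></table> D</td>'
      . '<td>E</td>'
      . '</tr></table>';

// Same logic as the commented patterns, written without the x-mode layout.
$innermost = '%<td\b[^>]*+>[^<]*+(?:(?!</?td\b)<[^<]*+)*+</td>%i';
$outermost = '%<td\b[^>]*+>(?:(?R)|[^<]*+(?:(?!</?td\b)<[^<]*+)*+)*+</td>%si';

preg_match_all($innermost, $data, $m);
$inner = $m[0]; // cells that contain no other TD

preg_match_all($outermost, $data, $m);
$outer = $m[0]; // top-level cells, including any nested tables inside them

print_r($inner);
print_r($outer);
```

With this input, the innermost pattern should pick out the B, C, and E cells, while the outermost pattern should return the whole first cell (nested table included) plus the E cell.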