Any questions involving matching text strings to patterns - the pattern is called a "regular expression."
Moderator: General Moderators
xitura
Forum Newbie
Posts: 20 Joined: Fri Sep 07, 2007 11:25 am
by xitura » Fri Sep 21, 2007 3:33 pm
Hello.
I am trying to scan a website which contains a table with the columns name and image. Not every row contains an image.
I am able to scan for the name, but I'm stuck on the images: my pattern only matches the rows without an image. I want to get every row, whether or not the column contains an image, so I'd like to extract whatever comes after the second <td> regardless.
Thank you.
ReverendDexter
Forum Contributor
Posts: 193 Joined: Tue May 29, 2007 1:26 pm
Location: Chico, CA
by ReverendDexter » Fri Sep 21, 2007 3:58 pm
Is this in PHP or are you using another form of regex?
If it's in PHP you can do something with preg_match along the lines of:
Code:
$regex = '/<td>(\w+)<\/td><td>(.*)<\/td>/'; // PCRE patterns need delimiters; / is used here
preg_match($regex, $haystack, $array_matches);
Double check my syntax, but that should get you in the ballpark...
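As a rough sketch of what ends up in the match array (the sample row in $haystack is made up for illustration and is not taken from the actual page):

```php
<?php
// Hypothetical sample row; the real page markup may differ.
$haystack = '<tr><td>Alice</td><td><img src="alice.png"></td></tr>';

// Slashes are the delimiters, so the literal "/" in "</td>" is escaped.
$regex = '/<td>(\w+)<\/td><td>(.*)<\/td>/';

$name = null;
$image = null;
if (preg_match($regex, $haystack, $array_matches)) {
    // [0] is the whole match, [1] the name, [2] the second cell's contents.
    $name  = $array_matches[1];
    $image = $array_matches[2];
}
```

Note that this only works while the whole row sits on one line, and the greedy (.*) would run on to the last </td> if several rows shared that line.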
Hope it helps!
xitura
Forum Newbie
Posts: 20 Joined: Fri Sep 07, 2007 11:25 am
by xitura » Fri Sep 21, 2007 4:26 pm
Thank you, I had tried that before and it didn't work. But then I realized that the </td>s were tabbed in.
Doesn't the . match whitespace?
Anyway, thanks for pointing me in the right direction.
feyd
Neighborhood Spidermoddy
Posts: 31559 Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA
by feyd » Sat Sep 22, 2007 12:17 am
Have a read through the stickied primers of the forum. They should shed light on a lot of the more basic questions you asked.
GeertDD
Forum Contributor
Posts: 274 Joined: Sun Oct 22, 2006 1:47 am
Location: Belgium
by GeertDD » Sat Sep 22, 2007 3:14 am
xitura wrote: Doesn't the . match whitespace?
It matches spaces and tabs, but not newlines by default. Apply the s modifier and it does.
Code:
preg_match('#<td>(\w++)</td><td>(.*?)</td>#s', $str, $matches);
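To pull every row rather than just the first, the same pattern can be handed to preg_match_all. This is only a sketch: the table fragment in $str is invented to mimic closing tags that are tabbed in on their own line, and $rows is a name made up for the example.

```php
<?php
// Hypothetical table fragment: the second </td> of each row sits on its own
// tabbed line, and the second row has no image.
$str = "<tr><td>Alice</td><td><img src=\"alice.png\">\n\t</td></tr>\n"
     . "<tr><td>Bob</td><td>\n\t</td></tr>\n";

// s lets . cross newlines; the lazy .*? stops at the first closing </td>.
preg_match_all('#<td>(\w++)</td><td>(.*?)</td>#s', $str, $matches, PREG_SET_ORDER);

$rows = [];
foreach ($matches as $m) {
    // trim() drops the newline and tab left in front of the closing tag.
    $rows[$m[1]] = trim($m[2]);
}
```

With PREG_SET_ORDER each element of $matches is one row, so a row without an image simply comes back with an empty second capture instead of being skipped.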
xitura
Forum Newbie
Posts: 20 Joined: Fri Sep 07, 2007 11:25 am
by xitura » Sat Sep 22, 2007 8:52 am
Thanks.