Extract image name if it exists

Posted: Fri Sep 21, 2007 3:33 pm
by xitura
Hello.
I am trying to scan a website which contains a table with the columns name and image. Not every row contains an image.
I am able to scan for the name, but I'm stuck with the images. I only get the rows without an image, but I want every row. It doesn't really matter whether the column contains an image or not; I'd like to extract what's after the second <td> anyway.

Code:

<td>(\w+)<\/td><td>Image<\/td>
Thank you.

Posted: Fri Sep 21, 2007 3:58 pm
by ReverendDexter
Is this in PHP or are you using another form of regex?

If it's in PHP you can do something with preg_match along the lines of:

Code:

// preg_* patterns need delimiters; # avoids having to escape the slashes.
$regex = '#<td>(\w+)</td><td>(.*)</td>#';
preg_match($regex, $haystack, $array_matches);
Double check my syntax, but that should get you in the ballpark...

Hope it helps!

Posted: Fri Sep 21, 2007 4:26 pm
by xitura
Thank you. I had tried that before and it didn't work, but then I realized that the </td> tags were tabbed in.
Doesn't the . match whitespace?

Anyway, thanks for pointing me in the right direction.

Posted: Sat Sep 22, 2007 12:17 am
by feyd
Have a read through the stickied primers of the forum. They should shed light on a lot of the more basic questions you asked.

Posted: Sat Sep 22, 2007 3:14 am
by GeertDD
xitura wrote:Doesn't the . match whitespaces?
It matches spaces and tabs, but it doesn't match newlines by default. Apply the s modifier and it does.

Code:

preg_match('#<td>(\w++)</td><td>(.*?)</td>#s', $str, $matches);
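
To see the difference the s modifier makes, here is a rough sketch run against made-up markup. The $html sample, the variable names and the extra \s* parts are assumptions for illustration, not something taken from the page being scanned. It uses preg_match_all to collect every row, whether or not the second cell holds an image.

```php
<?php
// Hypothetical sample markup: the second row has no image, and the
// closing </td> tags sit on their own tab-indented lines.
$html = "<tr><td>alpha</td><td><img src=\"alpha.png\">\n\t</td></tr>\n"
      . "<tr><td>beta</td><td>\n\t</td></tr>\n";

// Without the s modifier the dot stops at the newline, so nothing matches.
$withoutS = preg_match('#<td>(\w++)</td><td>(.*?)</td>#', $html);

// With the s modifier the dot also matches newlines. The \s* parts
// (an addition, not in the pattern above) tolerate whitespace around the name.
$pattern = '#<td>\s*(\w++)\s*</td>\s*<td>(.*?)</td>#s';
preg_match_all($pattern, $html, $rows, PREG_SET_ORDER);

$cells = array();
foreach ($rows as $row) {
    // $row[1] is the name, $row[2] the raw content of the second cell.
    $cells[$row[1]] = trim($row[2]);
}

print_r($cells);
```

A row without an image simply ends up with an empty string for its second cell, so it can be told apart from the others with a plain comparison.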

Posted: Sat Sep 22, 2007 8:52 am
by xitura
Thanks.