I want to get <a> tags, how?

Posted: Thu May 28, 2009 9:33 pm
by wren9
Good morning guys,

I want to get all the <a> tags on an HTML page, including all the characters between <a> and </a>.

Does anyone have demo code using regex or DOM, please?

Sorry, this is my first time scraping a page.

Thank you very much. Your code will be very helpful to me, since I am still learning how to scrape an HTML page.

GOD BLESS.

Re: I want to get <a> tags, how?

Posted: Thu May 28, 2009 10:20 pm
by requinix
Could use some DOM class (eg, DOMDocument) then do a search by tag name.

Then there's always regular expressions. Might be better, hard to say.

Code:

preg_match_all('#<a\s.*?</a>#is', $text, $matches);
print_r($matches[0]);
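
To see what that pattern returns, here is a small self-contained sketch. The sample markup in $text is made up for illustration and is not from the original page. Note that the pattern requires whitespace after "<a", so a bare <a> tag with no attributes would not be matched.

```php
<?php
// Hypothetical sample input; in practice $text would hold the page source.
$text = '<p><a href="/one">first</a> and <A
  HREF="/two">second <b>link</b></A></p>';

// "i" makes the match case-insensitive, "s" lets "." match newlines,
// and the lazy ".*?" stops at the nearest closing </a>.
preg_match_all('#<a\s.*?</a>#is', $text, $matches);

// $matches[0] holds each full match, from the opening <a to the closing </a>.
print_r($matches[0]);
```

Each entry in $matches[0] is one complete link, including any nested tags between the opening and closing tag.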

Re: I want to get <a> tags, how?

Posted: Fri May 29, 2009 4:34 am
by prometheuzz
tasairis wrote: Could use some DOM class (eg, DOMDocument) then do a search by tag name.

Then there's always regular expressions. Might be better, hard to say.

Code:

preg_match_all('#<a\s.*?</a>#is', $text, $matches);
print_r($matches[0]);
Parsing (X)HTML should be done with an HTML parser. When a regex runs into improperly formed HTML, it might cause the entire file to be parsed incorrectly, while a true HTML parser can (and probably will) recover from those mistakes.
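
As a rough sketch of the parser approach mentioned earlier in the thread, something like the following should work with PHP's bundled DOM extension. The sample markup in $html and the array keys are made up for illustration, and passing a node to saveHTML() assumes PHP 5.3.6 or later.

```php
<?php
// Hypothetical sample input; in practice $html would hold the fetched page.
$html = '<p>See <a href="/one">first <b>link</b></a> and <a href="/two">second</a>.</p>';

// Collect libxml warnings internally instead of printing them,
// since real-world pages are often malformed.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$links = array();
foreach ($doc->getElementsByTagName('a') as $a) {
    $links[] = array(
        'href' => $a->getAttribute('href'), // value of the href attribute
        'text' => $a->textContent,          // text between <a> and </a>, tags stripped
        'html' => $doc->saveHTML($a),       // the whole element, nested tags included
    );
}

print_r($links);
```

The 'html' entry is the closest equivalent to what the regex returns, while 'href' and 'text' are usually what a scraper actually needs.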