a link
Posted: Mon Oct 17, 2005 3:45 pm
by shiznatix
I need to find the first occurrence of the phrase GET HTML, which is going to be a link, and then get the URL that GET HTML links to. Errr, I'm absolutely awful at regex, so some help please.
Posted: Mon Oct 17, 2005 5:20 pm
by Chris Corbyn
Code:
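// get_content_from_somewhere() is a placeholder: load the HTML however you like
$data = get_content_from_somewhere();
preg_match('@<a\s+[^>]*?\bhref="([^"]+)"[^>]*?>GET HTML</a>@is', $data, $matches);
print_r($matches); // $matches[1] holds the URL from the href attribute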
Breakdown:
@ ... @ -- Delimiters (the pattern sits between the two @ characters)
<a\s+ -- The start of an <a> tag followed by at least one whitespace character
[^>]*? -- Lazily allow other attributes (JavaScript handlers etc.) to come before "href"
\b -- Word boundary, so "href" has to start a new word
href=" -- Just a plain string (the href=" part)
([^"]+) -- One or more characters other than a double quote, captured into $matches[1] (the link itself)
"[^>]*?> -- The closing quote, then any characters other than ">" zero or more times (other attributes, which may or may not be there), then the ">" that ends the opening tag
The rest should be obvious: GET HTML</a> is just the literal link text and the closing tag.
The modifiers "is" mean case insensitive and ignore whitespace
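A self-contained version of the example above, run against made-up markup in place of the placeholder get_content_from_somewhere(). It's only a sketch to show what ends up in $matches; the sample string is an assumption, not real page data:

```php
<?php
// Made-up markup standing in for get_content_from_somewhere() (a placeholder).
// The first link has the wrong text; the second has an attribute before HREF,
// spans two lines, and uses different letter case.
$data = '<p><a href="/other">Other</a> '
      . "<A class=\"btn\"\n   HREF=\"http://example.com/page.html\" target=\"_blank\">get html</A></p>";

if (preg_match('@<a\s+[^>]*?\bhref="([^"]+)"[^>]*?>GET HTML</a>@is', $data, $matches)) {
    // $matches[0] is the whole <a ...>...</a> element, $matches[1] the URL.
    echo $matches[1], "\n"; // prints http://example.com/page.html
}
```

The first link is skipped because its text isn't GET HTML, and thanks to i the uppercase HREF and the lowercase "get html" in the second link still match.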

Posted: Mon Oct 17, 2005 8:59 pm
by sweatje
d11wtq wrote:
The modifiers "is" mean case insensitive and ignore whitespace

Actually, the s modifier means that . also matches a newline. It is often used together with m (which lets ^ and $ match at the start and end of each line) for multiline regex processing.
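A small PHP sketch of the difference (the sample string is made up). Incidentally, the pattern in the first reply has no "." in it, so its s changes nothing there: negated classes such as [^>] match newlines either way.

```php
<?php
$text = "first line\nsecond line"; // made-up two-line subject

// Without s, "." does not match "\n", so this pattern cannot span the lines.
var_dump(preg_match('/first.*second/', $text));  // int(0)

// With s, "." matches "\n" too, so the match crosses the line break.
var_dump(preg_match('/first.*second/s', $text)); // int(1)

// m is separate: it lets ^ and $ match at each line break,
// not only at the very start and end of the subject.
var_dump(preg_match('/^second/', $text));        // int(0)
var_dump(preg_match('/^second/m', $text));       // int(1)
```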
Perhaps you were thinking of x, which makes the engine ignore unescaped whitespace in the pattern and allows # comments in the regex?
PCRE Pattern Modifiers
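To make the x modifier concrete, here is a sketch of the same pattern from the first reply rewritten in x mode, with the breakdown moved into # comments. The sample link is made up. One catch: since x ignores unescaped whitespace, the space in "GET HTML" has to be escaped.

```php
<?php
// Same pattern as the first reply, written with the x modifier so each piece
// can carry a # comment. Unescaped whitespace outside classes is ignored.
$pattern = '@
    <a\s+       # start of an <a> tag plus at least one whitespace character
    [^>]*?      # any other attributes before href (lazy)
    \bhref="    # word boundary, then the literal href="
    ([^"]+)     # capture group 1: the link itself
    "[^>]*?>    # closing quote, any remaining attributes, end of the tag
    GET\ HTML   # the link text (escaped space, because x ignores whitespace)
    </a>        # closing tag
@isx';

$data = '<a class="x" href="http://example.com/get.html">GET HTML</a>';
if (preg_match($pattern, $data, $matches)) {
    echo $matches[1], "\n"; // prints http://example.com/get.html
}
```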
Posted: Tue Oct 18, 2005 4:52 am
by shiznatix
OK, sigh, that worked perfectly, thanks. I tried using some of the stuff from the "breakdown" to match the first link in a textarea box, to no avail. I know this is garbage, but this is the kind of stuff I've been trying:
Code:
preg_match('@<a\s+[^>]*?\bhref="([^"]+)"[^>]*?>[^>]*?</a>[^>]*?</textarea>@is', $incoming, $matches);
You guessed it: no luck.
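For what it's worth, a two-step approach is usually easier than one big regex: first pull out the textarea contents, then search those for the first link. A minimal sketch, assuming the markup inside the textarea may be HTML-escaped (as it often is in a page's source); the function name first_link_in_textarea() and the sample page are hypothetical:

```php
<?php
// Hypothetical helper: returns the href of the first <a> inside the first
// <textarea>, or null if there is none.
function first_link_in_textarea(string $html): ?string
{
    // Step 1: capture the textarea contents. The lazy (.*?) stops at the first
    // </textarea>, and the s modifier lets "." cross newlines.
    if (!preg_match('@<textarea\b[^>]*>(.*?)</textarea>@is', $html, $ta)) {
        return null;
    }

    // Textarea contents are often HTML-escaped (&lt;a href=...&gt;),
    // so decode entities before searching.
    $inner = html_entity_decode($ta[1], ENT_QUOTES);

    // Step 2: preg_match() stops at the leftmost match, i.e. the first link.
    if (preg_match('@<a\s+[^>]*?\bhref="([^"]+)"@i', $inner, $link)) {
        return $link[1];
    }
    return null;
}

$page = "<form>\n<textarea name=\"body\">\n"
      . "See &lt;a href=&quot;http://example.com/one&quot;&gt;one&lt;/a&gt; and\n"
      . "&lt;a href=&quot;http://example.com/two&quot;&gt;two&lt;/a&gt;\n"
      . "</textarea>\n</form>";

echo first_link_in_textarea($page), "\n"; // prints http://example.com/one
```

As for why the single pattern above fails: [^>]*?</textarea> after </a> cannot step over any later tag, so it can only match a link with no further tags before </textarea> (the last link, not the first), and it finds nothing at all when the contents are escaped.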
Posted: Tue Oct 18, 2005 6:20 am
by shiznatix
Forget that, I don't need it anymore.