a link

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."

shiznatix
DevNet Master
Posts: 2745
Joined: Tue Dec 28, 2004 5:57 pm
Location: Tallinn, Estonia

Post by shiznatix »

I need to find the first occurrence of the phrase GET HTML, which is going to be a link, and then I need to get the URL that GET HTML links to. Err, I'm absolutely awful at regex, so some help please.
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

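// get_content_from_somewhere() is a placeholder: load the page however you like (file_get_contents(), cURL, ...)
$data = get_content_from_somewhere();

preg_match('@<a\s+[^>]*?\bhref="([^"]+)"[^>]*?>GET HTML</a>@is', $data, $matches);

// $matches[0] is the whole <a>...</a> tag, $matches[1] is the captured URL
print_r($matches);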
Breakdown:

@ ... @ -- Delimiters
<a\s+ -- The start of an <a> tag followed by at least one whitespace character
[^>]*? -- Allow other attributes (JavaScript handlers etc.) to come before "href"
\b -- Word boundary, so "href" must start a word
href=" -- A plain string (the href part)
([^"]+) -- One or more characters other than a double quote, captured (the link itself)
"[^>]*?> -- The closing quote, then zero or more characters other than ">" (other attributes, which may or may not be there), then the ">" that ends the tag
The rest (the literal GET HTML</a>) should be obvious.
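A minimal, self-contained version of the snippet above, assuming the HTML is already in a string. The helper `find_get_html_link()` and the sample markup are hypothetical stand-ins for `get_content_from_somewhere()`. `preg_match()` only ever reports the first match, which is exactly the "first occurrence" asked for; the URL ends up in `$matches[1]`.

```php
<?php
// Hypothetical helper: returns the href of the first <a> whose text is "GET HTML",
// or null if there is no such link.
function find_get_html_link(string $html): ?string
{
    // Same pattern as in the snippet above.
    $pattern = '@<a\s+[^>]*?\bhref="([^"]+)"[^>]*?>GET HTML</a>@is';

    // preg_match() returns 1 on a match, 0 on no match, false on error,
    // and stops after the first match.
    if (preg_match($pattern, $html, $matches) === 1) {
        return $matches[1]; // [0] is the whole <a>...</a>, [1] is the captured URL
    }
    return null;
}

// Hypothetical sample page with two GET HTML links.
$html = <<<HTML
<p><a href="/about">About</a></p>
<p><a class="btn" href="/export?id=1" target="_blank">GET HTML</a></p>
<p><a href="/export?id=2">GET HTML</a></p>
HTML;

echo find_get_html_link($html), "\n"; // prints /export?id=1
```

Note that `[^"]+` assumes the href is in double quotes; a single-quoted `href='...'` will not match this pattern.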

The modifiers "is" mean case insensitive and ignore whitespace ;)
sweatje
Forum Contributor
Posts: 277
Joined: Wed Jun 29, 2005 10:04 pm
Location: Iowa, USA

Post by sweatje »

d11wtq wrote: The modifiers "is" mean case insensitive and ignore whitespace ;)
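Actually, the s modifier makes . match newlines as well. It is often used together with m (which makes ^ and $ match at the start and end of each line) for multiline processing, although the two are independent.

Perhaps you were thinking of x, which makes the engine ignore unescaped whitespace in the pattern and allows # comments inside the regex?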

PCRE Pattern Modifiers
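A small sketch of the differences described above (the sample strings are made up). The link pattern in this thread contains no `.`, so `s` has no effect on it; negated classes such as `[^>]` and `[^"]` match newlines with or without `s`. The last part rewrites the thread's pattern with `x`, where whitespace is ignored and `#` starts a comment.

```php
<?php
$text = "first line\nsecond line";

// Without s, "." stops at the newline; with s, it crosses it.
var_dump(preg_match('/first.*second/', $text));    // int(0)
var_dump(preg_match('/first.*second/s', $text));   // int(1)

// A negated class matches the newline even without s.
var_dump(preg_match('/first[^x]*second/', $text)); // int(1)

// Without m, ^ anchors only at the start of the string; with m, at every line start.
var_dump(preg_match('/^second/', $text));          // int(0)
var_dump(preg_match('/^second/m', $text));         // int(1)

// x: unescaped whitespace is ignored and # starts a comment,
// so the pattern from the thread can be laid out piece by piece.
$pattern = '@
    <a\s+          # start of the <a> tag
    [^>]*?         # other attributes before href
    \bhref="       # the href attribute
    ([^"]+)        # captured link
    "[^>]*?>       # closing quote, remaining attributes, end of tag
    GET\ HTML</a>  # literal link text (the space must be escaped under x)
@ix';

// The tag spans two lines, yet no s is needed because [^>] matches the newline.
$html = '<a class="btn"
   href="/export">GET HTML</a>';
preg_match($pattern, $html, $m);
echo $m[1], "\n"; // prints /export
```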
shiznatix
DevNet Master
Posts: 2745
Joined: Tue Dec 28, 2004 5:57 pm
Location: Tallinn, Estonia
Contact:

Post by shiznatix »

OK, sigh, that worked perfectly, thanks. I tried using some of the stuff from the breakdown to match the first link inside a textarea, to no avail. I know this is garbage, but this is the kind of thing I've been trying:

preg_match('@<a\s+[^>]*?\bhref="([^"]+)"[^>]*?>[^>]*?</a>[^>]*?</textarea>@is', $incoming, $matches);
You guessed it: no luck.
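For anyone landing here with the same question: the pattern above only succeeds for a link with no other tag between its `</a>` and `</textarea>`, so at best it finds the last link, never the first. Also, if the textarea content was escaped (for example with `htmlspecialchars()`), there is no literal `<a` to match at all. A sketch of a two-step approach, assuming the link markup inside the textarea is either raw or HTML-escaped (the helper `first_link_in_textarea()` and the sample page are hypothetical): capture the textarea contents first (here `s` does matter, because `.` must cross newlines), decode entities, then run the link pattern on that text.

```php
<?php
// Hypothetical helper: returns the href of the first link inside the first
// <textarea>, or null. Works whether the link markup is raw or HTML-escaped.
function first_link_in_textarea(string $html): ?string
{
    // Step 1: grab the textarea contents. s lets "." cross newlines,
    // and the lazy .*? stops at the first </textarea>.
    if (preg_match('@<textarea\b[^>]*>(.*?)</textarea>@is', $html, $ta) !== 1) {
        return null;
    }

    // Step 2: turn &lt;a href=&quot;...&quot;&gt; back into <a href="...">.
    $inner = html_entity_decode($ta[1], ENT_QUOTES, 'UTF-8');

    // Step 3: the link pattern from earlier in the thread, without the
    // "GET HTML" text; preg_match() returns the first link it finds.
    if (preg_match('@<a\s+[^>]*?\bhref="([^"]+)"[^>]*>@i', $inner, $m) !== 1) {
        return null;
    }
    return $m[1];
}

$page = <<<HTML
<form>
<textarea name="body" rows="5">
Intro text
&lt;a href=&quot;http://example.com/one&quot;&gt;one&lt;/a&gt;
&lt;a href=&quot;http://example.com/two&quot;&gt;two&lt;/a&gt;
</textarea>
</form>
HTML;

echo first_link_in_textarea($page), "\n"; // prints http://example.com/one
```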
shiznatix
DevNet Master
Posts: 2745
Joined: Tue Dec 28, 2004 5:57 pm
Location: Tallinn, Estonia
Contact:

Post by shiznatix »

Forget that, I don't need it anymore.