preg_replace if NOT found

tvs008
Forum Commoner
Posts: 29
Joined: Wed May 03, 2006 10:46 pm
Location: Seattle

Post by tvs008 »

Hi.

My goal is to find where a .pdf is linked and automatically add the text "(PDF)" after the closing </a> if it's not already there. What is the pattern for this?

Code:

$pattern = '/.pdf">(.*?)<\\/a>..^pdf/i';
(this is my approximation)

How do I say "replace only if 'pdf' doesn't appear within 0-5 characters after the </a>"?

Here's what I'm replacing with:

Code:

$replacement = '.pdf">$1</a> (PDF)';
Again, the goal is to mark a link as a PDF if it isn't already marked.

For example, the link "Link Example" would become "Link Example (PDF)".

Thanks!
stereofrog
Forum Contributor
Posts: 386
Joined: Mon Dec 04, 2006 6:10 am

Post by stereofrog »

You need a negative lookahead ("not-followed-by" assertion):

Code:

preg_replace('~\.pdf">.*?</a>(?! \(PDF\))~i', "$0 (PDF)", ....
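
A minimal sketch of how that call might look once completed. The post elides the subject argument, so the `$html` string, its file names, and its link text are made up for illustration:

```php
<?php
// Hypothetical sample markup; the file names and link text are made up.
$html = '<a href="a.pdf">Report</a> and <a href="b.pdf">Slides</a> (PDF)';

// Append " (PDF)" after a PDF link unless " (PDF)" already follows it.
// (?! \(PDF\)) is the negative lookahead: it checks the text after </a>
// without consuming it.
$pattern = '~\.pdf">.*?</a>(?! \(PDF\))~i';
$result  = preg_replace($pattern, '$0 (PDF)', $html);

echo $result, "\n";
// <a href="a.pdf">Report</a> (PDF) and <a href="b.pdf">Slides</a> (PDF)
```

Only the first link gains a label; the second is left alone because " (PDF)" already follows it.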

Post by tvs008 »

Thanks, that's exactly what I was looking for: negative lookahead.

Now I'm just trying to get it to check a range after the </a>, say within about 8 chars. I think I'm closing in on that. I'm going to try something like this, which I read:

"a(bc){1,5}": one through five copies of "bc."

Unless there is a more appropriate way of doing ranges. I'm thinking like how strpos works, but in a way that will fit within the pattern...

Post by stereofrog »

Eh? Sorry, I don't quite get it... You can use an arbitrary expression in a lookahead, so for example this

Code:

</a>(?! .{0,3}?\(PDF\))
would skip "</a> (PDF)" or "</a> foo(PDF)" but accept "</a> foobarbaz (PDF)".
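
The three cases above can be checked with a short script. This is only a sketch: the `$samples` array and the `$results` map are names introduced here, not part of the original post.

```php
<?php
// Lookahead from the post above: reject when "(PDF)" starts within
// 0-3 characters after the space that follows </a>.
$pattern = '~</a>(?! .{0,3}?\(PDF\))~';

// The three sample strings quoted in the post.
$samples = [
    '</a> (PDF)',
    '</a> foo(PDF)',
    '</a> foobarbaz (PDF)',
];

$results = [];
foreach ($samples as $s) {
    // preg_match returns 1 when the pattern matches, 0 when it does not.
    $results[$s] = preg_match($pattern, $s) === 1;
    echo str_pad($s, 22), $results[$s] ? 'accepted' : 'skipped', "\n";
}
```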

Post by tvs008 »

Let me try explaining this part again.

The pattern could be any of these:

Code:

'/.pdf">(.*?)<\\/a>(?!.PDF....)/i'
'/.pdf">(.*?)<\\/a>(?!..PDF...)/i'
'/.pdf">(.*?)<\\/a>(?!...PDF..)/i'
'/.pdf">(.*?)<\\/a>(?!....PDF.)/i'
'/.pdf">(.*?)<\\/a>(?!.....PDF)/i'
(Not really sure if the . is correct here. Seems to be.)

Basically, I want to look for the next occurrence of PDF (case insensitive) within the next n characters. This is what made me think of strpos.

I don't mean to ask for the code here, I'm just trying to know what my tools are, like how you pointed out negative lookahead.

Thanks
superdezign
DevNet Master
Posts: 4135
Joined: Sat Jan 20, 2007 11:06 pm

Post by superdezign »

Lookaheads may match a variable number of characters.

Code:

'/.pdf">(.*?)<\\/a>(?!.*PDF)/i'
You can replace the asterisk quantifier with a range.

Post by tvs008 »

Awesome!

Here's what I've got; it seems to work in an initial test, and I'll do a more thorough one tomorrow. Got to go. Thanks!

Code:

'/.pdf">(.*?)<\\/a>(?!.{0,8}PDF)/i'
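
A sketch of that pattern combined with the replacement string from the first post. The `$html` input is hypothetical, and the dot in `\.pdf` is escaped here so that it matches only a literal dot, which is a small deviation from the pattern as posted:

```php
<?php
// Hypothetical input: one bare PDF link, one already labelled nearby.
$html = '<a href="a.pdf">Report</a> is new. '
      . '<a href="b.pdf">Slides</a> - [PDF]';

// Skip links where "PDF" (any case) starts within 8 characters of </a>.
$pattern     = '/\.pdf">(.*?)<\/a>(?!.{0,8}PDF)/i';
$replacement = '.pdf">$1</a> (PDF)';

$result = preg_replace($pattern, $replacement, $html);
echo $result, "\n";
// <a href="a.pdf">Report</a> (PDF) is new. <a href="b.pdf">Slides</a> - [PDF]
```

One thing to watch: because of the `i` modifier, the lookahead also sees the "pdf" inside the href of a following link, so two PDF links placed very close together could cause the first one to be skipped.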

Post by tvs008 »

OK, so it works, but with a bug. I have questions about a few things in this expression:

Code:

'/.pdf">(.*)<\/a>(?!.{0,15}PDF)/i'
'/.pdf">(.*?)<\\/a>(?! {0,15}PDF)/i'
'/.pdf">(.*?)\<\/a\>(?!.{0,15}PDF)/i'
These variations all appear to return the same result.

First, I thought < and > were metacharacters that needed escaping, but it doesn't appear that way here?
Second, what's the difference between .* and .*? (greedy vs. ungreedy?)
Third, what's the difference between <\\/a> and <\/a>?
The space and the . are, I think, just a literal space vs. any character, right?

Most important to me: this pattern doesn't work for the text - .pdf">link name<br /></a> - but it does work for - .pdf">link name</a>. Any clues to what I'm missing?

Thanks!
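
The thread ends without a reply, but the questions above can be explored with a short experiment. This is a sketch only; the sample strings are made up. The last part shows just one possible cause of the `<br />` problem, namely a line break inside the link text, which nothing in the posts confirms:

```php
<?php
// 1. In a single-quoted PHP string, '\\/' and '\/' are the same two
//    characters: a backslash followed by a slash.
var_dump('<\\/a>' === '<\/a>');   // bool(true)

// 2. Greedy .* runs to the last </a>; lazy .*? stops at the first.
$html = '<a href="a.pdf">One</a> <a href="b.pdf">Two</a>';
preg_match('/\.pdf">(.*)<\/a>/',  $html, $greedy);
preg_match('/\.pdf">(.*?)<\/a>/', $html, $lazy);
echo $greedy[1], "\n";  // One</a> <a href="b.pdf">Two
echo $lazy[1], "\n";    // One

// 3. < and > are ordinary characters here, so escaping them is harmless.
$escaped   = preg_match('/\<\/a\>/', '</a>');
$unescaped = preg_match('/<\/a>/', '</a>');
var_dump($escaped === $unescaped);  // bool(true)

// 4. Without the s modifier, . does not match a newline, so a line
//    break inside the link text stops .*? from reaching </a>.
$multiline = ".pdf\">link name<br />\n</a>";
$withoutS  = preg_match('/\.pdf">(.*?)<\/a>/',  $multiline);
$withS     = preg_match('/\.pdf">(.*?)<\/a>/s', $multiline);
var_dump($withoutS);  // int(0)
var_dump($withS);     // int(1)
```

On a single line, `.pdf">link name<br /></a>` is matched by the lazy pattern just like the plain version, so if the real markup fails, checking it for a newline after the `<br />` may be a reasonable first step.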