preg_replace if NOT found

Posted: Mon Jul 23, 2007 4:12 pm
by tvs008
hi.

My goal is to detect when a .pdf file is linked and automatically add the text "(PDF)" after the closing </a> tag, if it's not already there. What is the pattern for this?

Code:

$pattern = '/.pdf">(.*?)<\\/a>..^pdf/i';
(this is my approximation)

How do I express "replace only if "pdf" isn't within the 0-5 characters after the </a>"?

Here's what I'm replacing with:

Code:

$replacement = '.pdf">$1</a> (PDF)';
Again, the goal is to mark a link as a PDF if it isn't already marked as one.

For example, a link with the text "Link Example" would become "Link Example (PDF)".

Thanks!

Posted: Mon Jul 23, 2007 4:26 pm
by stereofrog
You need a negative lookahead ("not-followed-by" assertion):

Code:

preg_replace('~\.pdf">.*?</a>(?! \(PDF\))~i', "$0 (PDF)", ....
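A minimal runnable sketch of the reply above, assuming the input is a string of HTML on a single line (the `$html` sample is made up for illustration). The pattern is the one from the reply; the replacement is put in single quotes so PHP never tries to interpolate `$0`.

```php
<?php
// Hypothetical sample: one unmarked PDF link and one that already says (PDF)
$html = '<a href="guide.pdf">Guide</a> and <a href="faq.pdf">FAQ</a> (PDF)';

// Pattern from the reply: a link ending in .pdf whose </a> is NOT followed by " (PDF)"
$pattern = '~\.pdf">.*?</a>(?! \(PDF\))~i';

// $0 is the whole match, so " (PDF)" is appended right after </a>
$result = preg_replace($pattern, '$0 (PDF)', $html);
echo $result, "\n";
```

Running it a second time leaves the string unchanged, since both links are then followed by " (PDF)". One thing worth testing on real pages: when the lookahead fails, `.*?` may keep extending past that link's `</a>` to a later one, so in `<a href="a.pdf">A</a> (PDF), <a href="b.htm">B</a>` the marker lands after the non-PDF link B. Using `[^<]*` instead of `.*?` (or `(?:(?!</a>).)*` if link text may contain tags) keeps the match inside a single link.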

Posted: Mon Jul 23, 2007 6:05 pm
by tvs008
thanks, that's exactly what I was looking for: negative lookahead.

Now I'm just trying to limit the check to a range after the </a>, say within about 8 characters. I think I'm closing in on that. I'm going to try something like this, which I read about:

"a(bc){1,5}": one through five copies of "bc."

Unless there's a more appropriate way to express ranges? I'm thinking of something like how strpos works, but in a form that fits inside the pattern...
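A quick check of the `{m,n}` quantifier quoted above: it repeats whatever comes immediately before it, so on a group it counts copies of the group, and on `.` it counts arbitrary characters, which is the kind of range wanted after `</a>`. The sample strings are made up.

```php
<?php
// {m,n} repeats the item immediately before it, m to n times.
$twoCopies = preg_match('/^a(bc){1,5}$/', 'abcbc');          // two copies of "bc": matches
$noCopies  = preg_match('/^a(bc){1,5}$/', 'a');              // zero copies: no match
$sixCopies = preg_match('/^a(bc){1,5}$/', 'abcbcbcbcbcbc');  // six copies: too many

// On ".", the same syntax means "m to n characters of anything".
$dotRange  = preg_match('/^.{0,8}PDF/', 'xx (PDF)');         // "xx (" is 4 chars: matches

var_dump($twoCopies, $noCopies, $sixCopies, $dotRange);
```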

Posted: Mon Jul 23, 2007 6:31 pm
by stereofrog
Eh? Sorry, I don't quite get it... You can use an arbitrary expression in a lookahead, so, for example, this:

Code:

</a>(?! .{0,3}?\(PDF\))
would skip "</a> (PDF)" or "</a> foo(PDF)" but accept "</a> foobarbaz (PDF)".
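A small sketch that runs the bounded lookahead from the reply against the three example strings it mentions. The lazy `?` after `{0,3}` only changes the order in which lengths are tried; since the inner expression counts as found if any length works, it does not change which strings are skipped.

```php
<?php
// Bounded negative lookahead from the reply: after </a>, reject the match if a
// space, then at most 3 characters, then "(PDF)" follows.
$pattern = '~</a>(?! .{0,3}?\(PDF\))~';

$samples = [
    '</a> (PDF)',           // 0 characters before "(PDF)": skipped
    '</a> foo(PDF)',        // 3 characters before "(PDF)": skipped
    '</a> foobarbaz (PDF)', // 10 characters before "(PDF)": accepted
];

$accepted = [];
foreach ($samples as $s) {
    // preg_match returns 1 when </a> matches and the negative lookahead holds
    $accepted[$s] = preg_match($pattern, $s) === 1;
}
var_dump($accepted);
```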

Posted: Mon Jul 23, 2007 6:51 pm
by tvs008
let me try explaining this part again.

the pattern could be

Code:

'/.pdf">(.*?)<\\/a>(?!.PDF....)/i'
'/.pdf">(.*?)<\\/a>(?!..PDF...)/i'
'/.pdf">(.*?)<\\/a>(?!...PDF..)/i'
'/.pdf">(.*?)<\\/a>(?!....PDF.)/i'
'/.pdf">(.*?)<\\/a>(?!.....PDF)/i'
(I'm not really sure the . is correct here, but it seems to be.)

Basically, I want to look for the next occurrence of PDF (case-insensitive) within the next n characters. This is what made me think of strpos.

I don't mean to ask for the code here; I'm just trying to learn what my tools are, like the negative lookahead you pointed out.

Thanks
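The strpos idea above can be written directly, and the same question can be asked with a lookahead anchored at a given offset. `pdfWithin()` and the sample string are hypothetical; `\G` makes `preg_match` start exactly at the offset passed as its fifth argument.

```php
<?php
// Hypothetical helper: does "pdf" (any case) start within $n characters after $pos?
function pdfWithin(string $text, int $pos, int $n): bool
{
    // The window covers every start offset 0..$n plus the 3 letters of "pdf"
    $window = substr($text, $pos, $n + 3);
    return stripos($window, 'pdf') !== false;
}

$text = '<a href="x.pdf">Report</a> (PDF)';
$end  = strpos($text, '</a>') + strlen('</a>'); // offset just after </a>

// strpos-style check
$byString = pdfWithin($text, $end, 8);
// Same question as a regex: a lookahead anchored at $end via \G and the offset argument
$byRegex  = preg_match('/\G(?=.{0,8}PDF)/i', $text, $matches, 0, $end) === 1;

var_dump($byString, $byRegex);
```

One difference: without the `s` modifier, `.` will not cross a newline, while the substring check does not care.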

Posted: Mon Jul 23, 2007 6:55 pm
by superdezign
Lookaheads can match a variable number of characters.

Code:

'/.pdf">(.*?)<\\/a>(?!.*PDF)/i'
You can replace the asterisk quantifier with a range.
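To see why a range is usually better than `*` here: with `*`, any "PDF" later on the same line blocks the replacement, while `{0,8}` only looks at the next few characters. The sample line is hypothetical; the replacement string is the one from the first post, and the pattern escapes the dot before `pdf` so it only matches a literal period.

```php
<?php
// Hypothetical line: an unmarked PDF link, with the word PDF appearing later on
$html = '<a href="a.pdf">A</a> see the PDF list below';
$replacement = '.pdf">$1</a> (PDF)';

// (?!.*PDF): any later "PDF" on the line blocks the match
$withStar  = preg_replace('/\.pdf">(.*?)<\/a>(?!.*PDF)/i', $replacement, $html);
// (?!.{0,8}PDF): only a "PDF" starting within 8 characters of </a> blocks it
$withRange = preg_replace('/\.pdf">(.*?)<\/a>(?!.{0,8}PDF)/i', $replacement, $html);

var_dump($withStar, $withRange);
```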

Posted: Mon Jul 23, 2007 7:05 pm
by tvs008
awesome!

Here's what I've got; it seems to work in an initial test, and I'll do a more thorough one tomorrow. Got to go. Thanks!

Code:

'/.pdf">(.*?)<\\/a>(?!.{0,8}PDF)/i'

Posted: Tue Jul 24, 2007 1:19 pm
by tvs008
OK, so it works, but with a bug. I have questions about a few things in these expressions:

Code:

'/.pdf">(.*)<\/a>(?!.{0,15}PDF)/i'
'/.pdf">(.*?)<\\/a>(?! {0,15}PDF)/i'
'/.pdf">(.*?)\<\/a\>(?!.{0,15}PDF)/i'
These variations all appear to return the same result.

First, I thought < and > were metacharacters that needed escaping, but that doesn't appear to be the case here?
Second, what's the difference between .* and .*? (greedy vs. ungreedy?)
Third, what's the difference between <\\/a> and <\/a>?
The space and the . are just whitespace vs. any character, I think, right?
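Each of the four questions above can be checked with a few lines of PHP. This is a sketch with made-up sample strings, not output from the poster's page.

```php
<?php
// 1. In single-quoted PHP strings, '\\' becomes one backslash and '\/' is left
//    as-is, so both spellings give preg_* the same regex text: <\/a>
$sameString = ('<\\/a>' === '<\/a>');

// 2. < and > have no special meaning on their own, and a backslash before a
//    non-alphanumeric character just makes it literal, so both patterns match.
$plain   = preg_match('/<\/a>/', '</a>');
$escaped = preg_match('/\<\/a\>/', '</a>');

// 3. Greedy (.*) takes as much as possible, lazy (.*?) as little as possible.
$two = '<a href="a.pdf">One</a> <a href="b.pdf">Two</a>';
preg_match('/\.pdf">(.*)<\/a>/', $two, $greedy);
preg_match('/\.pdf">(.*?)<\/a>/', $two, $lazy);

// 4. A literal space matches only a space (\s would match any whitespace);
//    "." matches any character except a newline.
$spacesOnly = preg_match('/^ {0,15}PDF/', ' - PDF');
$anyChar    = preg_match('/^.{0,15}PDF/', ' - PDF');

var_dump($sameString, $plain, $escaped, $greedy[1], $lazy[1], $spacesOnly, $anyChar);
```

Point 3 is why greedy and lazy variants can look identical in testing: they only diverge when one line holds more than one link.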

Most important to me: this pattern doesn't work for the text .pdf">link name<br /></a>, but it does work for .pdf">link name</a>. Any clues to what I'm missing?

Thanks!
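One possible cause of the `<br />` failure, stated as an assumption since the actual markup isn't shown: if the page source has a line break after `<br />`, the pattern cannot match, because without the `s` modifier `.` does not match a newline. The sketch below tests that hypothesis on a made-up string; if the real markup is all on one line, the cause is something else, and dumping the exact input (for example with `var_dump`) is the next step.

```php
<?php
// Assumption: the real markup has a line break after <br />
$html = "<a href=\"x.pdf\">link name<br />\n</a>";

// Without /s, "." never matches "\n", so (.*?) cannot reach </a>
$noS   = preg_replace('/\.pdf">(.*?)<\/a>(?!.{0,15}PDF)/i', '$0 (PDF)', $html);
// With /s, "." also matches newlines, so the link text may span lines
$withS = preg_replace('/\.pdf">(.*?)<\/a>(?!.{0,15}PDF)/is', '$0 (PDF)', $html);

var_dump($noS === $html, $withS);
```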