preg_replace if NOT found
Posted: Mon Jul 23, 2007 4:12 pm
by tvs008
hi.
My goal is to find links to .pdf files and automatically add the text "(PDF)" after the closing </a> if it's not already there. What is the pattern for this?
Code:
$pattern = '/.pdf">(.*?)<\\/a>..^pdf/i';
(this is my approximation)
How do I say "replace only if 'pdf' isn't within 0-5 characters after the </a>"?
Here's what I'm replacing with:
Code:
$replacement = '.pdf">$1</a> (PDF)';
Again, the goal is to mark a link as a PDF when it isn't already marked.
"Link Example" would become
"Link Example (PDF)"
Thanks!
Posted: Mon Jul 23, 2007 4:26 pm
by stereofrog
You need a negative lookahead ("not-followed-by" assertion):
Code:
preg_replace('~\.pdf">.*?</a>(?! \(PDF\))~i', "$0 (PDF)", ....
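A minimal sketch of that lookahead in use, assuming one link per input string. The sample links and variable names are made up for illustration; only the pattern and the "$0 (PDF)" replacement come from the post above.

```php
<?php
// Pattern from the post above: match ".pdf">...</a>" only when the closing
// tag is NOT immediately followed by " (PDF)".
$pattern = '~\.pdf">.*?</a>(?! \(PDF\))~i';

// Hypothetical sample links, one per string.
$unlabelled = '<a href="report.pdf">Report</a>';
$labelled   = '<a href="manual.pdf">Manual</a> (PDF)';

// $0 in the replacement stands for the whole match, so the label is
// appended directly after "</a>".
$a = preg_replace($pattern, '$0 (PDF)', $unlabelled);
$b = preg_replace($pattern, '$0 (PDF)', $labelled);

echo $a, "\n"; // <a href="report.pdf">Report</a> (PDF)
echo $b, "\n"; // <a href="manual.pdf">Manual</a> (PDF)  (unchanged)
```

One caveat when several links share a string: if the first "</a>" is already labelled, ".*?" can backtrack past it and stretch to a later "</a>", so the match may end at a different link than the one it started in.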
Posted: Mon Jul 23, 2007 6:05 pm
by tvs008
Thanks, that's exactly what I was looking for: negative lookahead.
Now I'm just trying to get it to check a range after the </a>, say within about 8 chars. I think I'm closing in on that. I'm going to try something like what I read:
"a(bc){1,5}": one through five copies of "bc."
Unless there is a more appropriate way of doing ranges. I'm thinking like how strpos works, but in a way that will fit within the pattern...
Posted: Mon Jul 23, 2007 6:31 pm
by stereofrog
Eh? Sorry, I don't quite get it... You can use an arbitrary expression in a lookahead, so for example this
would skip "</a> (PDF)" or "</a> foo(PDF)" but accept "</a> foobarbaz (PDF)".
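As an illustration of an arbitrary expression inside a lookahead, here is a hypothetical pattern (not taken from the thread) that behaves the way the post describes: it tolerates optional whitespace and up to three word characters before "(PDF)". The sample strings are assumptions too.

```php
<?php
// Hypothetical lookahead: reject "</a>" when "(PDF)" follows after optional
// whitespace and at most three word characters.
$pattern = '~\.pdf">.*?</a>(?!\s*\w{0,3}\s*\(PDF\))~i';

$samples = [
    '<a href="x.pdf">Doc</a> (PDF)',           // label right after the link
    '<a href="x.pdf">Doc</a> foo(PDF)',        // short word before the label
    '<a href="x.pdf">Doc</a> foobarbaz (PDF)', // long word before the label
];

$results = [];
foreach ($samples as $s) {
    // preg_match returns 1 when the pattern matches, 0 when it does not.
    $results[] = preg_match($pattern, $s);
}

print_r($results); // 0, 0, 1: only the last link would get a new label
```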
Posted: Mon Jul 23, 2007 6:51 pm
by tvs008
let me try explaining this part again.
the pattern could be
Code:
'/.pdf">(.*?)<\\/a>(?!.PDF....)/i'
'/.pdf">(.*?)<\\/a>(?!..PDF...)/i'
'/.pdf">(.*?)<\\/a>(?!...PDF..)/i'
'/.pdf">(.*?)<\\/a>(?!....PDF.)/i'
'/.pdf">(.*?)<\\/a>(?!.....PDF)/i'
(Not really sure if the . is correct here. Seems to be.)
Basically I want to look for the next occurrence of PDF (case insensitive) within the next n characters. This is what made me think of strpos.
I don't mean to ask for the code here, I'm just trying to know what my tools are, like how you pointed out negative lookahead.
Thanks
Posted: Mon Jul 23, 2007 6:55 pm
by superdezign
Lookaheads may have a variable amount of characters.
You can replace the asterisk quantifier with a range.
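A small sketch of the difference, assuming a lookahead written with ".*" as the starting point. The sample HTML is made up; the bound of 8 is the figure mentioned earlier in the thread.

```php
<?php
// Hypothetical sample: "PDF" appears far from the link, in unrelated text.
$html = '<a href="x.pdf">Doc</a> and, much later, a note about PDF files';

// Unbounded: any later "PDF" on the same line blocks the match.
$unbounded = '~\.pdf">(.*?)</a>(?!.*PDF)~i';
// Bounded: only a "PDF" starting within 8 characters of "</a>" blocks it.
$bounded   = '~\.pdf">(.*?)</a>(?!.{0,8}PDF)~i';

$u = preg_match($unbounded, $html);
$b = preg_match($bounded, $html);

echo "$u $b\n"; // 0 1
```

The bounded form is what keeps a distant, unrelated "PDF" from suppressing the label.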
Posted: Mon Jul 23, 2007 7:05 pm
by tvs008
awesome!
Here's what I got; it seems to work in an initial test. I'll do a more thorough one tomorrow. Got to go. Thanks!
Code:
'/.pdf">(.*?)<\\/a>(?!.{0,8}PDF)/i'
Posted: Tue Jul 24, 2007 1:19 pm
by tvs008
OK, so it works, but with a bug. I have questions about a few things in this expression:
Code:
'/.pdf">(.*)<\/a>(?!.{0,15}PDF)/i'
'/.pdf">(.*?)<\\/a>(?! {0,15}PDF)/i'
'/.pdf">(.*?)\<\/a\>(?!.{0,15}PDF)/i'
These variations all appear to return the same result.
First, I thought < and > were metachars that needed escaping, but it doesn't appear that way here?
Second, what's the difference between .* and .*? (greedy/ungreedy?)
Third, what's the difference between <\\/a> and <\/a>?
The space and . I think are just whitespace vs any char, right?
Most important to me: this pattern doesn't work for the text .pdf">link name<br /></a>, but it does work for .pdf">link name</a>. Any clues to what I'm missing?
Thanks!
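On the second and third questions, a minimal sketch; the two-link sample string is an assumption. In a single-quoted PHP string "\\" collapses to one backslash while "\/" is left alone, so both spellings hand the same regex to PCRE. As for the first question, < and > are ordinary characters in a pattern outside constructs such as lookbehind or named groups, so escaping them is harmless but unnecessary.

```php
<?php
// Both spellings produce the same five-character string: <\/a>
$same = ('<\\/a>' === '<\/a>');

// Hypothetical input with two PDF links on one line.
$html = '<a href="a.pdf">One</a> x <a href="b.pdf">Two</a>';

// Greedy .* runs to the LAST "</a>"; non-greedy .*? stops at the first.
preg_match('/\.pdf">(.*)<\/a>/i',  $html, $greedy);
preg_match('/\.pdf">(.*?)<\/a>/i', $html, $lazy);

var_dump($same);       // bool(true)
echo $greedy[1], "\n"; // One</a> x <a href="b.pdf">Two
echo $lazy[1], "\n";   // One
```

One possibility for the <br /> case, not confirmed anywhere in the thread: "." does not match a newline unless the "s" modifier is set, so a line break inside the link text would stop ".*?" from reaching "</a>".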