How stupid is it to try this? PHP/regex dilemma
Posted: Thu May 28, 2009 7:39 pm
I'm not exactly a regex wizard. I can, with considerable time spent googling and reading regex guides online, construct a moderately complex regex. That's the extent of my knowledge.
So, say we have a filename contained in $filename (it might be something like "pizza.jpg") and we want to search an HTML document and find every <img> tag which references that filename inside single or double quotes. Essentially, we're looking for things like <img src="pizza.jpg">.
Not that difficult. I've worked out how to do this much. I even worked out how to allow for directories listed before the filename, so that both "pizza.jpg" and "hello/pizza.jpg" will be found.
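For reference, a minimal sketch of that matching step might look something like the following. The function name findImgTags and the exact pattern are illustrative assumptions, not the pattern actually in use here. It also assumes no attribute value contains a literal ">", and it will match the filename in any quoted attribute, not only src.

```php
<?php
// Hypothetical helper: returns every <img ...> tag in $html that has a quoted
// attribute value equal to $filename, optionally preceded by a directory path.
function findImgTags(string $html, string $filename): array
{
    // preg_quote escapes regex metacharacters such as the "." in "pizza.jpg".
    $name = preg_quote($filename, '~');

    // "<img", then anything up to an opening quote, then an optional path
    // ending in "/", then the filename, then the same quote character (\1).
    $pattern = '~<img\b[^>]*?(["\'])(?:[^"\'>]*/)?' . $name . '\1[^>]*>~i';

    preg_match_all($pattern, $html, $matches);
    return $matches[0]; // full text of each matching tag
}
```

Requiring either the opening quote or a "/" directly before the filename is what keeps a name like "notpizza.jpg" from matching.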
Here's the evil part. We've found all these <img> tags which contain the file name we're looking for.
Say we need to find the alt attribute, if it exists, ANYWHERE in each of those <img> tags, and replace its contents, but ONLY for <img> tags which contain the filename in $filename. And if there is no alt attribute, we need to add one. The alt attributes might use single quotes. They might use double quotes. They might have spaces on either side of the = sign. They might be the first attribute in the tag, or the last, or anywhere in between.
To sum up: I need to find and replace the contents of the alt attributes in img tags, modifying only those image tags which contain my search filename. And generate new alt attributes if they're missing.
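One way to approach this, sketched under some assumptions, is to do it in two passes with preg_replace_callback: an outer pattern picks out only the <img> tags that reference the filename, and the callback then rewrites or adds the alt attribute inside each matched tag. The function name setAltForImage and the parameter $newAlt are hypothetical. The sketch assumes attribute values never contain a literal ">" and that the text " alt=" does not appear inside some other attribute's value.

```php
<?php
// Hypothetical helper: sets the alt text to $newAlt on every <img> tag in
// $html that references $filename, adding the attribute if the tag has none.
function setAltForImage(string $html, string $filename, string $newAlt): string
{
    $name = preg_quote($filename, '~');

    // Pass 1: whole <img> tags with the filename inside matching quotes,
    // optionally preceded by a directory path such as "hello/".
    $tagPattern = '~<img\b[^>]*?(["\'])(?:[^"\'>]*/)?' . $name . '\1[^>]*>~i';

    // Pass 2: an alt attribute in either quote style, with optional
    // whitespace on either side of the "=" sign.
    $altPattern = '~(\salt\s*=\s*)(["\']).*?\2~is';

    // Escape the new text so quotes in it cannot break the attribute.
    $alt = htmlspecialchars($newAlt, ENT_QUOTES);

    return preg_replace_callback(
        $tagPattern,
        function (array $m) use ($altPattern, $alt): string {
            $tag = $m[0];
            if (preg_match($altPattern, $tag)) {
                // Existing alt: keep the "alt =" part, swap the quoted value.
                return preg_replace_callback(
                    $altPattern,
                    fn(array $a) => $a[1] . '"' . $alt . '"',
                    $tag,
                    1
                );
            }
            // No alt: insert one directly after the leading "<img".
            return substr($tag, 0, 4) . ' alt="' . $alt . '"' . substr($tag, 4);
        },
        $html
    );
}
```

A call such as setAltForImage($html, $filename, 'A pizza') would leave every other tag untouched. For messy real-world markup, a DOM parser such as PHP's DOMDocument is likely to be more robust than regex, at the cost of re-serializing the whole document.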
Is this something I'll be able to work out? Is there a kind soul here who can get me started without spending hours on it?
Thanks so much!