How stupid is it to try this? PHP/regex dilemma

Posted: Thu May 28, 2009 7:39 pm
by titaniumdoughnut
I'm not exactly a regex wizard. I can, with considerable time spent googling and reading regex guides online, construct a moderately complex regex. That's the extent of my knowledge.

So, say we have a filename contained in $filename (it might be something like "pizza.jpg") and we want to search an html document, and find every <img> tag which references that filename inside of single or double quotes. Essentially, we're looking for things like <img src="pizza.jpg">.

Not that difficult. I've worked out how to do this much. I even worked out how to make sure that directories listed before the filename are ignored, so that "pizza.jpg" and "hello/pizza.jpg" will be found.
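
A rough sketch of what that first step could look like, assuming PCRE via preg_match_all. The sample $html string and the exact pattern are illustrative assumptions, not the pattern actually used here. It will also trip over things like data-src attributes or a ">" inside an attribute value.

```php
<?php
// Hypothetical inputs for illustration only.
$filename = 'pizza.jpg';
$html = '<p><img src="pizza.jpg"> <img alt="x" src=\'hello/pizza.jpg\'> '
      . '<img src="notpizza.jpg"> <img src="pasta.jpg"></p>';

// preg_quote() escapes regex metacharacters in the filename (the "." here).
$name = preg_quote($filename, '/');

// Match an <img ...> tag whose src value, in single or double quotes,
// is the filename, optionally preceded by a directory path ending in "/".
// \1 requires the closing quote to be the same character as the opening one.
$pattern = '/<img\b[^>]*?\bsrc\s*=\s*(["\'])(?:[^"\'>]*\/)?' . $name . '\1[^>]*>/i';

preg_match_all($pattern, $html, $matches);

// $matches[0] holds the full text of each matching <img> tag.
print_r($matches[0]);
```

With the sample input above, only the first two tags match; "notpizza.jpg" and "pasta.jpg" are skipped.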

Here's the evil part. We've found all these <img> tags which contain the file name we're looking for.

Say we need to find alt attributes, if they exist in those <img> tags, ANYWHERE in those <img> tags, and replace the contents of the alt attribute. ONLY for <img> tags which contain the filename in $filename. And if there is no alt attribute, we need to add one. The alt attributes might use single quotes. They might use double quotes. They might have spaces on either side of the = sign. They might be the first attribute in the tag, or the last, or anywhere in between.

To sum up: I need to find and replace the contents of the alt attributes in img tags, modifying only those image tags which contain my search filename. And generate new alt attributes if they're missing.

Is this something I'll be able to work out? Is there a kind soul here who can get me started without spending hours on it?

Thanks so much!

Re: How stupid is it to try this? PHP/regex dilemma

Posted: Thu May 28, 2009 10:00 pm
by Zoxive
I would suggest not using regex, and using something like DOMDocument instead.

http://us3.php.net/manual/en/class.domdocument.php
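
A minimal sketch of how that might look, assuming the whole page is available as a string. The function name setAltForImage and its parameters are made up for illustration. The parser handles quoting style, spacing around "=", and attribute order, so none of that needs special-casing.

```php
<?php
// Hypothetical helper (not from the PHP manual): sets the alt attribute on
// every <img> whose src points at the given filename, in any directory.
function setAltForImage($html, $filename, $alt)
{
    $doc = new DOMDocument();

    // Collect libxml warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('img') as $img) {
        // basename() drops leading directories, so "hello/pizza.jpg" matches too.
        if (basename($img->getAttribute('src')) === $filename) {
            // setAttribute() replaces an existing alt or creates a missing one.
            $img->setAttribute('alt', $alt);
        }
    }

    return $doc->saveHTML();
}

// Example usage with made-up markup.
$page = '<p><img src="hello/pizza.jpg" alt=\'old\'><img src="pasta.jpg"><img src="pizza.jpg"></p>';
echo setAltForImage($page, 'pizza.jpg', 'A pizza');
```

Note that saveHTML() returns a full document, so a fragment comes back wrapped in doctype, html and body tags, and the rest of the markup may be normalized along the way.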

Re: How stupid is it to try this? PHP/regex dilemma

Posted: Fri May 29, 2009 1:32 am
by prometheuzz
titaniumdoughnut wrote: Is this something I'll be able to work out? Is there a kind soul here who can get me started without spending hours on it?
It depends on how much regex you know.
Also, it would greatly help if you'd give some (more than one) before and after examples.
But, as the previous poster rightly pointed out, tinkering with (X)HTML through regex is asking for trouble. A true HTML parser is almost always a better and safer solution.

Good luck.