...with an empty string. Generally, I want to get rid of everything, or at least the values between the quotes. However, sometimes there can also be a space before or after the "=" sign, e.g. onload = "something".
Any simple and hopefully fast regex for this?
Thanks a lot!
Tomas
Last edited by tomfra on Fri Aug 27, 2004 1:15 pm, edited 1 time in total.
and it seems to be working. Is there anything wrong with this code? You know, it was kind of... luck that I figured it out.
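For reference, here is a minimal sketch of the kind of pattern being asked about. It is not the code from the post above; the function name stripEventHandlers is made up for illustration, and it assumes the handler values are always quoted. It allows optional whitespace around the "=" sign and removes the whole attribute, value included.

```php
<?php
// Hypothetical helper (not from the thread): removes on* attributes such as
// onload="..." or onclick = '...' from a piece of HTML.
function stripEventHandlers(string $html): string
{
    // \s+on\w+   whitespace, then an attribute name starting with "on"
    // \s*=\s*    the "=" sign with optional spaces on either side
    // (?:...)    a double-quoted or a single-quoted value
    $pattern = '/\s+on\w+\s*=\s*(?:"[^"]*"|\'[^\']*\')/i';
    return preg_replace($pattern, '', $html);
}

echo stripEventHandlers('<body onload = "something">'), "\n"; // <body>
```

Note that the pattern does not check that it is inside a tag, so ordinary text that happens to look like on...="..." would be removed as well.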
The reason I need this regex is because of a bug / strange behaviour in strip_tags. It works great but in the above example JavaScript code it has problems with this part:
Or actually with the "<" sign in it. For some reason strip_tags stops converting the html content into plain text at that point. When you try to strip tags on this example:
<img src="image.gif" onload="if (this.width<50) {this.src='image2.gif'; this.width='120'; this.height='90'}">
<p>This is some text</p>
It will not output anything. If you get rid of the "<" in the JavaScript code - e.g. change it to (this.width=50), everything will work as expected.
Is there a better fix than getting rid of the JS code completely via preg_replace? If not, I will simply use that, but I suspect there are other situations where something like this could happen.
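The workaround described here (remove the JavaScript handlers first, then call strip_tags) could look roughly like this. The regex is an assumption for illustration and only covers quoted on* attributes; whether strip_tags still needs this help depends on the PHP version.

```php
<?php
// The example from above: an <img> tag whose onload handler contains a "<".
$html = '<img src="image.gif" onload="if (this.width<50) '
      . '{this.src=\'image2.gif\'; this.width=\'120\'; this.height=\'90\'}">'
      . "\n" . '<p>This is some text</p>';

// Step 1: drop quoted on* attributes, so no "<" is left inside the tag.
$withoutHandlers = preg_replace(
    '/\s+on\w+\s*=\s*(?:"[^"]*"|\'[^\']*\')/i',
    '',
    $html
);

// Step 2: strip the remaining, now well-behaved tags.
$text = trim(strip_tags($withoutHandlers));

echo $text, "\n"; // This is some text
```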
I don't know whether it will alter the functionality in some situations, though, because I know next to nothing about regex. I added the backslash because my PsPad couldn't highlight the code properly: it thought the PHP code ended at the ? > tag.
One has to wonder why the strip_tags function even exists in PHP when it's buggy and can be replaced with a one-liner. I guess it may be because strip_tags is faster?
strip_tags is there for people who know nothing about the regular expression functions. It is also useful because it is faster (it is compiled code), and it will leave certain tags alone if you pass them as the second argument.
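A small example of that second argument, using a made-up input string:

```php
<?php
$html = '<p>Hello <b>world</b>, see <a href="page.html">this page</a>.</p>';

// No second argument: every tag is removed.
$plain = strip_tags($html);

// Second argument lists the tags to keep; here only <b> survives.
$keepBold = strip_tags($html, '<b>');

echo $plain, "\n";    // Hello world, see this page.
echo $keepBold, "\n"; // Hello <b>world</b>, see this page.
```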
I've tried playing with the regex a little but when it comes to regex I am a total newbie so no luck there. I can't think of any "clean" solution. I think I will have to use one regex to get rid of the javascript code and then use the other regex. But that doesn't sound great either, I'd like to keep as little regex code as possible so that it doesn't slow down everything too much.
I'd suggest building a list of valid tags, so that the matching is more precise and strips only actual tags. However, that adds a bit of maintenance.
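A rough sketch of the tag-list idea, not the code that was posted. The list $knownTags and the helper stripKnownTags are hypothetical names, and the list is deliberately short; a real one would need every tag you expect to meet. Quoted attribute values are skipped as a unit, so a "<" or ">" inside them does not end the match early.

```php
<?php
// Hypothetical list of tag names that should be stripped; extend as needed.
$knownTags = ['a', 'b', 'i', 'p', 'br', 'div', 'img', 'span'];

// Hypothetical helper: removes opening and closing tags whose name is listed.
function stripKnownTags(string $html, array $tags): string
{
    $quoted = array_map(function ($tag) {
        return preg_quote($tag, '~');
    }, $tags);

    // </?name      an opening or closing tag with a listed name
    // \b           the name must end here (so "b" does not match "<body")
    // (?:...)*     quoted attribute values, or any char except quotes and ">"
    $pattern = '~</?(?:' . implode('|', $quoted) . ')\b'
             . '(?:"[^"]*"|\'[^\']*\'|[^\'">])*>~i';

    return preg_replace($pattern, '', $html);
}

echo stripKnownTags('<p>testestets>blah</p>', $knownTags), "\n"; // testestets>blah
```

Anything that is not on the list, including a stray "<" or ">" in the text, is left untouched.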
Last edited by feyd on Wed Aug 24, 2005 7:48 am, edited 2 times in total.
Using the above example it will return this string:
testestets>blah
Which is completely correct, because the ">" sign is part of the text, but it gets rid of the ">" sign in the JavaScript. I am testing everything now and so far haven't found a problem with it.