Part of your expression, '[{\/*}|{\s?}]', does not do what you think it does.
Square Brackets:
Anything between square brackets is a character class, which always matches exactly one character. Your '[{\/*}|{\s?}]' character class is read by the PCRE regex engine as:
"Match exactly one character that is a '{' or a '/' or a '*' or a '}' or a '|' or a '{' (redundant) or a whitespace character ('\s') or a '?' or a '}' (also redundant)". Also, many characters that are special metacharacters outside a character class are not special inside one and don't need to be escaped (e.g. [*+?(){}$] and others).
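To see the one-character behaviour for yourself, here is a minimal sketch. It uses Python's re module rather than PCRE, purely as an assumption for illustration; this particular character class is read the same way by both engines.

```python
import re

# The character class from the original expression, copied verbatim.
# Inside [...] the characters { / * } | ? are plain literals; only \s stays special.
char_class = re.compile(r'[{\/*}|{\s?}]')

# Every one of these single characters is accepted by the class.
candidates = ['{', '/', '*', '}', '|', ' ', '?']
matched = [c for c in candidates if char_class.fullmatch(c)]

# A class consumes exactly one character, so a two-character string
# cannot be matched in full by the class alone.
two_chars = char_class.fullmatch('/*')

# A character that is not listed in the class is rejected.
letter = char_class.fullmatch('a')

print(matched)
print(two_chars, letter)
```

fullmatch is used so that each test string has to be consumed entirely by the class, which makes the "exactly one character" rule visible.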
Curly Brackets:
Curly brackets form a quantifier that specifies how many times (an exact count or a range) the preceding token may be repeated. For example: '/X{3,5}/' matches XXX, XXXX and XXXXX but does not match XX.
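The X{3,5} example can be checked with a short sketch, again using Python's re module as a stand-in for PCRE (an assumption; the {min,max} quantifier works the same way in both).

```python
import re

# X{3,5}: the preceding token (the literal X) repeated three to five times.
quantified = re.compile(r'X{3,5}')

# fullmatch requires the whole string to be consumed, so a string that is
# too short or too long is rejected instead of being matched partially.
tests = ['XX', 'XXX', 'XXXX', 'XXXXX', 'XXXXXX']
results = {s: bool(quantified.fullmatch(s)) for s in tests}

for s, ok in results.items():
    print(s, ok)
```

Note that an unanchored search would still find XXXXX inside XXXXXX, which is why the sketch anchors the match to the whole string.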
Here are some examples of invalid tags that your regexes happily match:
<input type="text" {>
<input type="text" *>
<input type="text" }>
<input type="text" ?>
<input type="text" invalid arbitrary non-attribute stuff here>
<textarea rows="5" cols="10" />data</textarea>
<textarea rows="5" cols="10" {>data</textarea>
<textarea rows="5" cols="10" }>data</textarea>
<textarea rows="5" cols="10" |>data</textarea>
That said, here are two regexes which work more along the lines of what you are trying to do:
Match a self-closing XHTML tag:
'%<(\w+)(\s+\w+\s*=\s*("[^"]*"|\'[^\']*\'))*\s*/>%'
Match a normal non-self-closing XHTML tag:
'%<(\w+)(\s+\w+\s*=\s*("[^"]*"|\'[^\']*\'))*\s*>.*?</\1>%'
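As a rough check, the two patterns can be run against some of the invalid tags listed earlier. This is only a sketch: it uses Python's re module instead of PCRE, drops the % delimiters (Python does not use them), and the helper name classify is made up for illustration.

```python
import re

# Zero or more attributes: name = "double-quoted" or 'single-quoted' value.
ATTRS = r'''(\s+\w+\s*=\s*("[^"]*"|'[^']*'))*'''

# Same structure as the two patterns above; group 1 is the tag name.
SELF_CLOSING = re.compile(r'<(\w+)' + ATTRS + r'\s*/>')
NORMAL = re.compile(r'<(\w+)' + ATTRS + r'\s*>.*?</\1>')

def classify(tag):
    """Hypothetical helper: report which pattern matches the whole string."""
    if SELF_CLOSING.fullmatch(tag):
        return 'self-closing'
    if NORMAL.fullmatch(tag):
        return 'normal'
    return None

valid = {
    '<input type="text" />': 'self-closing',
    "<input type='text' />": 'self-closing',
    '<textarea rows="5" cols="10">data</textarea>': 'normal',
}
invalid = [
    '<input type="text" {>',
    '<input type="text" invalid arbitrary non-attribute stuff here>',
    '<textarea rows="5" cols="10" />data</textarea>',
    '<textarea rows="5" cols="10" |>data</textarea>',
]

valid_results = {tag: classify(tag) for tag in valid}
invalid_results = [classify(tag) for tag in invalid]

print(valid_results)
print(invalid_results)
```

The sketch matches against the whole string on purpose; an unanchored search with the self-closing pattern would still find the leading '<textarea rows="5" cols="10" />' inside the third invalid example. Also, '.' does not match a newline by default, so tag content spanning several lines would need the dot-matches-newline option (the s modifier in PCRE, re.DOTALL in Python).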
For a quick refresher course, check out:
http://www.regular-expressions.info/