preg_split on spaces, but not within tags

bokehman
Forum Regular
Posts: 509
Joined: Wed May 11, 2005 2:33 am
Location: Alicante (Spain)

Post by bokehman »

sweatje wrote:
<[^>]+>

bokehman wrote: The trouble with that is it will find things that are not HTML tags.

feyd wrote: Only in malformed text and, in general, malformed tags too.

Granted, but something such as the following (which is not good practice but is valid HTML) will trip it up:

<p> if(1 < 2) </p>
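
To see what the pattern actually does with that input, here is a minimal sketch that runs it through preg_match_all. The variable names are only for illustration and do not come from any earlier post.

```php
<?php
// Pattern from the thread: "<", one or more characters that are not ">", then ">".
$pattern = '/<[^>]+>/';
$html = '<p> if(1 < 2) </p>';

preg_match_all($pattern, $html, $matches);

// $matches[0] lists every full match, in order of appearance.
print_r($matches[0]);
```

The first match is the real tag, <p>. The second starts at the bare "<" of the comparison and runs up to the ">" that closes </p>, so "< 2) </p>" is treated as a single tag.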
sweatje
Forum Contributor
Posts: 277
Joined: Wed Jun 29, 2005 10:04 pm
Location: Iowa, USA

Post by sweatje »

bokehman wrote: The trouble with that is it will find things that are not HTML tags.

feyd wrote: Only in malformed text and, in general, malformed tags too.

bokehman wrote: Granted, but something such as the following (which is not good practice but is valid HTML) will trip it up:

<p> if(1 < 2) </p>
That is not valid HTML. It should be:

<p> if(1 &lt; 2) </p>

Post by bokehman »

sweatje wrote: That is not valid HTML.

Yes it is. It's completely legal and it validates at http://validator.w3.org/. Another completely legal combination that your regex would have trouble with is this:

<element attribute=">">
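
One possible way around the quoted ">" case, offered as a sketch only: let the tag pattern consume quoted attribute values as whole units, then split on whitespace that falls outside a tag. The names $tag, $pattern and $input are made up for this example, and the pattern still does not cope with a bare "<" in text such as 1 < 2.

```php
<?php
// Hypothetical quote-aware tag pattern: inside a tag, accept either a single
// character that is not ">" or a quote, or a complete quoted string.
$tag = <<<'RE'
<(?:[^>"']|"[^"]*"|'[^']*')+>
RE;

// Split on a whole tag (captured, so it is kept in the result) or on whitespace.
$pattern = '/(' . $tag . ')|\s+/';

$input = 'Hello <a title=">" href="x y">big world</a> bye';

$parts = preg_split($pattern, $input, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

print_r($parts);
```

With this input the spaces inside the opening tag are left alone, and the ">" inside the title attribute no longer ends the tag early. Because the tag alternative comes first and is tried at every "<", whitespace inside a tag is consumed before the \s+ alternative ever sees it.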

Post by sweatje »

OK, it is not valid XHTML.
w3 validator wrote: Below is a list of the warning message(s) produced when validating your document.

1. Warning Line 6 column 9: character "<" is the first character of a delimiter but occurred as data.

<p> if(1 < 2) </p>

This message may appear in several cases:
* You tried to include the "<" character in your page: you should escape it as "&lt;"
* You used an unescaped ampersand "&": this may be valid in some contexts, but it is recommended to use "&amp;", which is always safe.
* Another possibility is that you forgot to close quotes in a previous tag.
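
For what it's worth, the escaping the validator recommends can be done in PHP with htmlspecialchars. The sketch below assumes the text is being assembled in PHP before output (the variable names are hypothetical). Once the "<" is escaped, the simple pattern from earlier in the thread finds only the real tags.

```php
<?php
// Hypothetical text that is to appear inside a paragraph.
$text = 'if(1 < 2)';

// htmlspecialchars() converts &, < and > (plus quotes with ENT_QUOTES) to entities.
$html = '<p> ' . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . ' </p>';

// The pattern discussed earlier in the thread.
preg_match_all('/<[^>]+>/', $html, $matches);

echo $html, "\n";
print_r($matches[0]);
```

Here $html becomes <p> if(1 &lt; 2) </p>, and the matches are just <p> and </p>.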
Anyway, unless Robert K S has more questions related to the topic of this thread we can probably just drop it.

Post by bokehman »

I agree! It's an interesting debate from the regex point of view but has little to do with the original post.