Strip leading HTML
I realize that regex can't handle nested tags. But I needed to strip the HTML, and all but one space if any spaces were present, from only the front of a string (say, the sort retrieved by innerHTML). So there is a run of HTML tags followed by the substring where I want to start, which I'll call the text.
I tried the pattern - /<*[^>]*>/g,'' - but it removed all tags, even after the text began. I removed character entities the same way, with - /&*[^;]*;/g,'' - and it also removed them everywhere.
I ultimately had to resort to a function that I use elsewhere to find the start of the text. But I wondered whether a regex could still be used. Nesting only matters when looking for the end of the text; for that, some function would probably have to parse the hierarchy as a simple stack. To find the start of the text, HTML nesting doesn't matter. How could I get the pattern - /<*[^>]*>|&*[^;]*;/g,'' - to stop at the first letter or digit it encounters outside of a tag or entity?
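A small sketch of why the pattern in the question over-matches may help here. The sample strings are made up for illustration and are not from the thread. Because `<*` can match zero `<` characters, `[^>]*>` is free to swallow ordinary text up to the next `>`, and the `g` flag repeats that across the whole string. Requiring a literal `<`, and anchoring with `^`, are two possible ways to narrow it.

```javascript
// Hypothetical sample input, chosen only to show the difference.
const sample = 'a <b>bold</b> z';

// Pattern from the question: "<*" may match nothing, so "[^>]*>"
// consumes any text up to the next ">", not just the tag itself.
const loose = sample.replace(/<*[^>]*>/g, '');

// Requiring a literal "<" and at least one character limits each match to a tag.
const strict = sample.replace(/<[^>]+>/g, '');

// Anchoring with "^" and dropping "g" restricts removal to the leading run of tags.
const leadingOnly = '<p><b>bold</b> z'.replace(/^(?:<[^>]+>)*/, '');

console.log(JSON.stringify(loose));       // " z"
console.log(JSON.stringify(strict));      // "a bold z"
console.log(JSON.stringify(leadingOnly)); // "bold</b> z"
```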
- superdezign
- DevNet Master
- Posts: 4135
- Joined: Sat Jan 20, 2007 11:06 pm
If you want us to give you a regex example, you should give us an example of what you are given and what you want to get out of it.
You may also want to use strip_tags().
superdezign wrote: If you want us to give you a regex example, you should give us an example of what you are given and what you want to get out of it.
I needed to strip the HTML, and all but one space if any spaces were present, from only the front of a string (say, the sort retrieved by innerHTML). So there is a run of HTML tags followed by the substring where I want to start, which I'll call the text.
But if you want a specific example:
" <span><br> <br>He honed his players into <i>Hall of Famers</i>, <i>MVPs</i>, Pro Bowlers, household names and winners.<br></span>"
Into:
" He honed his players into <i>Hall of Famers</i>, <i>MVPs</i>, Pro Bowlers, household names and winners.<br></span>"
superdezign wrote: use strip_tags().
As far as I know, this regex strips every tag, every comment and every character entity:
/<*[^>]*>|&*[^;]*;/g,''
It would be followed by a / {2,}/g to strip consecutive spaces. I want it to stop at the 'H' in "He" above, or at whatever non-whitespace character might be there that isn't a "<" or "&".
- superdezign
... That's.... odd.
It takes HTML tags, character entities, and anything that isn't a letter, and strips it away. As for keeping the space... you're on your own.
Code:
@(<[^>]+>|&[^;]+;|[^a-z])*@i
superdezign wrote: ... That's.... odd.
It takes HTML tags, character entities, and anything that isn't a letter, and strips it away. As for keeping the space... you're on your own.
Code:
@(<[^>]+>|&[^;]+;|[^a-z])*@i
Thanks. This worked:
Code:
/^(<[^>]+>|&[^;]+;|[^a-z])*/gi,''
Code:
/<[^>]+>|&[^;]+;/g,''
Code:
/ {2,}/g,' '
Then you'd need a third operation, a conditional, to tack on the leading space if it was found. But with that caveat, you basically could use regex in this case.
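The operations described above could be laid out as separate steps, roughly as follows. This is only a sketch: the function name `stripLeadingHtml` is made up for illustration, and the input is the example string given earlier in the thread.

```javascript
// Hypothetical helper name; the three patterns are the ones quoted in the thread.
function stripLeadingHtml(html) {
  // 1. Remove the leading run of tags, entities, and non-letters.
  //    The "^" anchor keeps the match at the front of the string.
  const body = html.replace(/^(<[^>]+>|&[^;]+;|[^a-z])*/i, '');

  // 2. Strip every tag and entity, then collapse runs of spaces,
  //    only to learn whether the visible text starts with a space.
  const visible = html.replace(/<[^>]+>|&[^;]+;/g, '').replace(/ {2,}/g, ' ');

  // 3. The conditional: tack one space back on if one was found.
  return (visible.charAt(0) === ' ' ? ' ' : '') + body;
}

const input = ' <span><br> <br>He honed his players into <i>Hall of Famers</i>, <i>MVPs</i>, Pro Bowlers, household names and winners.<br></span>';
const output = stripLeadingHtml(input);
console.log(JSON.stringify(output));
```

With the thread's example input, the result keeps one leading space, starts at "He", and leaves the later tags untouched.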
So:
Code:
(strX.replace(/<[^>]+>|&[^;]+;/g,'').replace(/ {2,}/g,' ').substring(0,1)==' ' ? ' ' : '') + strX.replace(/^(<[^>]+>|&[^;]+;|[^a-z])*/gi,'')
Anyway, thanks again. Problem solved. (I'd just add that the more general case would use \S instead of a-z.)
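One way to read the closing remark about \S is to swap `[^a-z]` for `[^\S]`, which is the same as `\s`, so the match stops at the first non-whitespace character outside a tag or entity. That reading is an assumption, and the sample string below is invented for illustration. The sketch compares the two patterns on text that begins with a quote and digits.

```javascript
// [^a-z] version from the thread: leading digits and punctuation are stripped too.
const lettersOnly = /^(<[^>]+>|&[^;]+;|[^a-z])*/i;

// Assumed \S reading: only tags, entities, and whitespace are consumed.
const anyNonSpace = /^(<[^>]+>|&[^;]+;|\s)*/;

// Hypothetical input whose visible text starts with a quote and digits.
const sample = '<p> &nbsp;"42" wins</p>';

const byLetters = sample.replace(lettersOnly, '');
const byNonSpace = sample.replace(anyNonSpace, '');

console.log(JSON.stringify(byLetters));  // "wins</p>"
console.log(JSON.stringify(byNonSpace)); // "\"42\" wins</p>"
```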