Strip leading HTML
Posted: Sat Aug 11, 2007 2:19 pm
I realize that regex can't handle nested tags. But I needed to strip the HTML, and all but one space if any spaces were present, from only the front of a string (say, the sort retrieved by innerHTML). So there is a complete run of HTML tags followed by the substring where I want to start, which I'll call the text.
I tried the pattern - /<*[^>]*>/g,'' - but it removed all tags, even after the text began. I removed character entities the same way, with - /&*[^;]*;/g,'' - and it also removed them everywhere.
I ultimately had to fall back on a function I had used elsewhere to find the start of the text. But I wondered whether a regex could still be used. It's the end of the text that matters if one is worried about nesting; some function would probably have to be called to parse the hierarchy as a simple stack. To find the start of the text, though, any HTML nesting wouldn't matter. How could I get the pattern - /<*[^>]*>|&*[^;]*;/g,'' - to stop at the first letter or digit it encounters outside of a tag or entity?
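For what it's worth, one possible approach is sketched here, assuming the leading run consists only of tags, entities, and whitespace. The patterns above remove matches everywhere because they are unanchored and use the g flag; also, `<*` and `&*` mean "zero or more", so the opening `<` or `&` is optional and ordinary text can match too. The sketch anchors a single match at the start of the string with `^` and drops the g flag. The function name `stripLeadingHtml` is hypothetical, and treating only literal whitespace outside tags (not `&nbsp;`) as a "space" is my assumption.

```javascript
// Hypothetical helper (the name is not from the original post).
// Removes the run of tags, entities, and whitespace at the front of a string,
// keeping a single space if any whitespace appeared outside of a tag.
function stripLeadingHtml(html) {
  // ^ anchors the match at the start; (?: ... )+ repeats over
  // a tag <...>, an entity &...;, or a whitespace character.
  // No g flag, so matching stops at the first character that is none of these.
  const leading = /^(?:<[^>]*>|&[^;\s<]*;|\s)+/;
  const match = leading.exec(html);
  if (match === null) {
    return html; // nothing to strip
  }
  // Drop tags and entities from the matched prefix so that whitespace
  // inside a tag (e.g. between attributes) is not counted as a space.
  const outsideTags = match[0].replace(/<[^>]*>|&[^;\s<]*;/g, "");
  const keepSpace = /\s/.test(outsideTags) ? " " : "";
  return keepSpace + html.slice(match[0].length);
}

console.log(stripLeadingHtml("<div><p>&nbsp; <b>Hello</b> &amp; world</p></div>"));
```

Because the match must begin at position 0 and every repetition has to be a tag, an entity, or whitespace, the match ends at the first other character, which covers letters and digits but also punctuation. Tags and entities after that point are left alone.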