Remove HTML tags AND the text between
Posted: Thu Jun 15, 2006 2:56 am
by globalguide
Pimptastic | Please use [code] and [syntax="..."] tags where appropriate when posting code. Your post has been edited to reflect how we'd like it posted. Please read: Posting Code in the Forums (http://forums.devnetwork.net/viewtopic.php?t=21171) to learn how to do it too.
I want to parse sections of HTML which may have embedded tables sometimes (and sometimes not).
I want to get rid of the stuff between the <table> tags (recursively).
e.g. if the text was
Code: Select all
Here is some text
<table cellspacing=20 style="font:verdana;">
<tr>
<td>
Here is an embedded table as well
<table><tr><td>embedded</td></tr></table>
</td>
</tr>
</table>
and here is some more...
I'd want to see the following afterwards:-
Here is some text and here is some more...
strip_tags gets rid of the html but leaves the stuff between
thanks
Scott
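One way to read "recursively" here is to walk over the table tags and keep track of how deeply nested the current position is, copying text only while outside every table. The following is a minimal sketch under that assumption. The function name `remove_tables_by_depth` is hypothetical and does not come from the thread, and the tag pattern is deliberately simple (it would be confused by a `>` inside an attribute value).

```php
<?php
// Hypothetical helper: the name is illustrative, not from the thread.
function remove_tables_by_depth(string $html): string
{
    // Find every opening or closing table tag together with its byte offset.
    preg_match_all('~<(/?)table\b[^>]*>~i', $html, $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);

    $out = '';
    $depth = 0;   // number of tables the current position is nested in
    $cursor = 0;  // start of the text that has not been copied yet

    foreach ($matches as $m) {
        [$tag, $offset] = $m[0];
        if ($depth === 0) {
            // Outside every table: keep the text in front of this tag.
            $out .= substr($html, $cursor, $offset - $cursor);
        }
        // A closing tag lowers the depth, an opening tag raises it.
        $depth = ($m[1][0] === '/') ? max(0, $depth - 1) : $depth + 1;
        if ($depth === 0) {
            // Back outside every table: resume copying after this tag.
            $cursor = $offset + strlen($tag);
        }
    }
    // Keep the trailing text unless a table was left unclosed.
    if ($depth === 0) {
        $out .= substr($html, $cursor);
    }
    return $out;
}

$html = "Here is some text\n<table cellspacing=20><tr><td>outer\n"
      . "<table><tr><td>embedded</td></tr></table>\n</td></tr></table>\nand here is some more...";
// Collapse the leftover line breaks into single spaces.
echo trim(preg_replace('~\s+~', ' ', remove_tables_by_depth($html)));
// prints: Here is some text and here is some more...
```

Running strip_tags on the result afterwards would remove any remaining non-table markup.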
Posted: Thu Jun 15, 2006 4:26 am
by lettie
A clearer understanding of what you are trying to achieve may help with a solution.
What are you trying to display and why do you need it to display sometimes and not others?
Posted: Thu Jun 15, 2006 4:30 am
by anjanesh
Code: Select all
$text = strip_tags($HTML);
$text = str_replace(array('&nbsp;'), '', $text); # There are other entities such as &quot; etc. - add whatever you want to the array
$text = trim($text, " \t\n\r\0\x0B\xA0");
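Rather than listing entities by hand, PHP's built-in html_entity_decode can convert them all in one call. The sketch below assumes the input is UTF-8 and that non-breaking spaces should become ordinary spaces. The function name `plain_text_from_html` is hypothetical. Note that this only tidies the text left behind by strip_tags and does not remove table contents.

```php
<?php
// Hypothetical helper: the name is illustrative, not from the thread.
function plain_text_from_html(string $html): string
{
    $text = strip_tags($html);
    // Turn &nbsp;, &quot;, &amp; and other entities into real characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // &nbsp; decodes to U+00A0; swap it for an ordinary space.
    $text = str_replace("\u{00A0}", ' ', $text);
    // Collapse runs of whitespace and trim both ends.
    return trim(preg_replace('~\s+~u', ' ', $text));
}

echo plain_text_from_html('<p>Tom&nbsp;&amp;&nbsp;Jerry &quot;live&quot;</p>');
// prints: Tom & Jerry "live"
```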
Posted: Thu Jun 15, 2006 4:45 am
by globalguide
Perhaps I wasn't really clear. I am trying to parse text from a Wiki and produce simply the text from an article.
When you get down to parsing text from a Wiki, let's say it's a town on the Wikipedia. Sometimes people will have placed a nice image, a coat of arms and some standard info for the town in a table. Let's make one up called Newtown.
Here would be the Wikipedia article for Newtown:
Code: Select all
<table><tr><td colspan=2><img src="images/newtowncoatarms.png"><br>Newtown coat of arms</td></tr><tr><td>Population</td><td>1,234,345</td></tr><tr><td>Co-ordinates</td><td>4°N -1.34°W</td></tr></table>
Newtown is a town in Suffolk county, Virginia which lies in the Roanoor valley.
Using strip_tags, this is the output:
Code: Select all
Newtown coat of armsPopulation1,234,345Co-ordinates4°N -1.34°WNewtown is a town in Suffolk county, Virginia which lies in the Roanoor valley.
Here's what I want to achieve:-
Code: Select all
Newtown is a town in Suffolk county, Virginia which lies in the Roanoor valley.
i.e. it gets rid of the table and everything within the table.
maybe javascript?
Posted: Thu Jun 15, 2006 5:08 am
by bennythemink
hi,
What about using JavaScript to loop through the table elements and setting their value to blank using the innerHTML property? Then strip the tags once the JavaScript has completed.
???
Posted: Thu Jun 15, 2006 5:17 am
by lettie
The code below will split the text into separate elements where it finds a closing </table> tag and place the results into an array. You then just display the array element you require.
Code: Select all
<?php
$splittxt = '<table><tr><td colspan=2><img src="images/newtowncoatarms.png"><br>Newtown coat of arms</td></tr><tr><td>Population</td><td>1,234,345</td></tr><tr><td>Co-ordinates</td><td>4°N -1.34°W</td></tr></table>
Newtown is a town in Suffolk county, Virginia which lies in the Roanoor valley.';
// explode() replaces split(), which was removed in PHP 7.
$bodytxt = explode('</table>', $splittxt);
// Element 1 is the text after the first closing </table> tag.
echo trim($bodytxt[1]);
?>
Posted: Thu Jun 15, 2006 5:28 am
by globalguide
Thanks for the replies so far.
Lettie: would that technique work in all cases?
This might happen (text before table):-
Code: Select all
Newtown is a town in Suffolk county, Virginia which lies in the Roanoor valley.
<table><tr><td colspan=2><img src="images/newtowncoatarms.png"><br>Newtown coat of arms</td></tr><tr><td>Population</td><td>1,234,345</td></tr><tr><td>Co-ordinates</td><td>4°N -1.34°W</td></tr></table>
Or this (table within text):-
Code: Select all
Newtown is a town in Suffolk county, Virginia.
<table><tr><td colspan=2><img src="images/newtowncoatarms.png"><br>Newtown coat of arms</td></tr><tr><td>Population</td><td>1,234,345</td></tr><tr><td>Co-ordinates</td><td>4°N -1.34°W</td></tr></table>
It lies in the Roanoor valley.
And indeed this (two tables):-
Code: Select all
Newtown is a town in Suffolk county, Virginia.
<table><tr><td colspan=2><img src="images/newtowncoatarms.png"><br>Newtown coat of arms</td></tr></table>
It lies in the Roanoor valley.
<table><tr><td>Population</td><td>1,234,345</td></tr><tr><td>Co-ordinates</td><td>4°N -1.34°W</td></tr></table>
Wiki article writers can put multiple tables in an article and indeed scatter embedded tables within an article.
The code would need to cater for all of these (automatically).
Thanks!
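For layouts like these (text before, between and after tables, plus nesting), one option is to let an HTML parser find the tables instead of searching the string by hand. The following is a minimal sketch using PHP's DOM extension, assuming that extension is available and that the input is a UTF-8 fragment. The function name `text_without_tables` is hypothetical and not from the thread.

```php
<?php
// Hypothetical helper: the name is illustrative, not from the thread.
function text_without_tables(string $html): string
{
    // Wiki markup is rarely valid HTML, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // The XML declaration is a common trick to make libxml read the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list first, because removing nodes changes it.
    foreach (iterator_to_array($doc->getElementsByTagName('table')) as $table) {
        if ($table->parentNode !== null) {
            // Removing an outer table takes its nested tables with it.
            $table->parentNode->removeChild($table);
        }
    }

    $text = $doc->documentElement ? $doc->documentElement->textContent : '';
    // Collapse the line breaks left behind where the tables used to be.
    return trim(preg_replace('~\s+~', ' ', $text));
}

$article = "Newtown is a town in Suffolk county, Virginia.\n"
         . "<table><tr><td>Newtown coat of arms</td></tr></table>\n"
         . "It lies in the Roanoor valley.\n"
         . "<table><tr><td>Population</td><td>1,234,345</td></tr></table>";
echo text_without_tables($article);
// prints: Newtown is a town in Suffolk county, Virginia. It lies in the Roanoor valley.
```

Because the parser understands attributes and nesting, `<table cellspacing=20>` and embedded tables are handled the same way as a bare `<table>`.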
Posted: Thu Jun 15, 2006 5:58 am
by bennythemink
Lettie's method is the best, I think, far better than having to use JavaScript. You'll have to loop through the string, get the number of times the table tag occurs and get the data between the tables.
something like:
Code: Select all
// Count the <table> openings. Note: substr_count() is case-sensitive and
// this only matches a bare "<table>" with no attributes.
$instances = substr_count($text, "<table>");
$tmpText = "";
for ($i = 0; $i < $instances; $i++)
{
    // Get the position of the next table.
    $position = strpos($text, "<table>");
    if ($position === false)
    {
        break;
    }
    // Keep the text in front of the table.
    $tmpText .= substr($text, 0, $position);
    // Skip past the next </table> (nested tables are not handled).
    $end = strpos($text, "</table>", $position);
    if ($end === false)
    {
        // Unclosed table: drop everything after it.
        $text = "";
        break;
    }
    $text = substr($text, $end + strlen("</table>"));
}
// Whatever is left after the last table is plain text.
$tmpText .= $text;
Posted: Thu Jun 15, 2006 9:28 am
by globalguide
The code didn't quite work but I'll keep trying - thanks for the help!
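For anyone who wants to stay with string functions, a recursive PCRE pattern is another possible route. The sketch below is not what anyone in the thread posted. It assumes well-balanced table tags, and the function name `strip_tables_regex` is hypothetical. The `(?R)` construct re-applies the whole pattern, so an embedded table is consumed as part of the outer one.

```php
<?php
// Hypothetical helper: the name is illustrative, not from the thread.
function strip_tables_regex(string $html): string
{
    // <table ...>, then any mix of plain text, non-table tags or a nested
    // table (via the recursion (?R)), then the matching </table>.
    $pattern = '~<table\b[^>]*>(?:[^<]++|<(?!/?table\b)|(?R))*+</table>~i';
    $result = preg_replace($pattern, '', $html);
    // preg_replace() returns null on failure, e.g. when a PCRE limit is hit.
    return $result === null ? $html : $result;
}

echo strip_tables_regex('A<table><tr><td>1</td></tr></table>B<TABLE><tr><td>2</td></tr></TABLE>C');
// prints: ABC
```

On very large or badly broken pages the pattern may hit PCRE's limits, in which case the input comes back unchanged.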