Remove HTML tags AND the text between

globalguide
Forum Newbie
Posts: 4
Joined: Thu Jun 15, 2006 2:50 am

Remove HTML tags AND the text between

Post by globalguide »

Pimptastic | Please use [code], [php] and [syntax="..."] tags where appropriate when posting code. Your post has been edited to reflect how we'd like it posted. Please read: Posting Code in the Forums (http://forums.devnetwork.net/viewtopic.php?t=21171) to learn how to do it too.


I want to parse sections of HTML which may have embedded tables sometimes (and sometimes not).

I want to get rid of the stuff between the <table> tags (recursively).

e.g. if the text was

Code: Select all

Here is some text 
<table cellspacing=20 style="font:verdana;">
<tr>
<td>
Here is an embedded table as well
<table><tr><td>embedded</td></tr></table>
</td>
</tr>
</table>
and here is some more...
I'd want to see the following afterwards:-

Here is some text and here is some more...

strip_tags() gets rid of the HTML tags but leaves the text between them.

thanks

Scott


lettie
Forum Newbie
Posts: 15
Joined: Fri Jan 28, 2005 6:57 am

Post by lettie »

A clearer understanding of what you are trying to achieve may help with a solution.

What are you trying to display and why do you need it to display sometimes and not others?
anjanesh
DevNet Resident
Posts: 1679
Joined: Sat Dec 06, 2003 9:52 pm
Location: Mumbai, India

Post by anjanesh »

Code: Select all

$text = strip_tags($HTML);
$text = str_replace(array('&nbsp;'), '', $text); # Besides &nbsp; there are other entities such as &quot; - add whichever you want to the array
$text = trim($text, " \t\n\r\0\x0B\xA0"); # trim() returns the result, so assign it back
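
If listing entities by hand gets tedious, PHP's html_entity_decode() can decode all of them in one call. The snippet below is only a sketch: the sample string is made up, and it assumes the text is UTF-8 and that PHP 7 or later is available (for the "\u{00A0}" escape).

```php
<?php
// Sample input is made up for illustration.
$HTML = '<p>Caf&eacute;&nbsp;&quot;Newtown&quot;</p>';

$text = strip_tags($HTML);
// Decode every named and numeric entity in one call instead of listing them.
$text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
// &nbsp; decodes to U+00A0 (non-breaking space); turn it into a plain space.
$text = str_replace("\u{00A0}", ' ', $text);
$text = trim($text);

echo $text, "\n";
```

Note that this still keeps the text inside the tags; it only tidies up the entities that strip_tags() leaves behind.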
globalguide
Forum Newbie
Posts: 4
Joined: Thu Jun 15, 2006 2:50 am

Post by globalguide »

Perhaps I wasn't really clear. I am trying to parse text from a Wiki and produce simply the text from an article.

When you get down to parsing text from a Wiki, let's say it's a town on the Wikipedia. Sometimes people will have placed a nice image, a coat of arms and some standard info for the town in a table. Let's make one up called Newtown.

Here would be the Wikipedia article for Newtown:

Code: Select all

<table><tr><td colspan=2><img src="images/newtowncoatarms.png"><br>Newtown coat of arms</td></tr><tr><td>Population</td><td>1,234,345</td></tr><tr><td>Co-ordinates</td><td>4&deg;N -1.34&deg;W</td></tr></table>

Newtown is a town in Suffolk county, Virginia which lies in the Roanoor valley.
Using strip_tags, this is the output:

Code: Select all

Newtown coat of armsPopulation1,234,345Co-ordinates4&deg;N -1.34&deg;WNewtown is a town in Suffolk county, Virginia which lies in the Roanoor valley.
Here's what I want to achieve:-

Code: Select all

Newtown is a town in Suffolk county, Virginia which lies in the Roanoor valley.
i.e. it gets rid of the table and everything within the table.
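
One way to get this result is to split the HTML on the table tags and keep track of how deeply nested the current position is, keeping only the text found at depth zero. This is a minimal sketch rather than a definitive solution: strip_tables() is a hypothetical helper name, and the code assumes the table tags are reasonably well formed (every <table> has a closing </table>).

```php
<?php
// Hypothetical helper: drops every <table>...</table> block, nested or not.
function strip_tables(string $html): string
{
    // Split on opening and closing table tags, keeping the tags as tokens.
    $tokens = preg_split(
        '~(<table\b[^>]*>|</table\s*>)~i',
        $html,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    $depth = 0;   // how many tables we are currently inside
    $out   = '';

    foreach ($tokens as $token) {
        if (preg_match('~^<table\b~i', $token)) {
            $depth++;                       // entering a table
        } elseif (preg_match('~^</table\s*>~i', $token)) {
            $depth = max(0, $depth - 1);    // leaving a table
        } elseif ($depth === 0) {
            $out .= $token;                 // text outside all tables
        }
    }

    return $out;
}

$html = <<<'HTML'
Here is some text
<table cellspacing=20 style="font:verdana;">
<tr>
<td>
Here is an embedded table as well
<table><tr><td>embedded</td></tr></table>
</td>
</tr>
</table>
and here is some more...
HTML;

// Remove the tables first, then any remaining tags, then tidy the whitespace.
$clean = trim(preg_replace('~\s+~', ' ', strip_tags(strip_tables($html))));
echo $clean, "\n"; // prints: Here is some text and here is some more...
```

The depth counter is what makes the nested case work: text is only kept while no table is open, so an embedded table cannot "close" the outer one early.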
bennythemink
Forum Newbie
Posts: 16
Joined: Thu Jun 15, 2006 4:32 am

maybe javascript?

Post by bennythemink »

hi,

What about using JavaScript to loop through the table elements and set their content to blank using the innerHTML property? Then strip the tags once the JavaScript has finished.

???
lettie
Forum Newbie
Posts: 15
Joined: Fri Jan 28, 2005 6:57 am

Post by lettie »

The code below will split the text into separate elements wherever it finds a closing </table> tag and place the results into an array. You then just display the array element you require.

Code: Select all

<?php
	$spilttxt = '<table><tr><td colspan=2><img src="images/newtowncoatarms.png"><br>Newtown coat of arms</td></tr><tr><td>Population</td><td>1,234,345</td></tr><tr><td>Co-ordinates</td><td>4&deg;N -1.34&deg;W</td></tr></table> 

Newtown is a town in Suffolk county, Virginia which lies in the Roanoor valley.';
	$bodytxt = explode('</table>', $spilttxt); // explode() splits on a literal string; split() was removed in PHP 7
	echo trim($bodytxt[1]);
?>
globalguide
Forum Newbie
Posts: 4
Joined: Thu Jun 15, 2006 2:50 am

Post by globalguide »

Thanks for the replies so far.

Lettie: would that technique work in all cases?

This might happen (text before table):-

Code: Select all

Newtown is a town in Suffolk county, Virginia which lies in the Roanoor valley.

<table><tr><td colspan=2><img src="images/newtowncoatarms.png"><br>Newtown coat of arms</td></tr><tr><td>Population</td><td>1,234,345</td></tr><tr><td>Co-ordinates</td><td>4&deg;N -1.34&deg;W</td></tr></table>
Or this (table within text):-

Code: Select all

Newtown is a town in Suffolk county, Virginia. 

<table><tr><td colspan=2><img src="images/newtowncoatarms.png"><br>Newtown coat of arms</td></tr><tr><td>Population</td><td>1,234,345</td></tr><tr><td>Co-ordinates</td><td>4&deg;N -1.34&deg;W</td></tr></table>

It lies in the Roanoor valley.
And indeed this (two tables):-

Code: Select all

Newtown is a town in Suffolk county, Virginia. 

<table><tr><td colspan=2><img src="images/newtowncoatarms.png"><br>Newtown coat of arms</td></tr></table>

It lies in the Roanoor valley.

<table><tr><td>Population</td><td>1,234,345</td></tr><tr><td>Co-ordinates</td><td>4&deg;N -1.34&deg;W</td></tr></table>
Wiki article writers can put multiple tables in an article and indeed scatter embedded tables within an article.

The code would need to cater for all of these (automatically).
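
A parser-based approach should cope with all three layouts (text before, table in the middle, several tables) without caring where the tables sit. The following is a sketch under some assumptions: it needs PHP's DOM extension, text_without_tables() is a hypothetical name, the input is treated as UTF-8, and the wrapping <div> is only there to give the fragment a single root.

```php
<?php
// Hypothetical helper: parse the fragment, delete every <table> node,
// and return the remaining text with whitespace collapsed.
function text_without_tables(string $html): string
{
    $doc = new DOMDocument();

    // Wiki markup is often sloppy; collect parser warnings instead of printing them.
    libxml_use_internal_errors(true);
    // The XML declaration is a common hint so the parser reads the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();

    // Copy the live node list first; removing nodes while iterating
    // over the live list would skip elements.
    $tables = iterator_to_array($doc->getElementsByTagName('table'));
    foreach ($tables as $table) {
        if ($table->parentNode !== null) {
            $table->parentNode->removeChild($table);
        }
    }

    return trim(preg_replace('~\s+~u', ' ', $doc->documentElement->textContent));
}

$article = "Newtown is a town in Suffolk county, Virginia. \n\n"
    . '<table><tr><td colspan=2><img src="images/newtowncoatarms.png"><br>Newtown coat of arms</td></tr></table>'
    . "\n\nIt lies in the Roanoor valley.\n\n"
    . '<table><tr><td>Population</td><td>1,234,345</td></tr></table>';

echo text_without_tables($article), "\n";
```

Because a nested table is a child of the outer table node, removing the outer node takes the embedded one with it.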

Thanks!
bennythemink
Forum Newbie
Posts: 16
Joined: Thu Jun 15, 2006 4:32 am

Post by bennythemink »

Lettie's method is the best, I think, and far better than having to use JavaScript. You'll have to loop through the string, count how many times the table tag occurs, and collect the text between the tables.

something like:

Code: Select all

// Counts occurrences of the literal "<table>" tag.
// Note: substr_count() is case sensitive and will not match tags with attributes.
$instances = substr_count($text, "<table>");

$tmpText = "";

for ($i = 0; $i < $instances; $i++)
{
   // Get the position of the next opening tag.
   $position = strpos($text, "<table>");
   if ($position === false)
   {
      break;
   }

   // Keep the text that comes before the table.
   $tmpText .= substr($text, 0, $position);

   // Skip past the next "</table>"; this assumes tables are not nested.
   $end = strpos($text, "</table>", $position);
   if ($end === false)
   {
      // Unclosed table: drop the rest of the string.
      $text = "";
      break;
   }
   $text = substr($text, $end + strlen("</table>"));
}

// Whatever is left after the last table is plain text too.
$tmpText .= $text;
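
The loop idea can also be written with a regular expression that removes the innermost tables first and repeats until none are left, which covers embedded tables and tags with attributes. This is only a sketch: remove_tables_inside_out() is a hypothetical name, and very large inputs might run into PCRE's backtracking limit.

```php
<?php
// Hypothetical helper: repeatedly delete tables that contain no other table.
function remove_tables_inside_out(string $html): string
{
    // Matches a table whose body contains no further opening <table ...> tag,
    // i.e. an innermost table.
    $innermost = '~<table\b[^>]*>(?:(?!<table\b).)*?</table\s*>~is';

    do {
        $result = preg_replace($innermost, '', $html, -1, $count);
        if ($result === null) {
            break; // PCRE error: return what we have so far
        }
        $html = $result;
    } while ($count > 0);

    return $html;
}

$text = 'Before <table border=1><tr><td>outer <table><tr><td>inner</td></tr></table></td></tr></table>after';
echo remove_tables_inside_out($text), "\n"; // prints: Before after
```

Each pass strips one level of nesting, so a table embedded three levels deep takes three passes plus a final pass that finds nothing.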

globalguide
Forum Newbie
Posts: 4
Joined: Thu Jun 15, 2006 2:50 am

Post by globalguide »

The code didn't quite work but I'll keep trying - thanks for the help!