Recursive HTML tabber

superdezign
DevNet Master
Posts: 4135
Joined: Sat Jan 20, 2007 11:06 pm

Post by superdezign »

I wrote a function that tabs HTML by traversing the HTML structure. I figured it'd be simple practice with recursion, except I've run into a problem. It tabs the HTML fine, but if a parent tag contains elements without closing tags (img, meta, br) or plain text alongside elements that do have closing tags, only the opened-and-closed tags remain in the output. I believe I could fix it by using str_replace instead of plain assignment (=), but the problem is determining what to put into str_replace.
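To make the symptom concrete, here is a minimal sketch that runs a reduced form of the FormatHTML() pattern on its own. The sample string and the variable names are made up for illustration, and the "no leading slash" check is written as a lookahead here.

```php
<?php
// Reduced form of the pattern: opening tag, lazy content, matching closing tag.
$pattern = '|(<((?!/)[^>\s]+).*?>)(.*?)(</\2>)|si';

// Hypothetical inner content of a parent tag: text, an element with no
// closing tag, and one element that is opened and closed.
$inner = 'Intro <img src="a.png"> text <em>bold</em> outro';

preg_match_all($pattern, $inner, $matches);

// Only the paired element is captured. 'Intro ', the <img>, ' text ' and
// ' outro' never appear in $matches, so rebuilding the string from $matches
// alone drops them.
print_r($matches[0]); // a single entry: <em>bold</em>
```

Anything that sits between or around the matched pairs would have to be recovered separately, for example from the offsets that PREG_OFFSET_CAPTURE reports.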

Here is the class. All of the action is in FormatHTML().

Code:

class CHTMLTabber
{
	const		__TAB__		= '    ';
	
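	/**
	*	Shows tabbed HTML
	*	@param		string	$str		HTML to format
	*	@param		bool	$return		return the result instead of echoing it
	*	@param		int		$level		starting indentation level
	*	@return		string or void
	*/
	public function TabHTML($str, $return = false, $level = 0)
	{
		// Format once, starting at the requested level
		$str		= self::FormatHTML($str, $level);
		
		if(!$return)
		{
			// Escape for display only; $level is not an htmlspecialchars() flag
			echo '<pre>' . htmlspecialchars($str) . '</pre>';
			return;
		}
		
		return $str;
	}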
	
	/**
	*	Add tabs to the HTML str
	*	@param		str
	*	@param		level
	*	@return		string
	*/
	protected function FormatHTML($str, $level = 0)
	{
		$str				= self::ClearNewlines($str);
		
		preg_match_all('|
			
			(
				<					# open HTML tag
				(
					(?!/)			# does not start with a slash
					[^>\s]+			# all contents up to a space or end of tag
				)
				.*?					# all contents to end of tag
				>					# close HTML tag
			)
			(
				.*?					# all contents to ending tag
			)
			(
				</					# open ending HTML tag
				(
					\2				# same as content of pattern 2 (HTML tag name)
				)
				>					# close ending HTML tag
			)
			
			|xsi', $str, $matches);
		
		/**
		*	REGEX EXPLANATION:
		*	
		*	$matches[0]			= entire HTML tag and content
		*		i.e. <html>content</html>
		*
		*	$matches[1]			= starting HTML tag
		*		i.e. <html>
		*
		*	$matches[2]			= HTML tag name
		*		i.e. html
		*
		*	$matches[3]			= contents
		*		i.e. content
		*
		*	$matches[4]			= closing HTML tag
		*		i.e. </html>
		*
		*	$matches[5]			= closing HTML tag name (currently unused)
		*		i.e. html
		*/
		
		// Create tabs for this level
		for($i = 0, $tabs = ''; $i < $level; $i++)
		{
			$tabs			.= self::__TAB__;
		}
		
		// Format HTML
		if(!empty($matches[3]))
		{
			$str			= '';
			
			foreach($matches[3] as $id => $content)
			{
				$content	= trim(self::FormatHTML($content, $level + 1));
				
				// Don't add extra newlines or tabs
				if($id >= 1)
				{
					$str	.= "\n" . $tabs;
				}
				
				$str		.= $matches[1][$id] . "\n";
				
				// Don't output anything for empty contents
				if(!empty($content))
				{
					$str	.= $tabs . self::__TAB__ . $content . "\n";
				}
				
				$str		.= $tabs . $matches[4][$id];
			}
		}
		
		return $str;
	}
	
	/**
	*	Removes newlines from a string
	*	@param		str
	*	@return		string
	*/
	protected function ClearNewlines($str)
	{
		return preg_replace('#(\r\n|\n)#s', '', $str);
	}
}

It's possible that I'm going in the wrong direction with this, so I'm open to any and all suggestions. The regex is described in a comment, so if you don't feel like deciphering it, the explanation is already there.
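One possible direction, sketched here under assumptions rather than offered as a finished fix: split the string into a flat list of tags and text runs, then walk that list while tracking the depth. That way text and tags without closing tags are never skipped. The function name indentHtmlTokens() and the list of void elements are hypothetical and not part of CHTMLTabber.

```php
<?php
/**
 * Hypothetical helper: indents HTML by walking a flat token list instead of
 * matching open/close pairs with a single regex.
 */
function indentHtmlTokens(string $html, string $tab = '    '): string
{
    // Elements that never take a closing tag (partial list, for illustration).
    $void = ['br', 'hr', 'img', 'input', 'link', 'meta'];

    // Split into tags and text runs; the capture group keeps the tags.
    $tokens = preg_split('/(<[^>]+>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    $level = 0;
    $lines = [];

    foreach ($tokens as $token) {
        $token = trim($token);
        if ($token === '') {
            continue; // whitespace between tags
        }

        if (preg_match('#^</#', $token)) {
            // Closing tag: step back out before printing it.
            $level = max(0, $level - 1);
            $lines[] = str_repeat($tab, $level) . $token;
        } elseif (preg_match('#^<([a-z][a-z0-9]*)\b[^>]*>$#i', $token, $m)) {
            $lines[] = str_repeat($tab, $level) . $token;
            // Opening tag: indent what follows unless it is void or self-closed.
            $selfClosed = substr($token, -2) === '/>';
            if (!$selfClosed && !in_array(strtolower($m[1]), $void, true)) {
                $level++;
            }
        } else {
            // Text, comments, doctype: print at the current level.
            $lines[] = str_repeat($tab, $level) . $token;
        }
    }

    return implode("\n", $lines);
}

echo indentHtmlTokens('<div>Intro <img src="a.png"> text <em>bold</em> outro</div>');
```

With this input, the text runs and the img tag each land on their own line at the div's child indentation. The sketch trims whitespace around text, so it can change rendering for inline elements, and it does not special-case pre, script, or attribute values that contain a ">".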
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Have you looked at HTMLPurifier?