
Trying to remove any HTML head and body information

Posted: Sun Aug 13, 2006 7:36 pm
by hismightiness
I do not develop in PHP as often as most of you, so I am hoping someone can help me with this; I am not sure this is the best way to go about it. I am trying to remove ALL information in the HEAD of the HTML page, as well as the opening BODY tag and the closing BODY and HTML tags. Here is what I am attempting to do:

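# $FileText is the HTML content loaded into a variable

	# remove any HTML body and header information for rewrite
	# match the whole opening tag, attributes included (ereg() is deprecated and was removed in PHP 7)
	if(preg_match('/<body[^>]*>/i',$FileText,$BodyMatch)){
		# split once on the opening tag; index 1 is everything after it
		$arrFileText = explode($BodyMatch[0],$FileText,2);
		$FileText = str_ireplace('</body>','',$arrFileText[1]);
		$FileText = str_ireplace('</html>','',$FileText);
	}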

	# here I would use $FileText to do the rest of my needed work
Is this a consistent and/or viable approach? This has not gone through extensive testing yet, but seems to be working so far.

Posted: Mon Aug 14, 2006 9:39 am
by guanxin
Does this code do the same job?

$str = "<html> 
<head><title>Hello, world!</title></head> 
<body> 
This is the text to save. 
</body> 
</html>";
$str = preg_replace("/.*?<body>(.*?)<\/body>.*/si", "\\1", $str);

Posted: Mon Aug 14, 2006 7:33 pm
by Ambush Commander
Almost. But what if the body tag has attributes? ;-)
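
To make the problem concrete, here is a small sketch that runs guanxin's first pattern against two inputs: one with a bare `<body>` tag and one where the tag carries a `class` attribute (the sample pages are made up for illustration). When the pattern does not match, `preg_replace()` returns the subject unchanged, so the head is silently kept.

```php
<?php
// guanxin's first pattern: only matches a bare "<body>" tag
$pattern = '/.*?<body>(.*?)<\/body>.*/si';

$plain = "<html><head><title>Hi</title></head><body>Text to save.</body></html>";
// illustrative input: the same page, but the body tag has an attribute
$withAttr = "<html><head><title>Hi</title></head><body class=\"main\">Text to save.</body></html>";

$a = preg_replace($pattern, '$1', $plain);
$b = preg_replace($pattern, '$1', $withAttr);

var_dump($a); // string(13) "Text to save."
var_dump($b); // no match, so the whole page comes back unchanged
```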

Posted: Wed Aug 16, 2006 12:15 am
by guanxin
Ambush Commander wrote: Almost. But what if the body tag has attributes? ;-)
Like this?

/.*?<body[^>]*?>(.*)<\/body>.*/si
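
A quick check of the revised pattern, as a sketch: the sample page below is illustrative, with an upper-case `BODY` tag and two attributes. The `[^>]*?` part skips the attributes, and the `i` modifier handles the case. Like any regex approach, it assumes no attribute value contains a literal `>`.

```php
<?php
// guanxin's revised pattern: [^>]*? skips any attributes in the opening tag
$pattern = '/.*?<body[^>]*?>(.*)<\/body>.*/si';

$page = "<html>\n<head><title>Hello, world!</title></head>\n"
      . "<BODY class=\"main\" onload=\"init()\">\nThis is the text to save.\n</BODY>\n</html>";

// trim() drops the newlines that sat just inside the body tags
$body = trim(preg_replace($pattern, '$1', $page));
echo $body, "\n"; // This is the text to save.
```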

Posted: Wed Aug 16, 2006 7:38 am
by Ambush Commander
Yep. Although personally speaking, I'd use preg_match.
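
A minimal sketch of what a `preg_match()` version might look like. `extractBody` is a hypothetical helper name, the `?string` return type needs PHP 7.1 or later, and like the patterns above it assumes no attribute value contains a literal `>`.

```php
<?php
/**
 * Hypothetical helper: return what sits between the opening <body ...> tag
 * and the last </body>, or null if the document has no body element.
 */
function extractBody(string $html): ?string
{
    // "s" lets . span newlines; "i" accepts <BODY> as well as <body>
    if (preg_match('/<body[^>]*>(.*)<\/body>/si', $html, $m)) {
        return $m[1];
    }
    return null; // no match: the caller decides what to do
}

$page = "<html><head><title>Hello, world!</title></head>"
      . "<body bgcolor=\"white\">This is the text to save.</body></html>";

var_dump(extractBody($page));            // string(25) "This is the text to save."
var_dump(extractBody("<p>no body</p>")); // NULL
```

One reason to prefer `preg_match()` here is that it reports failure explicitly. `preg_replace()` simply hands back the whole page when the pattern misses, which is easy to overlook. It also avoids the leading and trailing `.*` that the replace version needs to consume the rest of the document.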