[closed] Parsing XML that contains HTML

Post by ska

**
Mods:

I've reposted this question, worded more succinctly, here:

viewtopic.php?p=240211

Please feel free to close this thread if required.

**


Hello. I am coding a PHP XML news system in which a user uploads an XML file via a form, and the file is parsed and added to a MySQL database. At the moment I'm just printing the parsed data to the screen, and I'm having problems parsing it. One of the elements (the <Body></Body> XML element) contains HTML. For example:

Code:

<Body>
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Fusce vitae dolor. Maecenas ut felis id enim tempor ultrices. Nullam libero erat, vehicula ut, sollicitudin vitae, laoreet et, lectus.
<p>Morbi sit amet quam non urna molestie laoreet. Vestibulum tellus. Suspendisse potenti. Donec pretium. Vivamus erat nunc, rhoncus et, mattis ac, tristique eget, felis. Quisque non ipsum. Suspendisse potenti.
</p>
<h1>dolor sit amet</h1> Morbi vitae erat eu dolor mattis gravida. Fusce adipiscing. Nulla leo. Fusce nunc. Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Donec gravida diam nec mauris.
<p>
<img src="http://www.url.com/image.jpg"/>
<img src="http://www.url.com/image.jpg"/>
</p>
			<h1> sollicitudin </h1>Praesent pharetra, nibh ut condimentum pharetra, massa eros ullamcorper nisl, fringilla commodo massa massa id magna. Phasellus sit amet augue. 		<p>
			<img src="http://www.url.com/image.jpg"/>
			</p>
			<h1> mauris </h1> Suspendisse elementum porta dui. Praesent in orci. Nulla sit amet risus. Praesent neque. Etiam leo nisl, ultricies non, pulvinar ut, adipiscing vitae, mauris. Cras et risus. In accumsan tristique odio. Nam rhoncus. Donec tincidunt auctor sem. Donec quis purus. Donec sagittis pellentesque nulla. Etiam pharetra molestie diam. Mauris egestas, enim at imperdiet sagittis, leo purus blandit quam, sed aliquet augue nisi in nisl. Aliquam placerat neque quis dolor.	
			<p>
				<img src="http://www.url.com/image.jpg"/>
				<img src="http://www.url.com/image.jpg"/>
				<img src="http://www.url.com/image.jpg"/>
			</p>
			<h1>Donec sagittis</h1>
Nulla sit amet risus. Praesent neque. Etiam leo nisl, ultricies non, pulvinar ut, adipiscing vitae, mauris. Cras et risus. In accumsan tristique odio.
</Body>
When printed to the screen, the first half of this is cut off, and the page source ends up looking something like this:

Code:

pendisse elementum porta dui. Praesent in orci. Nulla sit amet risus. Praesent neque. Etiam leo nisl, ultricies non, pulvinar ut, adipiscing vitae, mauris. Cras et risus. In accumsan tristique odio. Nam rhoncus. Donec tincidunt auctor sem. Donec quis purus. Donec sagittis pellentesque nulla. Etiam pharetra molestie diam. Mauris egestas, enim at imperdiet sagittis, leo purus blandit quam, sed aliquet augue nisi in nisl. Aliquam placerat neque quis dolor.	
			<p>
				<img src="http://www.url.com/image.jpg"/>
				<img src="http://www.url.com/image.jpg"/>
				<img src="http://www.url.com/image.jpg"/>
			</p>
			<h1>Donec sagittis</h1>
Nulla sit amet risus. Praesent neque. Etiam leo nisl, ultricies non, pulvinar ut, adipiscing vitae, mauris. Cras et risus. In accumsan tristique odio.

This is my code:

Code:

$fileatt = $_FILES['xmlfile']['tmp_name'];
$fileatt_type = $_FILES['xmlfile']['type'];
$fileatt_name = $_FILES['xmlfile']['name'];

$file = fopen($fileatt, 'rb');
$data = fread($file, filesize($fileatt));
preg_match_all("/<NEWS>.*<\/NEWS>/Uis", $data, $matches);

$matches[0][0] = str_replace("<Body>", '<BODY><![CDATA[', $matches[0][0]);
$matches[0][0] = str_replace("</Body>", ']]></BODY>', $matches[0][0]);

$matches[0][0] = preg_replace("/(\r\n|\n|\r)/", "", $matches[0][0]);

// Open the file and erase the contents if any
$fp = fopen("temp.xml", "w");

// Write the data to the file
fwrite($fp, $matches[0][0]);

// Close the file
fclose($fp);

if (!($fp = @fopen("temp.xml", "r"))) die("Couldn't open XML.");
$usercount = 0;
$userdata = array();
$state = '';

function startElementHandler($parser, $name, $attrib) {
    global $usercount;
    global $userdata;
    global $state;

    switch ($name) {
        case $name == "NewsItem": {
            $userdata[$usercount]["id"] = $attrib["id"];
            $userdata[$usercount]["date"] = $attrib["date"];
            break;
        }

        default: {
            $state = $name;
            break;
        }
    }
}

function endElementHandler($parser, $name) {
    global $usercount;
    global $userdata;
    global $state;
    $state = '';
    if ($name == "NEWSITEM") { $usercount++; }
}

function characterDataHandler($parser, $data) {
    global $usercount;
    global $userdata;
    global $state;
    if (!$state) { return; }
    if ($state == "TITLE") { $userdata[$usercount]["Title"] = $data; }
    if ($state == "WHERE") { $userdata[$usercount]["Where"] = $data; }
    if ($state == "WHEN") { $userdata[$usercount]["When"] = $data; }
    if ($state == "INTRODUCTION") { $userdata[$usercount]["Introduction"] = $data; }
    if ($state == "BODY") { $userdata[$usercount]["Body"] = $data; }
    if ($state == "ABOUT") { $userdata[$usercount]["About"] = $data; }
}

if (!($xml_parser = xml_parser_create())) die("Couldn't create parser.");
xml_set_element_handler($xml_parser, "startElementHandler", "endElementHandler");
xml_set_character_data_handler($xml_parser, "characterDataHandler");

while ($data = fread($fp, filesize("temp.xml"))) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        break;
    }
}
xml_parser_free($xml_parser);

for ($i = 0; $i < $usercount; $i++) {
    echo "ID: " . $userdata[$i]["id"] . " Date: " . ucfirst($userdata[$i]["date"]) . "<br><br>";

    if ($userdata[$i]["Title"]) { echo "<h1>" . $userdata[$i]["Title"] . "</h1>"; }
    if ($userdata[$i]["Where"]) { echo "Where: " . $userdata[$i]["Where"] . "<br>"; }
    if ($userdata[$i]["When"]) { echo "When: " . $userdata[$i]["When"] . "<br>"; }
    if ($userdata[$i]["Introduction"]) { echo $userdata[$i]["Introduction"] . "<br>"; }
    if ($userdata[$i]["Body"]) { echo $userdata[$i]["Body"] . "<br>"; }
    if ($userdata[$i]["About"]) { echo $userdata[$i]["About"] . "<br>"; }
}

Is it perhaps having trouble with the HTML? I thought perhaps adding the CDATA section would help...? Any comments are most appreciated.
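
One possible cause, offered as a guess rather than a confirmed diagnosis: PHP's XML parser may call the character data handler several times for a single element, passing the text in pieces. Assigning with `=` inside `characterDataHandler` would then keep only the last piece, which would fit output that starts mid-word. Separately, case folding is on by default, so element names and attribute keys arrive upper-cased (`NEWSITEM`, `ID`, `DATE`) and a comparison against `"NewsItem"` would not match. The sketch below appends each piece instead of overwriting. `collectFields` and the sample document are hypothetical, made up for illustration only.

```php
<?php
// Hypothetical helper (not from the code above): collects the text of the
// wanted elements for every NEWSITEM in an XML string.
function collectFields(string $xml, array $wanted): array
{
    $items   = [];   // one entry per NEWSITEM
    $current = [];   // fields of the item currently being read
    $state   = '';   // name of the element whose text is being collected

    // Case folding is enabled by default, so names arrive upper-cased.
    $parser = xml_parser_create();

    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrib) use (&$current, &$state, $wanted) {
            if ($name === 'NEWSITEM') {
                // Attribute keys are upper-cased too.
                $current = [
                    'id'   => $attrib['ID'] ?? '',
                    'date' => $attrib['DATE'] ?? '',
                ];
            } elseif (in_array($name, $wanted, true)) {
                $state = $name;
                $current[$name] = '';   // start with an empty string
            }
        },
        function ($p, $name) use (&$items, &$current, &$state) {
            $state = '';
            if ($name === 'NEWSITEM') {
                $items[] = $current;
            }
        }
    );

    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$current, &$state) {
            if ($state !== '') {
                $current[$state] .= $data;   // append, do not overwrite
            }
        }
    );

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(sprintf(
            'XML error: %s at line %d',
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)
        ));
    }

    return $items;
}

// Made-up sample in the same shape as the CDATA-wrapped data.
$sample = '<NEWS><NewsItem id="1" date="today">'
        . '<Body><![CDATA[Intro<p>More</p>]]></Body>'
        . '</NewsItem></NEWS>';
$items = collectFields($sample, ['BODY']);
echo $items[0]['BODY'], "\n";
```

Because the handler only ever appends, the result should be the same however the parser happens to split the text. Reporting the parser error instead of silently breaking out of the loop would also show whether the document is being rejected part-way through.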
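
Another option, sketched under the assumption that the HTML inside <Body> is always well-formed XML (as in the sample, where every tag is closed): load the document with DOM and serialize the children of <Body> back into a string, with no CDATA wrapping and no temp file. `innerXml` is a hypothetical helper, and the element and attribute names (NEWS, NewsItem, id, date, Body) are guessed from the code in this post.

```php
<?php
// Hypothetical helper: returns the markup inside a node as a string.
function innerXml(DOMNode $node): string
{
    $out = '';
    foreach ($node->childNodes as $child) {
        // saveXML() with a node argument serializes just that node.
        $out .= $node->ownerDocument->saveXML($child);
    }
    return $out;
}

// Made-up sample; the real data would come from the uploaded file.
$xml = '<NEWS><NewsItem id="1" date="today"><Title>Sample</Title>'
     . '<Body>Intro text<p>A paragraph</p><img src="http://www.url.com/image.jpg"/></Body>'
     . '</NewsItem></NEWS>';

$doc = new DOMDocument();
if (!$doc->loadXML($xml)) {
    die('Could not parse XML.');
}

$news = [];
foreach ($doc->getElementsByTagName('NewsItem') as $item) {
    $body = $item->getElementsByTagName('Body')->item(0);
    $news[] = [
        'id'   => $item->getAttribute('id'),
        'date' => $item->getAttribute('date'),
        'Body' => $body ? innerXml($body) : '',   // HTML kept as markup
    ];
}

echo $news[0]['Body'], "\n";
```

If the uploaded HTML is not well-formed (an unclosed <br>, for example), loadXML will fail, so the CDATA route may still be the safer choice for arbitrary input.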