PHP XML parsing question

ska
Forum Commoner
Posts: 41
Joined: Mon Sep 05, 2005 4:54 pm

PHP XML parsing question

Post by ska »

Hi there,

Further to a previous post, I'm just simplifying my question here. I am coding a PHP news system to which you can upload an XML file via a web form; the contents are then extracted into a MySQL database. Getting the file via the form is fine, as will be writing to the database once I've got that far. The bit I can't do is parse the XML file. It looks a bit like this:

Code: Select all

<?xml version="1.0" encoding="UTF-8"?>
<!-- edited with XML Spy v4.4 U (http://www.xmlspy.com) -->
<?xml-stylesheet type="text/xsl" href="C:\Documents and Settings\News.xsl"?>
<News>
	<NewsItem id="" date="">
		<Title></Title>
		<Where></Where>
		<When></When>
		<Introduction></Introduction>
		<Body></Body>
	</NewsItem>

	<NewsItem id="" date="">
		<Title></Title>
		<Where></Where>
		<When></When>
		<Introduction></Introduction>
		<Body></Body>
	</NewsItem>
	
</News>
So assuming a user has attached this to a web form and clicked 'submit', how do I read out these values? I have written a script, but it doesn't work very well. I think it gets confused because the <Body> tag can contain HTML. Here is my current script:

Code: Select all

$fileatt = $_FILES['xmlfile']['tmp_name'];
$fileatt_type = $_FILES['xmlfile']['type'];
$fileatt_name = $_FILES['xmlfile']['name'];

$file = fopen($fileatt,'rb');
$data = fread($file,filesize($fileatt));
preg_match_all ("/<NEWS>.*<\/NEWS>/Uis", $data, $matches);

$matches[0][0]=str_replace("<Body>",'<BODY><![CDATA[',$matches[0][0]);
$matches[0][0]=str_replace("</Body>",']]></BODY>',$matches[0][0]);

$matches[0][0] = preg_replace("/(\r\n|\n|\r)/", "", $matches[0][0]);

// Open the file and erase the contents if any
$fp = fopen("temp.xml", "w");

// Write the data to the file
fwrite($fp, $matches[0][0]);

// Close the file
fclose($fp);

if (!($fp=@fopen("temp.xml", "r"))) die ("Couldn't open XML.");
$usercount=0;
$userdata=array();
$state='';

function startElementHandler ($parser,$name,$attrib){
    global $usercount;
    global $userdata;
    global $state;

    switch ($name) {
        case $name=="NewsItem" : {
            $userdata[$usercount]["id"] = $attrib["id"];
            $userdata[$usercount]["date"] = $attrib["date"];
            break;
        }

        default : {$state=$name;break;}
    }
}

function endElementHandler ($parser,$name){
    global $usercount;
    global $userdata;
    global $state;
    $state='';
    if($name=="NEWSITEM") {$usercount++;}
}

function characterDataHandler ($parser, $data) {
    global $usercount;
    global $userdata;
    global $state;
    if (!$state) {return;}
    if ($state=="TITLE") { $userdata[$usercount]["Title"] = $data;}
    if ($state=="WHERE") { $userdata[$usercount]["Where"] = $data;}
    if ($state=="WHEN") { $userdata[$usercount]["When"] = $data;}
    if ($state=="INTRODUCTION") { $userdata[$usercount]["Introduction"] = $data;}
    if ($state=="BODY") { $userdata[$usercount]["Body"] = $data;}
    if ($state=="ABOUT") { $userdata[$usercount]["About"] = $data;}
}

if (!($xml_parser = xml_parser_create())) die("Couldn't create parser.");
xml_set_element_handler( $xml_parser, "startElementHandler", "endElementHandler");
xml_set_character_data_handler( $xml_parser, "characterDataHandler");

while( $data = fread($fp, filesize("temp.xml"))){
    if(!xml_parse($xml_parser, $data, feof($fp))) {
        break;
    }
}
xml_parser_free($xml_parser);

for ($i=0;$i<$usercount; $i++)
{
    echo "ID: ".$userdata[$i]["id"]." Date: ".ucfirst($userdata[$i]["date"])."<br><br>";

    if ($userdata[$i]["Title"]) {echo "<h1>".$userdata[$i]["Title"]."</h1>";}
    if ($userdata[$i]["Where"]) {echo "Where: ".$userdata[$i]["Where"]."<br>";}
    if ($userdata[$i]["When"]) {echo "When: ".$userdata[$i]["When"]."<br>";}
    if ($userdata[$i]["Introduction"]) {echo $userdata[$i]["Introduction"]."<br>";}
    if ($userdata[$i]["Body"]) {echo $userdata[$i]["Body"]."<br>";}
    if ($userdata[$i]["About"]) {echo $userdata[$i]["About"]."<br>";}
}

Any help or comments on this would be most appreciated. Many thanks.
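
A note on the script above, offered as a guess rather than a diagnosis. Two things in it look likely to bite. First, xml_parser_create() has case folding switched on by default, so element names and attribute keys reach the handlers in upper case (NEWSITEM, ID, DATE); the "NewsItem" case and $attrib["id"] would then never match. Second, the character data handler can be called more than once for a single element, so assigning with = keeps only the last piece, while appending keeps all of it. The sketch below shows both points using closures instead of globals. The inline sample string and the names $items, $current and $field are made up for the illustration.

```php
<?php
// Hypothetical sample: <Body> is already wrapped in CDATA, so its HTML arrives as text.
$xml = '<News><NewsItem id="a1" date="11/30/05"><Title>Hello</Title>'
     . '<Body><![CDATA[Intro <p>para</p> end]]></Body></NewsItem></News>';

$items   = [];    // one array per <NewsItem>
$current = null;  // index of the item being read, or null outside an item
$field   = '';    // upper-cased name of the element whose text is being collected

$parser = xml_parser_create();

// Case folding is on by default: names and attribute keys arrive upper-cased.
xml_set_element_handler(
    $parser,
    function ($p, $name, $attrib) use (&$items, &$current, &$field) {
        if ($name === 'NEWSITEM') {
            $items[] = ['id' => $attrib['ID'] ?? '', 'date' => $attrib['DATE'] ?? ''];
            $current = count($items) - 1;
        } else {
            $field = $name;
        }
    },
    function ($p, $name) use (&$current, &$field) {
        $field = '';
        if ($name === 'NEWSITEM') {
            $current = null;
        }
    }
);

xml_set_character_data_handler(
    $parser,
    function ($p, $data) use (&$items, &$current, &$field) {
        if ($current === null || $field === '') {
            return; // text outside an item or between elements
        }
        // Append instead of assigning: one element's text may arrive in several pieces.
        $items[$current][$field] = ($items[$current][$field] ?? '') . $data;
    }
);

if (!xml_parse($parser, $xml, true)) {
    throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
}
unset($parser);

print_r($items);
```

Note that in the original script only <Body> is wrapped in CDATA, although <About> can hold HTML as well, so the same wrapping would be needed there.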
patrikG
DevNet Master
Posts: 4235
Joined: Thu Aug 15, 2002 5:53 am
Location: Sussex, UK

Post by patrikG »

Post by ska »

Wicked, thanks patrikG. I'll have a look at that thread and your code and see how I get on.

how do you echo out...

Post by ska »

Hi again,

Just to clarify (I'm not great with classes yet): to read out of the array returned by the XML parser class (patrikG's), what is the syntax? Is the result the array (i.e. $xml->result[0])?

So for example, I would declare

Code: Select all

require_once('class_xml_parser.inc.php'); 
$xml = new parser("temp.xml","NEWS","TITLE");
And then to read out all the titles I would...?

thanks.

Post by patrikG »

do a

Code: Select all

var_dump($xml->result);
The resulting array is in $xml->result

reading out of the array

Post by ska »

Hi again, thanks for the pointer. I've not read stuff out of a multidimensional array like this before. How do you address something in it? My var_dump returns:

Code: Select all

array(1) {
  [0]=>
  array(9) {
    ["NEWS"]=>
    string(1) "	"
    ["NEWSITEM"]=>
    string(2) "	 "
    ["TITLE"]=>
    string(2) "	 "
    ["WHERE"]=>
    string(2) "	 "
    ["WHEN"]=>
    string(2) "	 "
    ["INTRODUCTION"]=>
    string(2) "	 "
    ["BODY"]=>
    string(2) "	 "
    ["ABOUT"]=>
    string(1) "	"
    ["P"]=>
    string(1) "	"
  }
}
So if I wanted to cycle through the results, addressing each one in turn (so that I can read them into a database), how would I address one of these? Something like: echo $xml->result[0][0]; ?

Thanks.
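
On the addressing question: going by the dump above, the outer array is indexed by number and the inner one by upper-cased tag name, so $xml->result[0][0] would find nothing, while $xml->result[0]['TITLE'] would. A minimal sketch follows. $result is a hand-built stand-in with the same shape as the dump, not real output from the parser class, and $lines is just a name picked for the example.

```php
<?php
// Stand-in for $xml->result, copied from the shape of the var_dump above
// (one row, keyed by upper-cased tag name; the values are only whitespace).
$result = [
    [
        'NEWS'         => "\t",
        'NEWSITEM'     => "\t ",
        'TITLE'        => "\t ",
        'WHERE'        => "\t ",
        'WHEN'         => "\t ",
        'INTRODUCTION' => "\t ",
        'BODY'         => "\t ",
        'ABOUT'        => "\t",
        'P'            => "\t",
    ],
];

// One value: numeric row index first, then the tag name as a string key.
$title = $result[0]['TITLE'];

// Every value: the outer loop walks the rows, the inner loop walks the fields.
$lines = [];
foreach ($result as $index => $row) {
    foreach ($row as $tag => $value) {
        // json_encode makes the whitespace visible in the output
        $lines[] = $index . ' ' . $tag . ' = ' . json_encode($value);
    }
}

echo implode("\n", $lines), "\n";
```

The values in the dump are only tabs and spaces, which suggests the parser did not capture the real text of each element.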

Post by patrikG »

post your XML, please

Post by ska »

Sure, it's along the lines of:

Code: Select all

<?xml version="1.0" encoding="UTF-8"?>
<!-- edited with XML Spy v4.4 U (http://www.xmlspy.com) -->
<?xml-stylesheet type="text/xsl" href="C:\Documents and Settings\My Documents\News.xsl"?>
<News>
	<NewsItem id="code" date="11/30/05">
		<Title>Ipsum</Title>
		<Where>Lorem</Where>
		<When>Lorem</When>
		<Introduction>Lorem ipsum dolor sit amet, consectetuer </Introduction>
		<Body>
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Etiam at dui. Sed mi. Donec imperdiet laoreet nisi.
<p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Etiam at dui. Sed mi. Donec imperdiet laoreet nisi.</p>
<h1>Lorem ipsum dolor</h1>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Etiam at dui. Sed mi. Donec imperdiet laoreet nisi.
<p>
<img src="http://www.url.com/image.jpg"/>
<img src="http://www.url.com/image.jpg"/>
</p>
			<h1>Lorem ipsum dolor</h1>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Etiam at dui. Sed mi. Donec imperdiet laoreet nisi.
			<p>
			<img src="http://www.url.com/image.jpg"/>
			</p>
			<h1>Lorem ipsum dolor</h1>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Etiam at dui. Sed mi. Donec imperdiet laoreet nisi.
			<p>
				<img src="http://www.url.com/image.jpg"/>
				<img src="http://www.url.com/image.jpg"/>
				<img src="http://www.url.com/image.jpg"/>
			</p>
			<h1>Lorem ipsum dolor</h1>
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Etiam at dui. Sed mi. Donec imperdiet laoreet nisi.
		</Body>
	</NewsItem>
	<NewsItem id="code" date="10/5/05">
		<Title>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Etiam at dui. Sed mi. Donec imperdiet laoreet nisi.</Title>
		<Where>Lorem</Where>
		<When>Lorem</When>
		<Introduction>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Etiam at dui. Sed mi. Donec imperdiet laoreet nisi.</Introduction>
		<Body>
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Etiam at dui. Sed mi. Donec imperdiet laoreet nisi.

				<h1>Lorem</h1>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Etiam at dui. Sed mi. Donec imperdiet laoreet nisi.
 
<p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Etiam at dui. Sed mi. Donec imperdiet laoreet nisi.</p>
			<h1>Lorem ipsum dolor sit amet</h1>
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Etiam at dui. Sed mi. Donec imperdiet laoreet nisi.
<p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Etiam at dui. Sed mi. Donec imperdiet laoreet nisi.</p>
			<h1>Lorem ipsum dolor sit amet</h1>
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Etiam at dui. Sed mi. Donec imperdiet laoreet nisi.
				<h1>Lorem ipsum dolor sit amet</h1>
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Etiam at dui. Sed mi. Donec imperdiet laoreet nisi.
				<h1>Lorem ipsum dolor sit amet</h1>
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Etiam at dui. Sed mi. Donec imperdiet laoreet nisi.
				<h1>Lorem ipsum dolor sit amet</h1>
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Etiam at dui. Sed mi. Donec imperdiet laoreet nisi.
		</Body>
	</NewsItem>
	<NewsItem id="code" date="10/4/05">
		<Title>Lorem ipsum dolor sit amet</Title>
		<Where>Lorem ipsum dolor sit amet</Where>
		<When>Lorem ipsum dolor sit amet</When>
		<Introduction>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Etiam at dui. Sed mi. Donec imperdiet laoreet nisi.</Introduction>
		<Body>
	Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Etiam at dui. Sed mi. Donec imperdiet laoreet nisi.
	</Body>
	</NewsItem>
	<NewsItem id="code" date="7/20/05">
		<Title>Lorem ipsum dolor sit amet</Title>
		<Where>Lorem ipsum dolor sit amet</Where>
		<When>Lorem ipsum dolor sit amet</When>
		<Introduction>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Etiam at dui. Sed mi. Donec imperdiet laoreet nisi.</Introduction>
		<Body>
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Etiam at dui. Sed mi. Donec imperdiet laoreet nisi.
<p>
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Etiam at dui. Sed mi. Donec imperdiet laoreet nisi.
</p>
		</Body>
		<About who="code" web="www.url.co.uk" tel="23443242" email="email@email.co.uk">Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Etiam at dui. Sed mi. Donec imperdiet laoreet nisi.
</About>
	</NewsItem>
	<NewsItem id="code" date="4/20/05">
		<Title>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Etiam at dui. Sed mi. Donec imperdiet laoreet nisi.</Title>
		<Where>Lorem ipsum dolor sit amet</Where>
		<When>Lorem ipsum dolor sit amet</When>
		<Introduction>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Etiam at dui. Sed mi. Donec imperdiet laoreet nisi.</Introduction>
		<Body>
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Etiam at dui. Sed mi. Donec imperdiet laoreet nisi.
<p>
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Etiam at dui. Sed mi. Donec imperdiet laoreet nisi.
</p>
			<p>
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Etiam at dui. Sed mi. Donec imperdiet laoreet nisi.
</p>
		</Body>
		<About who="code" web="www.url.com" tel="23423423" email="email@email.com">
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Etiam at dui. Sed mi. Donec imperdiet laoreet nisi.
<p>
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Etiam at dui. Sed mi. Donec imperdiet laoreet nisi.
</p>
		</About>
	</NewsItem>
	<NewsItem id="code" date="3/10/05">
		<Title>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Etiam at dui. Sed mi. Donec imperdiet laoreet nisi.</Title>
		<Where>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Etiam at dui. Sed mi. Donec imperdiet laoreet nisi.</Where>
		<When>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Etiam at dui. Sed mi. Donec imperdiet laoreet nisi.</When>
		<Introduction>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Etiam at dui. Sed mi. Donec imperdiet laoreet nisi.</Introduction>
		<Body>
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Etiam at dui. Sed mi. Donec imperdiet laoreet nisi.
</Body>
		<About who="code" web="www.url.com" tel="234234" email="email@email.com">
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Etiam at dui. Sed mi. Donec imperdiet laoreet nisi.
<p>
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Etiam at dui. Sed mi. Donec imperdiet laoreet nisi.
</p>
		</About>
	</NewsItem>
</News>
Except a bit longer and with real text... is that any good?

Post by patrikG »

Ah, the parser gets confused by the HTML tags in there. So parsing it as plain XML isn't sufficient: you need a different, HTML-tolerant parser.
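
One option, offered as a sketch and not as the parser meant in this thread: the sample posted above appears to be well-formed XML in its own right (every <p> is closed and every <img .../> is self-closing), so PHP's DOM extension should be able to load it as a tree, and the markup inside <Body> can then be written back out as a string. The helper name innerXml and the shortened sample are made up for the example. Markup that is not well-formed, such as a bare <br>, would make loadXML fail.

```php
<?php
// Shortened, hypothetical sample in the same shape as the XML posted above.
$xml = '<News>'
     . '<NewsItem id="code" date="11/30/05">'
     . '<Title>Ipsum</Title>'
     . '<Body>Lorem <p>ipsum</p><img src="http://www.url.com/image.jpg"/></Body>'
     . '</NewsItem>'
     . '</News>';

// Return the markup inside $node, without the node's own opening and closing tags.
function innerXml(DOMNode $node): string
{
    $out = '';
    foreach ($node->childNodes as $child) {
        $out .= $node->ownerDocument->saveXML($child);
    }
    return $out;
}

$doc = new DOMDocument();
if (!$doc->loadXML($xml)) {
    throw new RuntimeException('The upload is not well-formed XML.');
}

$items = [];
foreach ($doc->getElementsByTagName('NewsItem') as $item) {
    $row = [
        'id'   => $item->getAttribute('id'),
        'date' => $item->getAttribute('date'),
    ];
    foreach ($item->childNodes as $child) {
        // Skip the whitespace text nodes between elements
        if ($child instanceof DOMElement) {
            $row[$child->tagName] = trim(innerXml($child));
        }
    }
    $items[] = $row;
}

print_r($items);
```

Each row ends up keyed by the original tag names (Title, Body and so on), with the HTML inside Body kept as a string ready to store.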

Post by ska »

I added CDATA sections to try and help things but don't really know what I'm doing with them...

Code: Select all

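$fileatt      = $_FILES['xmlfile']['tmp_name'];
$fileatt_type = $_FILES['xmlfile']['type'];
$fileatt_name = $_FILES['xmlfile']['name'];

// Read the uploaded file into a string
$file = fopen($fileatt,'rb');
$data = fread($file,filesize($fileatt));
fclose($file);

// Grab everything from <News> to </News> (the i flag makes the match case-insensitive)
preg_match_all ("/<NEWS>.*<\/NEWS>/Uis", $data, $matches);

// Wrap the contents of each <Body> in a CDATA section so the HTML inside is treated as text
$matches[0][0]=str_replace("<Body>",'<BODY><![CDATA[',$matches[0][0]);
$matches[0][0]=str_replace("</Body>",']]></BODY>',$matches[0][0]);

// Strip line breaks
$matches[0][0] = preg_replace("/(\r\n|\n|\r)/", "", $matches[0][0]);

// Write the result to temp.xml for the parser to read
$fp = fopen("temp.xml", "w");
fwrite($fp, $matches[0][0]);
fclose($fp);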
Any suggestions on an HTML-tolerant XML parser...? If it comes to it, I could just write a regex function, I guess.

Thanks for your help so far!
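
On the CDATA question: a CDATA section starts with <![CDATA[ and ends with ]]>, and everything in between is handed over as plain text, so tags inside it are not parsed as elements. The sketch below is one way to do the wrapping for several tags at once and then read the result back with SimpleXML. The function name wrapInCdata, the tag list and the sample string are assumptions made for the example, and the regex assumes the wrapped tags are never nested inside themselves.

```php
<?php
// Wrap the contents of the named elements in CDATA so embedded HTML is treated as text.
function wrapInCdata(string $xml, array $tags): string
{
    foreach ($tags as $tag) {
        $t = preg_quote($tag, '/');
        // Group 1: opening tag, group 2: contents, group 3: closing tag
        $pattern = '/(<' . $t . '\b[^>]*>)(.*?)(<\/' . $t . '>)/s';
        $xml = preg_replace_callback($pattern, function ($m) {
            // A literal "]]>" in the contents would end the section early, so split it.
            $body = str_replace(']]>', ']]]]><![CDATA[>', $m[2]);
            return $m[1] . '<![CDATA[' . $body . ']]>' . $m[3];
        }, $xml);
    }
    return $xml;
}

// Hypothetical sample in the shape of the news file
$raw = '<News><NewsItem id="code" date="10/4/05">'
     . '<Body>Lorem <p>ipsum</p></Body>'
     . '</NewsItem></News>';

$safe = wrapInCdata($raw, ['Body', 'About']);

$news = simplexml_load_string($safe);
$body = (string) $news->NewsItem[0]->Body;

echo $body, "\n";
```

Unlike the str_replace version, this keeps the original tag case and also covers <About>, which can contain HTML too.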

regex

Post by ska »

I've written a regex to deal with the XML. I've included this below in case it's useful to anyone. If anyone has a neater way of doing this, then let me know though!

Code: Select all

$fileatt      = $_FILES['xmlfile']['tmp_name'];
$fileatt_type = $_FILES['xmlfile']['type'];
$fileatt_name = $_FILES['xmlfile']['name'];

// Read the uploaded file into a string
$file = fopen($fileatt,'rb');
$data = fread($file,filesize($fileatt));
fclose($file);

// One match per <NewsItem ...>...</NewsItem> block
preg_match_all ("/<NewsItem.*<\/NewsItem>/Uis", $data, $matches);

// Pull each field, still wrapped in its tags, out of every item
for ($i=0; $i<count($matches[0]); $i++)
{
    preg_match_all ('/<NewsItem.*">/Uis', $matches[0][$i], $NewsItem[$i]);
    preg_match_all ("/<Title>.*<\/Title>/Uis", $matches[0][$i], $Title[$i]);
    preg_match_all ("/<Where>.*<\/Where>/Uis", $matches[0][$i], $Where[$i]);
    preg_match_all ("/<When>.*<\/When>/Uis", $matches[0][$i], $When[$i]);
    preg_match_all ("/<Introduction>.*<\/Introduction>/Uis", $matches[0][$i], $Introduction[$i]);
    preg_match_all ("/<Body>.*<\/Body>/Uis", $matches[0][$i], $Body[$i]);
    preg_match_all ("/<About.*<\/About>/Uis", $matches[0][$i], $About[$i]);
}

// Strip the tags and print each item
for ($i=0; $i<count($matches[0]); $i++)
{
    // date="..." attribute of the opening <NewsItem> tag
    preg_match_all ('/date=".*"/Uis', $NewsItem[$i][0][0], $datestamp);
    $datestamp[0][0]=str_replace('"',"",$datestamp[0][0]);
    $datestamp[0][0]=str_replace('date=',"",$datestamp[0][0]);
    echo $datestamp[0][0]."<br />";

    $Title[$i][0][0]=str_replace("<Title>","",$Title[$i][0][0]);
    $Title[$i][0][0]=str_replace("</Title>","",$Title[$i][0][0]);
    echo "<h1>".$Title[$i][0][0]."</h1>";

    $Where[$i][0][0]=str_replace("<Where>","",$Where[$i][0][0]);
    $Where[$i][0][0]=str_replace("</Where>","",$Where[$i][0][0]);
    echo "<b>".$Where[$i][0][0]."</b><br/>";

    $When[$i][0][0]=str_replace("<When>","",$When[$i][0][0]);
    $When[$i][0][0]=str_replace("</When>","",$When[$i][0][0]);
    echo "<em>".$When[$i][0][0]."</em><br /><br />";

    $Introduction[$i][0][0]=str_replace("<Introduction>","",$Introduction[$i][0][0]);
    $Introduction[$i][0][0]=str_replace("</Introduction>","",$Introduction[$i][0][0]);
    echo "<em>".$Introduction[$i][0][0]."</em><br /><br />";

    $Body[$i][0][0]=str_replace("<Body>","",$Body[$i][0][0]);
    $Body[$i][0][0]=str_replace("</Body>","",$Body[$i][0][0]);
    echo $Body[$i][0][0];

    // <About> is optional, so only process it when the item has one
    if (!empty($About[$i][0]))
    {
        // get full <About ..... > opening tag
        preg_match_all ('/<About.*">/Uis', $matches[0][$i], $Aboutdesc);

        // get web= section
        preg_match_all ('/web=".*"/Uis', $Aboutdesc[0][0], $web);

        $web[0][0]=str_replace('"',"",$web[0][0]);
        $web[0][0]=str_replace('web=',"",$web[0][0]);

        $About[$i][0][0]=str_replace($Aboutdesc[0][0],"",$About[$i][0][0]);
        $About[$i][0][0]=str_replace("</About>","",$About[$i][0][0]);

        echo "<br /><br />".$About[$i][0][0];
        echo "<br /><a href='http://".$web[0][0]."'>".$web[0][0]."</a><br /><br /><br />";
    }
}
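
A possibly neater variant of the regex approach above, as a sketch only: capture groups return the text between the tags directly, so the str_replace clean-up is not needed, and a missing <About> simply comes back as null. The helper names tagContent and tagAttribute and the sample data are invented for the example. Regexes remain fragile on XML in general (nested tags of the same name, single-quoted attributes, a > inside an attribute value), so this assumes files shaped like the one in this thread.

```php
<?php
// Text between <$tag ...> and </$tag>, or null when the tag is absent.
function tagContent(string $xml, string $tag): ?string
{
    $t = preg_quote($tag, '/');
    if (preg_match('/<' . $t . '\b[^>]*>(.*?)<\/' . $t . '>/is', $xml, $m)) {
        return trim($m[1]);
    }
    return null;
}

// Value of attribute $name on the opening <$tag ...>, or null when absent.
function tagAttribute(string $xml, string $tag, string $name): ?string
{
    $pattern = '/<' . preg_quote($tag, '/') . '\b[^>]*\s'
             . preg_quote($name, '/') . '="([^"]*)"/i';
    return preg_match($pattern, $xml, $m) ? $m[1] : null;
}

// Hypothetical sample: the first item has an <About>, the second does not.
$data = '<News>'
      . '<NewsItem id="code" date="7/20/05"><Title>Lorem</Title>'
      . '<Body>Text <p>more</p></Body>'
      . '<About who="code" web="www.url.co.uk">About text</About></NewsItem>'
      . '<NewsItem id="code" date="10/4/05"><Title>Ipsum</Title>'
      . '<Body>Plain</Body></NewsItem>'
      . '</News>';

// One match per <NewsItem ...>...</NewsItem> block
preg_match_all('/<NewsItem\b.*?<\/NewsItem>/is', $data, $matches);

$items = [];
foreach ($matches[0] as $itemXml) {
    $items[] = [
        'date'  => tagAttribute($itemXml, 'NewsItem', 'date'),
        'Title' => tagContent($itemXml, 'Title'),
        'Body'  => tagContent($itemXml, 'Body'),
        'About' => tagContent($itemXml, 'About'),        // null when absent
        'web'   => tagAttribute($itemXml, 'About', 'web'),
    ];
}

print_r($items);
```

Collecting the fields into one array per item, instead of echoing them straight away, also leaves them in a convenient shape for the database insert later on.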