xml parsing script

Posted: Tue Sep 16, 2003 2:44 pm
by xisle
Parsing an XML feed into MySQL --
Greets, I am looking for ideas on a more efficient way to handle this. I tried line-by-line parsing, but it was not consistent at matching start and end tags. I ended up writing it like this, but I would prefer to avoid loading whole files into strings.
Thanks!

Code:

<?php
// dbFieldname => tagname
$tagstofind = array("headline" => "hl1",
					"source" => "distributor",
					"entrydate" => "dateline",
					"location" => "location",
					"body" => "body.content");

$newsdirectory="./xmlsource/";

if(!($files = opendir("$newsdirectory"))) {
   print("directory could not be opened.");
   exit;    
}

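// Compare against false explicitly: a file named "0" would otherwise end the loop early.
while (($filename = readdir($files)) !== false) {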

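  // Match on the actual extension rather than ".xml" anywhere in the name.
  if (strtolower(pathinfo($filename, PATHINFO_EXTENSION)) === "xml") {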
	 if(!($filedata=file($newsdirectory.$filename))){
		print"couldn't open the file.<p>";
	 }
	 else {
		
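		// Start each file with empty accumulators so that fields from a
		// previous file do not leak into this file's INSERT.
		$found = array();
		$insertfields = "";
		$insertvals = "";

		// Join the lines into one string (separator first, then the array).
		$filestring = implode("", $filedata);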
		
		foreach ($tagstofind as $field => $tagname) {
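			// Locate the first <tag>...</tag> pair with plain string offsets.
			// strtok() treats every byte of its delimiter as a separator, so a
			// multi-byte marker such as "‡‡" can cut the content short.
			$open  = "<{$tagname}>";
			$close = "</{$tagname}>";
			$start = strpos($filestring, $open);
			if ($start !== false) {
				$start += strlen($open);
				$end = strpos($filestring, $close, $start);
				if ($end !== false && $end > $start) {
					$found[$field] = substr($filestring, $start, $end - $start);
				}
			}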
		}		
		
		$count=count($found);
		$a=0;
		
		foreach ($found as $fieldname => $value) {
		  $value=trim($value);
		  $insertfields.=$fieldname;
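		  // NOTE: addslashes() is not a reliable SQL escape; prefer the database
		  // driver's own escaping or prepared statements before running this query.
		  $insertvals.="'".addslashes($value)."'";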
		  
		  if($a < ($count-1)){
			   $insertfields.=",";
			   $insertvals.=",";
			   $a++;
		  }
		  #print "$fieldname: $value<hr>";			  
		}
		
		$query = "INSERT INTO feed (".$insertfields.") VALUES (".$insertvals.")";
		echo $query;
		
	 }
  }
  
}

// Release the directory handle once every file has been processed.
closedir($files);

?>
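
A side note on the INSERT step in the script above: gluing values into the query with addslashes() is fragile. Below is a minimal sketch of building the same statement with placeholders instead, assuming a driver that supports prepared statements (PDO or mysqli, neither of which appears in the original script) and PHP 5.3 or later. The helper name build_insert() and the sample values are made up for illustration; the table name "feed" and the column names come from the post.

```php
<?php
// Hypothetical helper: turns a dbFieldname => value map into a parameterized INSERT.
// Column names come from the script's own $tagstofind keys, never from the feed,
// so only the values need placeholders.
function build_insert($table, array $found)
{
    if (count($found) === 0) {
        return null; // nothing was extracted from this file
    }
    $columns = array_keys($found);
    $placeholders = array_fill(0, count($columns), "?");
    $sql = "INSERT INTO " . $table
         . " (" . implode(",", $columns) . ")"
         . " VALUES (" . implode(",", $placeholders) . ")";
    // Values are trimmed, as in the original loop, and returned in column order.
    $params = array_map('trim', array_values($found));
    return array($sql, $params);
}

// Sample values, invented for illustration.
$found = array("headline" => " It's a test ", "source" => "Example Wire");
list($sql, $params) = build_insert("feed", $found);
echo $sql, "\n";
```

With PDO, the pair would be run as `$pdo->prepare($sql)->execute($params)`, where `$pdo` is an already connected handle (an assumption, since the thread does not show the database code).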

Posted: Tue Sep 16, 2003 3:55 pm
by m3rajk
Can you clarify what exactly you're trying to do?

Posted: Tue Sep 16, 2003 4:14 pm
by SantaGhost
This might give you some ideas:

Code:

<?php
###############################################
## XML PARSER 1.0                            ##
## SantaGhost                                ##
## beheerder@email.com                       ##
##                                           ##
## thanks to sitepoint.com                   ##
###############################################
class xml_parser_data{
var $file, $toptag, $parsedata, $output;

    function startElement($parser, $tagName, $attrs) {
       if ($this->parsedata["insideitem"]) {
           $this->parsedata["tag"] = $tagName;
       } elseif ($tagName == $this->toptag) {
           $this->parsedata["insideitem"] = true;
       }
   }//end of function startElement

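   function endElement($parser, $tagName) {
       if ($tagName == $this->toptag) {
           $this->parsedata["insideitem"] = false;
       }
       // Forget the current tag so whitespace between elements is not
       // appended to the previous tag's data.
       $this->parsedata["tag"] = "";
   }//end of function endElement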

   function characterData($parser, $data) {
       if ($this->parsedata["insideitem"]) {
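           if ($this->parsedata["tag"]) {
               $key = strtolower($this->parsedata["tag"]);
               // Initialise the entry first so .= does not hit an undefined index.
               if (!isset($this->output[$key])) {
                   $this->output[$key] = "";
               }
               $this->output[$key] .= $data;
           }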
       }
   }//end of function characterData
}//end of class xml_parser_data

class xml_parser{

	function parse($file,$toptag){
		$xml_parser_data = new xml_parser_data();
		$xml_parser_data->file = $file;
		$xml_parser_data->toptag = strtoupper($toptag);
		$xml_parser_data->parsedata["insideitem"] = false;
		$xml_parser_data->parsedata["tag"] = "";		
				
		$xml_parser = xml_parser_create();
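		// Call-time pass-by-reference (&$var) is a fatal error since PHP 5.4; objects are passed by handle anyway.
		xml_set_object($xml_parser, $xml_parser_data);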
		xml_set_element_handler($xml_parser, "startElement", "endElement");
		xml_set_character_data_handler($xml_parser, "characterData");
		$fp = fopen($xml_parser_data->file,"r")
		   or die("Error reading XML data. please contact your webmaster");
		while ($data = fread($fp, 4096))
		   xml_parse($xml_parser, $data, feof($fp))
			   or die(sprintf("XML error: %s at line %d",  
				   xml_error_string(xml_get_error_code($xml_parser)),  
				   xml_get_current_line_number($xml_parser)));
		fclose($fp);
		xml_parser_free($xml_parser);
		return $xml_parser_data->output;
	}//end of function parse
}//end of class xml_parser
$xml_parser = new xml_parser();
?>

Posted: Tue Sep 16, 2003 4:17 pm
by volka
You might want to look into some existing RSS classes,
e.g. http://magpierss.sourceforge.net/

Posted: Wed Sep 17, 2003 10:15 am
by xisle
Thanks for the info.
The script looks for specific XML tags in a news feed, strips the tags, and inserts the data into the appropriate table/field. I have RSS feed parsers, but I need something more targeted where I can declare the data fields and tag names myself, because most of the tags are garbage and I am not using a specific DTD or XSLT for output.

Is this stupid? 8O
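
For what it's worth, the "declare the fields and tag names" idea can be combined with the event-based parser suggested earlier in the thread, which also avoids loading the whole file into a string. What follows is only a rough sketch under a few assumptions: PHP 5.3 or later (for closures), only the first occurrence of each tag is wanted, and any markup nested inside a wanted tag may be dropped so that only its text is kept. The function name extract_fields() and the sample XML are invented for illustration; the dbFieldname => tagname map has the same shape as $tagstofind in the first post.

```php
<?php
// Sketch: collect the text of selected tags with PHP's event-based XML parser.
// $fp is an open, readable stream; $tagstofind maps dbFieldname => tagname.
function extract_fields($fp, array $tagstofind)
{
    $fieldfor = array_flip($tagstofind); // tagname => dbFieldname
    $found = array();
    $current = null;                     // field currently being collected, if any

    $parser = xml_parser_create();
    // Keep tag names as written instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$current, &$found, $fieldfor) {
            // Start collecting on the first occurrence of a wanted tag.
            if ($current === null && isset($fieldfor[$name])
                && !isset($found[$fieldfor[$name]])) {
                $current = $fieldfor[$name];
                $found[$current] = "";
            }
        },
        function ($p, $name) use (&$current, $fieldfor) {
            // Stop collecting when the wanted tag closes.
            if ($current !== null && isset($fieldfor[$name])
                && $fieldfor[$name] === $current) {
                $current = null;
            }
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$current, &$found) {
            if ($current !== null) {
                $found[$current] .= $data;
            }
        }
    );

    // Feed the parser in 4 KB chunks; the last call tells it the input is complete.
    while (!feof($fp)) {
        $chunk = fread($fp, 4096);
        if (!xml_parse($parser, $chunk, feof($fp))) {
            return false; // malformed XML
        }
    }
    return array_map('trim', $found);
}

// Sample document, invented for illustration, read from an in-memory stream.
$fp = fopen("php://memory", "w+");
fwrite($fp, "<feed><hl1>Hello</hl1><junk>x</junk><location> Oslo </location></feed>");
rewind($fp);
$found = extract_fields($fp, array("headline" => "hl1", "location" => "location"));
fclose($fp);
print_r($found);
```

For a real file, the stream would come from `fopen($newsdirectory.$filename, "r")`. Tags that are not in the map are ignored without any extra code, which seems to fit the "most of the tags are garbage" case.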