XML Parse Help

Post by hanji »

Hello

I'm trying to parse an XML file, but I'm having some trouble. I'm pretty much a newb when it comes to XML, so any help is greatly appreciated.

Below is a snippet of my XML file. It is a bibliography export from a program called EndNote. I need to pull values from this file and load them into a MySQL database, so I'm trying to understand how to reference these values.

Code: Select all

<?xml version="1.0" encoding="UTF-8" ?>
<xml>
  <records>
    <record>
      <database name="test.enl" path="E:\test\bibliography\test.enl">test.enl</database>
      <source-app name="EndNote" version="8.0">EndNote</source-app>
      <rec-number>3</rec-number>
      <ref-type name="Online Multimedia">48</ref-type>
      <contributors>
         <authors>
            <author>
               <style face="normal" font="default" size="100%">Los Alamos National Labs Chemistry Division</style>
            </author>
         </authors>
      </contributors>
      <titles>
         <title>
            <style face="normal" font="default" size="100%">test</style>
         </title>
      </titles>
      <dates>
         <year>
            <style face="normal" font="default" size="100%">2003</style>
         </year>
         <pub-dates>
             <date>
                <style face="normal" font="default" size="100%">April 12, 2005</style>
             </date>
         </pub-dates>
      </dates>
      <urls>
         <related-urls>
            <url>
               <style face="normal" font="default" size="100%">http://test.lanl.gov/periodic/test/1.html</style>
            </url>
         </related-urls>
      </urls>
    </record>
  </records>
</xml>
The problem I'm running into is that I can't seem to pin down a few of these elements. Basically, the tags without any attributes (contributors, authors, author, titles, title, etc.) seem to be 'lost' to my parser. What I get is:

Code: Select all

XML
RECORDS
RECORD
DATABASE
SOURCE-APP
REC-NUMBER
REF-TYPE
CONTRIBUTORS
AUTHORS
AUTHOR
STYLE
1 =>Los Alamos National Labs Chemistry Division
TITLES
TITLE
STYLE
2 =>test
DATES
YEAR
STYLE
3 =>2003
PUB-DATES
DATE
STYLE
4 =>April 12, 2005
URLS
RELATED-URLS
URL
STYLE
5 =>http://test.lanl.gov/test/elements/1.html
The items I need to pull are basically...
Author
Title
Year
Pub-Dates
Related-URLs

Here is my parser code. It's rough:

Code: Select all

<?php
$variable_data			= array();
$current_tag			= "";
$variable_label			= "";
$variable_type			= "";

function start_element($parser, $element_name, $element_attr) {
	global $current_tag, $variable_label, $variable_type,$count;
	$current_tag		= $element_name;
	switch($element_name){
		case "DATABASE":
			$count			= 1;
			break;
	}
	echo $element_name."<br>";
}

function end_element($parser, $element_name) {
	global $current_tag, $variable_type;
	$current_tag		= "";
	$variable_type		= "";
}

function character_data($parser, $data) {
	global $current_tag, $variable_label,$count;
	switch($current_tag){
		case "STYLE":
			echo $count." =>".$data."<br>";
			$count++;
			break;
	}
}

$file 		= "biblio042505.xml";
$parser		= xml_parser_create();
xml_set_element_handler($parser, "start_element", "end_element");
xml_set_character_data_handler($parser, "character_data");

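$fp			= fopen($file,'r');
$count		= 1;	// initialise once; resetting it inside the loop restarted the numbering on every 4 KB chunk
while($data	= fread($fp,4096)){
	xml_parse($parser,str_replace("&apos;","'",$data),feof($fp));
}
fclose($fp);
xml_parser_free($parser);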
?>
Any help is greatly appreciated!
thanks
hanji

Post by timvw »

Your problem is that the "values" you want to pull are inside a <style> tag. So you need to remember the previous tag too, otherwise you don't know where you are ;)

The following should get you started:

Code: Select all

ini_set('error_reporting', E_ALL);
ini_set('display_errors', TRUE);

$prev_tag = "";
$current_tag = "";
$current_attr = "";

function start_element($parser, $element_name, $element_attr)
{
    global $prev_tag, $current_tag, $current_attr;
    $prev_tag = $current_tag;
    $current_tag = $element_name;
    $current_attr = $element_attr;
}

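function end_element($parser, $element_name)
{
    // $prev_tag has to be declared global here as well; otherwise the
    // assignment below only sets a local variable and has no effect.
    global $prev_tag, $current_tag, $current_attr;
    $prev_tag = $current_tag;
    $current_tag = "";
    $current_attr = "";
}

function character_data($parser, $data)
{
    global $prev_tag, $current_tag, $current_attr;
    //echo "previous: $prev_tag ";
    //echo "current: $current_tag ";
    // Only text directly inside <style> matters; skip the whitespace between tags.
    if ($current_tag != "STYLE" || trim($data) == "") {
        return;
    }
    switch($prev_tag){
        case "AUTHOR":
        case "TITLE":
        case "YEAR":
        case "DATE":
        case "URL":
            echo $data;
            echo "<br>";
            break;
    }
    //echo "<br>";
}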

$data = file_get_contents('test.xml');
$parser = xml_parser_create();
xml_set_element_handler($parser, "start_element", "end_element");
xml_set_character_data_handler($parser, "character_data");
xml_parse($parser, $data);
xml_parser_free($parser);
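If you would rather not depend on a single "previous tag" variable, a rough alternative is to keep a stack of the currently open elements and look at the parent of each <style>. The sketch below is only one way to do it: the class name RecordCollector, the function parse_endnote() and the output field names are made up for illustration, and it assumes PHP 7.4+ and the default case folding (tag names arrive upper-cased).

```php
<?php
// Illustrative sketch only: RecordCollector, parse_endnote() and the field
// names are hypothetical, not part of EndNote or of PHP's XML extension.
final class RecordCollector
{
    // Parent element of <style> (upper-cased by the parser) => output field.
    private const WANTED = [
        'AUTHOR' => 'authors',
        'TITLE'  => 'title',
        'YEAR'   => 'year',
        'DATE'   => 'pub_date',
        'URL'    => 'urls',
    ];

    private array $stack = [];    // names of the currently open elements
    private array $current = [];  // fields of the record being read
    private string $buffer = '';  // text collected inside the open <style>
    public array $records = [];   // one array of fields per <record>

    public function start($parser, string $name, array $attrs): void
    {
        $this->stack[] = $name;
        if ($name === 'RECORD') {
            $this->current = [];
        } elseif ($name === 'STYLE') {
            $this->buffer = '';
        }
    }

    public function text($parser, string $data): void
    {
        // The handler can fire several times for one text node, so append.
        if (end($this->stack) === 'STYLE') {
            $this->buffer .= $data;
        }
    }

    public function end($parser, string $name): void
    {
        $depth = count($this->stack);
        if ($name === 'STYLE' && $depth >= 2) {
            $parent = $this->stack[$depth - 2];
            $value  = trim($this->buffer);
            if ($value !== '' && isset(self::WANTED[$parent])) {
                $this->current[self::WANTED[$parent]][] = $value;
            }
        }
        array_pop($this->stack);
        if ($name === 'RECORD') {
            $this->records[] = $this->current;
        }
    }
}

function parse_endnote(string $xml): array
{
    $collector = new RecordCollector();
    $parser = xml_parser_create();
    xml_set_element_handler($parser, [$collector, 'start'], [$collector, 'end']);
    xml_set_character_data_handler($parser, [$collector, 'text']);
    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $collector->records;
}
```

Calling parse_endnote(file_get_contents('test.xml')) should give you one array per <record>, with each field holding a list of strings (a record can have several authors or URLs), which you can then loop over for your MySQL inserts.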

Post by hanji »

Fantastic!

Works awesome. I'm seeing the light now.
Thanks much!

hanji