Parsing XML file, not getting all data

GeXus
Forum Regular
Posts: 631
Joined: Sat Mar 11, 2006 8:59 am

Post by GeXus »

I have an XML file that I'm looping through with SimpleXML. There are a total of 46,000 nodes, and inside the loop I'm inserting each node into a database. The problem is that after the script finishes running, I only have 22,000 items in my DB, so it must have skipped some for whatever reason. I've called set_time_limit(0), and I tried adding sleep(1) to the bottom of the loop, which got me an extra 15 or so...

Any idea why some would be missing? I'm also escaping all text fields.
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

It sounds like some debugging is in order. Maybe write some data to a file, or output data to the browser...
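A minimal sketch of that kind of file-based debugging is below. The helper name `logProgress` and the `debug.log` path are hypothetical, and the type declarations assume PHP 7.1 or later. The idea is to write one line per iteration, so the last line in the file shows exactly where the loop stopped.

```php
<?php
// Hypothetical debug helper: append one line per processed item so the
// last line in the file shows where the import loop stopped.
function logProgress(string $logFile, int $index, string $productId): void
{
    $line = sprintf("%s index=%d ProductId=%s\n", date('H:i:s'), $index, $productId);
    // FILE_APPEND keeps earlier lines; LOCK_EX prevents interleaved writes.
    file_put_contents($logFile, $line, FILE_APPEND | LOCK_EX);
}

// Example call inside the import loop (debug.log is an assumed path):
// logProgress(__DIR__ . '/debug.log', $productId, (string) $ProductId);
```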
EricS
Forum Contributor
Posts: 183
Joined: Thu Jul 11, 2002 12:02 am
Location: Atlanta, Ga

Post by EricS »

Write a little debug script that counts and displays the number of nodes SimpleXML thinks there are.

Then you'll know if it's the way you are using SimpleXML or something else.
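Along those lines, here is a small sketch that counts what SimpleXML sees, assuming the `<product>` nodes are direct children of the root element (as the `$xml->product[...]` indexing in the thread implies). The function name `countNodes` is hypothetical. If the two numbers differ, a loop over `$xml->children()` that indexes into `$xml->product[...]` runs a different number of times than there are products.

```php
<?php
// Report how many direct children SimpleXML sees under the root element,
// and how many of those are <product> elements.
function countNodes(SimpleXMLElement $root): array
{
    return [
        'children' => count($root->children()),
        'products' => count($root->product),
    ];
}

// test.xml is the file name used in the thread.
if (is_file('test.xml')) {
    $xml = simplexml_load_file('test.xml');
    if ($xml === false) {
        echo "test.xml could not be parsed\n";
    } else {
        print_r(countNodes($xml));
    }
}
```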
GeXus

Post by GeXus »

I added an error log, but the script runs through without writing any errors... I tested it with a known error and that did get logged... so does this mean there are no errors? Maybe it's just skipping records?
EricS

Post by EricS »

Yep. That means it's a logic error, not a parse error.
GeXus

Post by GeXus »

I've had this same problem before with large XML files... Not really sure what the deal is... here is what I have...

Code:

<?php
ini_set("error_log" , "error.log");
set_time_limit(0);
ini_set("memory_limit","256M");


if (file_exists('test.xml')) {
    
$xml = simplexml_load_file('test.xml');
foreach($items as $item){

	$items = $xml->children();

	$productId = 0;

	foreach($items as $item){

	$ProductId = $xml->product[$productId]->ProductId;
	$name = $xml->product[$productId]->name;
	$description = $xml->product[$productId]->description;
	$imageUrl = $xml->product[$productId]->imageUrl;
	$productUrl = $xml->product[$productId]->productUrl;
	$categoryId = $xml->product[$productId]->Categories->Category[0]->id;
	$categoryName = $xml->product[$productId]->TDCategories->Category[0]->name;
	$actor = $xml->product[$productId]->fields->field[0]->value;
	$directors = $xml->product[$productId]->fields->field[1]->value;

	$name = mysql_real_escape_string($name);
	$description = mysql_real_escape_string($description);
	$categoryName = mysql_real_escape_string($categoryName);
	$actor = mysql_real_escape_string($actor);
	$directors = mysql_real_escape_string($directors);

	mysql_query("replace into products (ProductId, name, description, imageUrl, productUrl, categoryId, 	categoryName, actors, directors) VALUES 	('$ProductId','$name','$description','$imageUrl','$productUrl','$categoryId','$categoryName','$actor','$directors')")or die(mysql_error());

	$productId++;
	sleep(1);
	}


}

} else {
    exit('Failed to open test.xml.');
}
?>

Any ideas? I tried upping the sleep to 2, but that made no difference; it's just not adding any more rows... and there are no errors.
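One possibility worth ruling out, assuming `ProductId` is the primary or unique key of the `products` table: MySQL's `REPLACE INTO` deletes any existing row with the same key before inserting, so repeated ProductIds in the feed shrink the row count without raising an error. The sketch below counts repeats in the file. The function name `duplicateProductIds` is hypothetical, and it assumes the `<product>` elements are direct children of the root.

```php
<?php
// Count ProductIds that occur more than once among the <product> elements.
// Assumes ProductId is the primary or unique key of the products table:
// with REPLACE INTO, every repeat overwrites an earlier row instead of
// adding a new one, and no error is reported.
function duplicateProductIds(SimpleXMLElement $root): array
{
    $seen = [];
    foreach ($root->product as $product) {
        $id = (string) $product->ProductId;
        $seen[$id] = ($seen[$id] ?? 0) + 1;
    }
    // Keep only the IDs seen more than once (value = number of occurrences).
    return array_filter($seen, fn(int $n): bool => $n > 1);
}

if (is_file('test.xml')) {
    $xml = simplexml_load_file('test.xml');
    if ($xml !== false) {
        $dupes = duplicateProductIds($xml);
        // Each ID seen n times leaves n - 1 nodes without a row of their own.
        printf("%d repeated ProductIds, %d nodes overwritten\n",
            count($dupes), array_sum($dupes) - count($dupes));
    }
}
```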
EricS

Post by EricS »

Look for my comments in the code.

Code:

<?php 
ini_set("error_log" , "error.log"); 
set_time_limit(0); 
ini_set("memory_limit","256M"); 


if (file_exists('test.xml')) { 
    
$xml = simplexml_load_file('test.xml');

// What is this? How is $items getting set with its initial value?
// If this is all the code then this foreach loop should fail immediately.
foreach($items as $item){ 

        $items = $xml->children(); 

        $productId = 0; 

        foreach($items as $item){ 

        $ProductId = $xml->product[$productId]->ProductId; 
        $name = $xml->product[$productId]->name; 
        $description = $xml->product[$productId]->description; 
        $imageUrl = $xml->product[$productId]->imageUrl; 
        $productUrl = $xml->product[$productId]->productUrl; 
        $categoryId = $xml->product[$productId]->Categories->Category[0]->id; 
        $categoryName = $xml->product[$productId]->TDCategories->Category[0]->name; 
        $actor = $xml->product[$productId]->fields->field[0]->value; 
        $directors = $xml->product[$productId]->fields->field[1]->value; 

        $name = mysql_real_escape_string($name); 
        $description = mysql_real_escape_string($description); 
        $categoryName = mysql_real_escape_string($categoryName); 
        $actor = mysql_real_escape_string($actor); 
        $directors = mysql_real_escape_string($directors); 

        mysql_query("replace into products (ProductId, name, description, imageUrl, productUrl, categoryId,     categoryName, actors, directors) VALUES     ('$ProductId','$name','$description','$imageUrl','$productUrl','$categoryId','$categoryName','$actor','$directors')")or die(mysql_error()); 

        $productId++; 
        sleep(1); 
        } 


} 

} else { 
    exit('Failed to open test.xml.'); 
} 
?>
GeXus

Post by GeXus »

EricS wrote: Look for my comments in the code.

Code:

<?php 
ini_set("error_log" , "error.log"); 
set_time_limit(0); 
ini_set("memory_limit","256M"); 


if (file_exists('test.xml')) { 
    
$xml = simplexml_load_file('test.xml');


$items = $xml->children();
$productId = 0;

// What is this? How is $items getting set with its initial value?
// If this is all the code then this foreach loop should fail immediately.
foreach($items as $item){ 

        $items = $xml->children(); 

        $productId = 0; 

        foreach($items as $item){ 

        $ProductId = $xml->product[$productId]->ProductId; 
        $name = $xml->product[$productId]->name; 
        $description = $xml->product[$productId]->description; 
        $imageUrl = $xml->product[$productId]->imageUrl; 
        $productUrl = $xml->product[$productId]->productUrl; 
        $categoryId = $xml->product[$productId]->Categories->Category[0]->id; 
        $categoryName = $xml->product[$productId]->TDCategories->Category[0]->name; 
        $actor = $xml->product[$productId]->fields->field[0]->value; 
        $directors = $xml->product[$productId]->fields->field[1]->value; 

        $name = mysql_real_escape_string($name); 
        $description = mysql_real_escape_string($description); 
        $categoryName = mysql_real_escape_string($categoryName); 
        $actor = mysql_real_escape_string($actor); 
        $directors = mysql_real_escape_string($directors); 

        mysql_query("replace into products (ProductId, name, description, imageUrl, productUrl, categoryId,     categoryName, actors, directors) VALUES     ('$ProductId','$name','$description','$imageUrl','$productUrl','$categoryId','$categoryName','$actor','$directors')")or die(mysql_error()); 

        $productId++; 
        sleep(1); 
        } 


} 

} else { 
    exit('Failed to open test.xml.'); 
} 
?>

Sorry, I updated it... I was changing a few values before posting and forgot to add that part.
GeXus

Post by GeXus »

I created a log table and wrote the array index to it on each pass... It stopped at 597, meaning the loop only got to index 597 instead of going all the way through...
EricS

Post by EricS »

Did you examine the XML in the document you're loading around the node ID you're stopping on? It sounds like SimpleXML is reading in something it doesn't like and treating that as the end of the document.
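To see what, if anything, libxml objects to, a sketch like the one below turns on `libxml_use_internal_errors()` so parse errors are buffered instead of being raised as PHP warnings, then lists each one with its line and column so the XML around it can be inspected. The function name `loadWithErrors` is hypothetical. Note that without the `LIBXML_RECOVER` option, a well-formedness error normally makes `simplexml_load_file()` return `false` for the whole document.

```php
<?php
// Load an XML file with libxml errors buffered internally, and return the
// parsed document (or null) together with a readable list of those errors.
function loadWithErrors(string $path): array
{
    $previous = libxml_use_internal_errors(true); // buffer instead of warn
    libxml_clear_errors();
    $xml = simplexml_load_file($path);
    $messages = [];
    foreach (libxml_get_errors() as $error) {
        // LibXMLError exposes ->line, ->column and ->message, among others.
        $messages[] = sprintf('line %d, col %d: %s',
            $error->line, $error->column, trim($error->message));
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the previous setting
    return [$xml === false ? null : $xml, $messages];
}

if (is_file('test.xml')) {
    [$xml, $errors] = loadWithErrors('test.xml');
    echo $xml === null ? "Parse failed\n" : "Parsed OK\n";
    foreach ($errors as $message) {
        echo $message, "\n";
    }
}
```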
EricS

Post by EricS »

Look for my comments.

Code:

<?php 
ini_set("error_log" , "error.log"); 
set_time_limit(0); 
ini_set("memory_limit","256M"); 


if (file_exists('test.xml')) { 
    
$xml = simplexml_load_file('test.xml'); 


$items = $xml->children(); 
$productId = 0; 

// Okay you fixed this. 
foreach($items as $item){ 
        // But now you have a nested-loop bug here!
        $items = $xml->children(); 

        $productId = 0; 

        foreach($items as $item){ 

        $ProductId = $xml->product[$productId]->ProductId; 
        $name = $xml->product[$productId]->name; 
        $description = $xml->product[$productId]->description; 
        $imageUrl = $xml->product[$productId]->imageUrl; 
        $productUrl = $xml->product[$productId]->productUrl; 
        $categoryId = $xml->product[$productId]->Categories->Category[0]->id; 
        $categoryName = $xml->product[$productId]->TDCategories->Category[0]->name; 
        $actor = $xml->product[$productId]->fields->field[0]->value; 
        $directors = $xml->product[$productId]->fields->field[1]->value; 

        $name = mysql_real_escape_string($name); 
        $description = mysql_real_escape_string($description); 
        $categoryName = mysql_real_escape_string($categoryName); 
        $actor = mysql_real_escape_string($actor); 
        $directors = mysql_real_escape_string($directors); 

        mysql_query("replace into products (ProductId, name, description, imageUrl, productUrl, categoryId,     categoryName, actors, directors) VALUES     ('$ProductId','$name','$description','$imageUrl','$productUrl','$categoryId','$categoryName','$actor','$directors')")or die(mysql_error()); 

        $productId++; 
        sleep(1); 
        } 


} 

} else { 
    exit('Failed to open test.xml.'); 
} 
?>
GeXus

Post by GeXus »

EricS wrote: Look for my comments.

Code:

<?php 
ini_set("error_log" , "error.log"); 
set_time_limit(0); 
ini_set("memory_limit","256M"); 


if (file_exists('test.xml')) { 
    
$xml = simplexml_load_file('test.xml'); 
$items = $xml->children(); 
$productId = 0; 

foreach($items as $item){  

        $ProductId = $xml->product[$productId]->ProductId; 
        $name = $xml->product[$productId]->name; 
        $description = $xml->product[$productId]->description; 
        $imageUrl = $xml->product[$productId]->imageUrl; 
        $productUrl = $xml->product[$productId]->productUrl; 
        $categoryId = $xml->product[$productId]->Categories->Category[0]->id; 
        $categoryName = $xml->product[$productId]->TDCategories->Category[0]->name; 
        $actor = $xml->product[$productId]->fields->field[0]->value; 
        $directors = $xml->product[$productId]->fields->field[1]->value; 

        $name = mysql_real_escape_string($name); 
        $description = mysql_real_escape_string($description); 
        $categoryName = mysql_real_escape_string($categoryName); 
        $actor = mysql_real_escape_string($actor); 
        $directors = mysql_real_escape_string($directors); 

        mysql_query("replace into products (ProductId, name, description, imageUrl, productUrl, categoryId,     categoryName, actors, directors) VALUES     ('$ProductId','$name','$description','$imageUrl','$productUrl','$categoryId','$categoryName','$actor','$directors')")or die(mysql_error()); 

        $productId++; 
        sleep(1); 
}

} else { 
    exit('Failed to open test.xml.'); 
} 
?>

lol... whoops, that shouldn't be in there. This is what I actually have; I fixed it in your quote.
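For comparison, here is a sketch of the same loop that reads each field from the current `<product>` element (`$item`) rather than re-indexing `$xml->product[$productId]`, so the loop variable and the data cannot drift apart. It swaps the old `mysql_*` functions (removed in PHP 7) for PDO prepared statements, and it leaves out `sleep(1)`, which on its own adds about 46,000 seconds (roughly 12.8 hours) for 46,000 rows. Element names are copied from the posted code, including the `Categories`/`TDCategories` split, which may be worth double-checking against the feed. The function names `productRow` and `importProducts` and the connection details in the usage comment are hypothetical.

```php
<?php
// Build one row of values from the current <product> element. Element names
// are copied from the posted code (including Categories vs TDCategories).
function productRow(SimpleXMLElement $item): array
{
    return [
        (string) $item->ProductId,
        (string) $item->name,
        (string) $item->description,
        (string) $item->imageUrl,
        (string) $item->productUrl,
        (string) $item->Categories->Category[0]->id,
        (string) $item->TDCategories->Category[0]->name,
        (string) $item->fields->field[0]->value,
        (string) $item->fields->field[1]->value,
    ];
}

// Insert every <product> with one prepared statement; PDO handles quoting,
// so no manual escaping is needed. Returns the number of rows sent.
function importProducts(SimpleXMLElement $xml, PDO $pdo): int
{
    $stmt = $pdo->prepare(
        'REPLACE INTO products (ProductId, name, description, imageUrl, productUrl,
            categoryId, categoryName, actors, directors)
         VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)'
    );
    $sent = 0;
    foreach ($xml->product as $item) { // $item is the current <product>
        $stmt->execute(productRow($item));
        $sent++;
    }
    return $sent;
}

// Usage (DSN, user name and password are placeholders):
// $pdo = new PDO('mysql:host=localhost;dbname=shop;charset=utf8mb4', 'user', 'pass',
//     [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
// echo importProducts(simplexml_load_file('test.xml'), $pdo), " rows sent\n";
```

With `PDO::ERRMODE_EXCEPTION`, a failed insert throws instead of silently returning `false`, so a skipped row cannot go unnoticed.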
GeXus

Post by GeXus »

EricS wrote: Did you examine the XML in the document you're loading around the node ID you're stopping on? It sounds like SimpleXML is reading in something it doesn't like and treating that as the end of the document.
Yes, I checked it out, and everything looks fine. What's weird is that the log stopped at 597, yet I've been able to get 12,391 items into the product table. The XML file is also 56 MB, so I'm wondering if it's just too big...
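Whether file size is really the cause isn't clear from the thread, but if it is a concern, one standard approach is to stream the file with PHP's `XMLReader` and build a small SimpleXML object for one `<product>` at a time, instead of loading the whole 56 MB document with `simplexml_load_file()`. A sketch, assuming each product is a `<product>` element; the function name `streamProducts` is hypothetical.

```php
<?php
// Stream <product> elements one at a time with XMLReader instead of parsing
// the whole file into memory; each product is yielded as a SimpleXMLElement.
function streamProducts(string $path): Generator
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        $more = $reader->read();
        while ($more) {
            if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'product') {
                // readOuterXml() returns this element and its subtree as a string.
                yield simplexml_load_string($reader->readOuterXml());
                // Jump past the subtree just consumed to the next sibling node.
                $more = $reader->next();
            } else {
                $more = $reader->read();
            }
        }
    } finally {
        $reader->close(); // also runs if the caller stops iterating early
    }
}
```

Each yielded `$item` can then be handled the same way as an element of `$xml->product`, e.g. `foreach (streamProducts('test.xml') as $item) { ... }`, while only one product's subtree is held as a SimpleXML object at a time.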