Parsing data using xml_set_character_data_handler
Posted: Fri Oct 13, 2006 9:46 am
by impulse()
Hopefully somebody's in a position to help me with this. I've used a script I found on the internet to parse some data from an XML file and echo it out. Could somebody help me with the next piece of code, which should break the data up so it isn't all stored in one variable ($data)? I want to import this data into a MySQL DB, which is why it needs to be broken up. If I insert it as it is at the moment, it tends to insert a lot of blank lines and puts data where I don't want it.
Code: Select all
<?php
$parser = xml_parser_create();
mysql_connect("x", "x", "x");
mysql_select_db("test");
function char($parser,$data)
{
echo $data;
mysql_query("INSERT INTO test (name) VALUES ('$data')"); # I know this is bad, it was just a test to see what happened
}
xml_set_character_data_handler($parser,"char");
$fp = fopen("data.xml","r");
while ($data = fread($fp,4096))
{
xml_parse($parser,$data,feof($fp)) or
die (sprintf("XML Error: %s at line %d",
xml_error_string(xml_get_error_code($parser)),
xml_get_current_line_number($parser)));
}
xml_parser_free($parser);
?>
Posted: Fri Oct 13, 2006 1:25 pm
by volka
Can you provide sample data and a description of how you want it split and inserted?
Posted: Fri Oct 13, 2006 1:36 pm
by impulse()
This is the data:
Code: Select all
<friends>
<friend>
<name> Stephen </name>
<age> 21 </age>
<height> Tall </height>
<sex> Male </sex>
</friend>
<friend>
<name> Scott </name>
<age> 22 </age>
<height> Tallish </height>
<sex> Male </sex>
</friend>
<friend>
<name> Mark </name>
<age> 23 </age>
<height> Short </height>
<sex> Male </sex>
</friend>
<friend>
<name> Helena </name>
<age> 20 </age>
<height> Small </height>
<sex> Female </sex>
</friend>
</friends>
If possible, how would I split it into 4 x 2D arrays?
name[0] - The first person's name
age[0] - The first person's age
name[1] - The second person's name
name[2] - The third person's name
There will be 5 columns in my DB.
ID - This is on auto increment
Name
Age
Height
Sex
Posted: Fri Oct 13, 2006 1:45 pm
by impulse()
I started work on this code, but yet again I had no luck.
Code: Select all
if (!($xmlparser = xml_parser_create())) { die ("Cannot create parser"); }
function start_tag($parser, $name, $attribs) {
echo "";
}
function end_tag($parser, $name) {
echo "";
}
function tag_contents($parser, $data) {
foreach ($data->friend as $foo) {
echo $foo->age;
}
#settype($data, "string");
#return str_split($data);
}
print_r($data);
xml_set_element_handler($xmlparser, "start_tag", "end_tag");
xml_set_character_data_handler($xmlparser, "tag_contents");
$filename = "data.xml";
if (!($fp = fopen($filename, "r"))) { die("Cannot open ". $filename); }
while ($data = fread($fp, 4096)) {
$data = eregi_replace(">"."[[:space:]]+"."<","><", $data);
if (!xml_parse($xmlparser, $data, feof($fp))) {
$reason = xml_error_string(xml_get_error_code($xmlparser));
$reason .= xml_get_current_line_number($xmlparser);
die ($reason);
}
}
xml_parser_free($xmlparser);
Posted: Fri Oct 13, 2006 1:46 pm
by volka
Unless your XML document is _huge_, use a DOM parser like SimpleXML.
Code: Select all
<?php
$xml = <<< eot
<friends>
<friend>
<name> Stephen </name>
<age> 21 </age>
<height> Tall </height>
<sex> Male </sex>
</friend>
<friend>
<name> Scott </name>
<age> 22 </age>
<height> Tallish </height>
<sex> Male </sex>
</friend>
<friend>
<name> Mark </name>
<age> 23 </age>
<height> Short </height>
<sex> Male </sex>
</friend>
<friend>
<name> Helena </name>
<age> 20 </age>
<height> Small </height>
<sex> Female </sex>
</friend>
</friends>
eot;
$doc = simplexml_load_string($xml);
foreach($doc->friend as $f) {
echo ' name:' , $f->name,
' age:' , $f->age,
' height:' , $f->height,
' sex:' , $f->sex,
"<br />\n";
}
?>
Posted: Fri Oct 13, 2006 7:42 pm
by impulse()
I've already written that code, but it was rejected. My task is to parse XML from an XML document into PHP and then insert it into a MySQL DB.
I have been told I can work in the PHP department at my company if I can do what I've described above. I tried using simplexml_load_file but was told that it didn't have enough functionality, so I had to use xml_set_element_handler to insert the XML into a MySQL DB.
Although I have revised for literally hours, I've only gotten as far as putting a whole tag's data into a variable using:
Code: Select all
<?php
if (!($xmlparser = xml_parser_create())) { die ("Cannot create parser"); }
function start_tag($parser, $name, $attribs) {
echo "";
}
function end_tag($parser, $name) {
echo "";
}
function tag_contents($parser, $data) {
echo $data;
#settype($data, "string");
#return str_split($data);
}
print_r($data);
xml_set_element_handler($xmlparser, "start_tag", "end_tag");
xml_set_character_data_handler($xmlparser, "tag_contents");
$filename = "data.xml";
if (!($fp = fopen($filename, "r"))) { die("Cannot open ". $filename); }
while ($data = fread($fp, 4096)) {
$data = eregi_replace(">"."[[:space:]]+"."<","><", $data);
if (!xml_parse($xmlparser, $data, feof($fp))) {
$reason = xml_error_string(xml_get_error_code($xmlparser));
$reason .= xml_get_current_line_number($xmlparser);
die ($reason);
}
}
xml_parser_free($xmlparser);
?>
I think my next step, if what I've listed above isn't possible, is to split the array somehow and insert each element individually.
Any help on doing my task would be greatly appreciated.
Regards,
Posted: Sat Oct 14, 2006 6:20 am
by volka
impulse() wrote: I tried using simplexml_load_file but was told that it didn't have enough functionality
Nonsense, the example proves it can do what you're asking for, and see how easily it was done. If SimpleXML doesn't provide the functionality you need, you can always use dom_import_simplexml to switch the interface (the internal representation is the same).
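As a minimal sketch of that switch (the shortened inline document and the variable names are made up for the illustration, not taken from the code above): dom_import_simplexml() takes a SimpleXML element and returns the corresponding DOM node, after which the DOM API can be used on the same document.

```php
<?php
// Shortened sample document, for illustration only.
$xml = '<friends><friend><name> Stephen </name><age> 21 </age></friend></friends>';

$doc = simplexml_load_string($xml);

// Hand the SimpleXML root element over to the DOM interface.
$root = dom_import_simplexml($doc);

// From here on the DOM API is available, e.g. getElementsByTagName().
$names = [];
foreach ($root->getElementsByTagName('name') as $node) {
    $names[] = trim($node->textContent);
}

print_r($names);
```

This assumes the simplexml and dom extensions are both enabled.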
One "disadvantage" of a DOM parser is that the whole document is represented by the model. If the document is large and you only need a small portion of it, this may be a waste of memory and CPU time. If this is an issue, you can use a SAX parser, which provides the data as it is read, on the fly. This way the whole document can be parsed but only the portions needed are held in memory.
There are hybrid implementations attempting to combine the best of DOM and SAX, but I haven't seen one for PHP yet.
Is the xml data size an issue?
If you must implement it using the xml_set_..._handler functions, you need to keep track of the current state at each start and end of an element.
When you enter a new <friend> element, you create a new container for the data of this element (an array, as a simple example).
start_tag, end_tag and tag_contents must share access to the flags and the current container.
If you are within a <friend> element and enter a <name> element, you store subsequent character data in the current <friend> container until this <name> element ends. The same goes for age, height and sex.
When the <friend> element ends, you check the current data container and do whatever you want to do with a <friend> element (in your case, write the data to MySQL).
Now you can discard the current container and wait for the next <friend> element.
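The steps above could be sketched roughly as follows. This is only an illustration under a few assumptions: the sample document is shortened to two entries, the variable names ($rows, $friend, $field, $fields) are made up for the example, closures with by-reference captures stand in for shared global flags, and the finished containers are collected in an array instead of being written to MySQL.

```php
<?php
// Sample data from the thread, shortened to two <friend> entries.
$xml = <<<XML
<friends>
<friend>
<name> Stephen </name>
<age> 21 </age>
<height> Tall </height>
<sex> Male </sex>
</friend>
<friend>
<name> Scott </name>
<age> 22 </age>
<height> Tallish </height>
<sex> Male </sex>
</friend>
</friends>
XML;

// State shared by the three handlers (hypothetical names).
$fields = ['name', 'age', 'height', 'sex'];
$rows   = [];     // finished <friend> containers
$friend = null;   // container for the <friend> currently being read
$field  = null;   // child element currently open, if any

$start_tag = function ($parser, $name, $attribs) use (&$friend, &$field, $fields) {
    if ($name === 'friend') {
        $friend = array_fill_keys($fields, '');   // new container
    } elseif ($friend !== null && in_array($name, $fields, true)) {
        $field = $name;                            // start collecting text
    }
};

$tag_contents = function ($parser, $data) use (&$friend, &$field) {
    // Text can arrive in several pieces, so append instead of assigning.
    if ($friend !== null && $field !== null) {
        $friend[$field] .= $data;
    }
};

$end_tag = function ($parser, $name) use (&$rows, &$friend, &$field) {
    if ($field !== null && $name === $field) {
        $friend[$field] = trim($friend[$field]);
        $field = null;
    } elseif ($name === 'friend' && $friend !== null) {
        $rows[] = $friend;   // a database insert would go here
        $friend = null;      // discard the container
    }
};

$xmlparser = xml_parser_create();
// Keep element names as written; by default they are folded to upper case.
xml_parser_set_option($xmlparser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler($xmlparser, $start_tag, $end_tag);
xml_set_character_data_handler($xmlparser, $tag_contents);

if (!xml_parse($xmlparser, $xml, true)) {
    die(sprintf("XML Error: %s at line %d",
        xml_error_string(xml_get_error_code($xmlparser)),
        xml_get_current_line_number($xmlparser)));
}

print_r($rows);
```

At the point where a finished container is appended to $rows, each row could instead be passed to a parameterised INSERT. Whitespace between the tags is ignored here simply because no field is open when it arrives.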
Posted: Sat Oct 14, 2006 6:44 am
by impulse()
Would it be a good idea to leave 2 of the 3 functions (start, container, end) empty and put all the code in 1 of the functions?
When you say "create a container", are you suggesting doing something like:
Code: Select all
$container[$i] = $data;
return $container[$i];
$i++;
And then outside of the function doing:
Code: Select all
for ($i = 0; $i <= count($container); $i++) {
mysql_query("INSERT INTO (column) VALUES ('$container[$i]')");
}
Posted: Sat Oct 14, 2006 6:51 am
by volka
impulse() wrote: Would it be a good idea to leave 2 of the 3 functions (start, container, end) empty and put all the code in 1 of the functions?
No. How would you know where in the document you are?
impulse() wrote:
When you say "create a container", are you suggesting doing something like:
Code: Select all
$container[$i] = $data;
return $container[$i];
$i++;
Probably not. But can't tell for sure from the small code snippet.
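To illustrate the first point, here is a small sketch (the inline document and the $chunks array are made up for the demo). The character data handler only receives the parser and a piece of text; nothing tells it which element the text belongs to. The whitespace between tags is delivered as character data as well, which would explain the blank rows mentioned at the start of the thread.

```php
<?php
// Tiny inline document, for illustration only.
$xml = "<friend>\n<name> Stephen </name>\n<age> 21 </age>\n</friend>";

$chunks = [];
$parser = xml_parser_create();

// Only a character data handler is registered, no element handlers.
xml_set_character_data_handler($parser, function ($parser, $data) use (&$chunks) {
    $chunks[] = $data;   // just text; the element name is not passed in
});
xml_parse($parser, $xml, true);

// Count the chunks that consist of whitespace only.
$blank = count(array_filter($chunks, fn($c) => trim($c) === ''));

var_dump($chunks);
echo "whitespace-only chunks: $blank\n";
```

How the text is split into chunks is up to the parser, so the handler should not rely on getting one call per element.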
Posted: Sat Oct 14, 2006 7:17 am
by impulse()
This is the code I have at the moment:
Code: Select all
<?php
mysql_connect("x", "x", "x");
mysql_select_db("test");
if (!($xmlparser = xml_parser_create())) { die ("Cannot create parser"); }
global $i;
$i = 0;
function start_tag($parser, $name, $attribs) {
}
function end_tag($parser, $name) {
}
function tag_contents($parser, $data) {
switch ($i) {
case 0:
mysql_query("INSERT INTO test (name) VALUES ('$data')");
$i++;
break;
case 1:
mysql_query("INSERT INTO test (age) VALUES ('$data')");
$i++;
break;
case 2:
mysql_query("INSERT INTO test (sex) VALUES ('$data')");
$i++;
break;
}
}
print_r($data);
xml_set_element_handler($xmlparser, "start_tag", "end_tag");
xml_set_character_data_handler($xmlparser, "tag_contents");
$filename = "data.xml";
if (!($fp = fopen($filename, "r"))) { die("Cannot open ". $filename); }
while ($data = fread($fp, 4096)) {
$data = eregi_replace(">"."[[:space:]]+"."<","><", $data);
if (!xml_parse($xmlparser, $data, feof($fp))) {
$reason = xml_error_string(xml_get_error_code($xmlparser));
$reason .= xml_get_current_line_number($xmlparser);
die ($reason);
}
}
xml_parser_free($xmlparser);
?>
This code does work, but it inserts all the data into the same column in the database (the Name column):
Row 1: Stephen
Row 2: 21
Row 3: Male
Row 4: Scott
Row 5: 20
And so on. The age & sex columns are empty.
Posted: Sat Oct 14, 2006 7:29 am
by volka
This approach won't work.
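One likely reason everything ends up in the name column: `global $i;` at file level does not make $i visible inside tag_contents, so the switch there sees an unset variable (null), and null == 0 matches the first case every time. The sketch below isolates just that behaviour; the two function names are hypothetical and the database calls are replaced by returning the column name.

```php
<?php
$i = 0;   // file-level counter, as in the post

// Mirrors the post: no `global $i;` inside the function.
function column_without_global() {
    $i = $i ?? null;   // the local $i was never set, so this is null
    switch ($i) {
        case 0: $i++; return 'name';   // null == 0 is true, so this always matches
        case 1: $i++; return 'age';
        case 2: $i++; return 'sex';
    }
    return null;
}

// Same switch, but the counter is imported into the function's scope.
function column_with_global() {
    global $i;
    switch ($i) {
        case 0: $i++; return 'name';
        case 1: $i++; return 'age';
        case 2: $i++; return 'sex';
    }
    return null;
}

$broken = [column_without_global(), column_without_global(), column_without_global()];
$fixed  = [column_with_global(), column_with_global(), column_with_global()];

print_r($broken);   // name, name, name
print_r($fixed);    // name, age, sex
```

Even with the counter fixed, the approach has other problems: every INSERT creates a separate row, whitespace between tags also reaches the handler and advances the counter, and the counter is never reset for the next <friend>.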
Posted: Sat Oct 14, 2006 7:31 am
by impulse()
Can you help me out and point me in the right direction where I'm going wrong?
Posted: Sat Oct 14, 2006 7:32 am
by volka
Already did.
Posted: Sat Oct 14, 2006 7:37 am
by impulse()
Would you mind posting some code that you've used previously using xml_set_character_data_handler that inserts into a MySQL DB?
Posted: Sat Oct 14, 2006 7:39 am
by volka
I never did such a thing.