
PHP isn't recognising blank data or NULL data.

Posted: Thu Oct 19, 2006 5:46 am
by impulse()
I have this function:

Code:

function contents($parser, $data){
    static $i = 1;
    global $contents;

    if (!empty($data)) {
        $contents[$i] = $data;
        $i++;
    }
}
But if I do

Code:

print_r($contents);
at the end of the script I get:
Array
(
[1] =>

[2] =>

[3] => stephen
[4] =>

[5] => 21
[6] =>

[7] =>


[8] =>

[9] => king kong
[10] =>

[11] => 1000
[12] =>

[13] =>

)
That data is obtained from an XML file parsed using xml_set_element_handler/xml_set_character_data_handler, and the blank entries are opening & closing tags.
How can PHP tell whether they are XML data or XML tags?

Regards,

Posted: Thu Oct 19, 2006 7:00 am
by feyd
They aren't actually blank; the tag is there.

Code:

echo '<pre>' . htmlentities(var_export($contents, true), ENT_QUOTES) . '</pre>';
for example, should show the tags.

Running strip_tags() on each element and then removing the empty elements should probably solve your problem.

array_map() * strip_tags() + array_filter() * strlen() = success.
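feyd's formula spelled out as a minimal sketch. The sample array is hypothetical and only stands in for $contents. Note that array_filter() with strlen drops only zero-length strings: an entry holding just a newline or spaces has a non-zero length and survives, so the sketch also shows a trim() pass that removes those.

```php
<?php
// Hypothetical sample data standing in for the $contents array from the thread.
$contents = array(1 => "\n", 2 => "stephen", 3 => "<b>21</b>", 4 => "");

// Step 1: strip markup from every element (keys are preserved with a single array).
$stripped = array_map('strip_tags', $contents);

// Step 2: drop zero-length strings; strlen('') is 0, which array_filter treats as false.
$nonEmpty = array_filter($stripped, 'strlen');

// Whitespace-only entries such as "\n" survive step 2; trimming first removes them too.
$trimmed = array_filter(array_map('trim', $stripped), 'strlen');

print_r($trimmed);
```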

Posted: Tue Oct 24, 2006 7:12 am
by impulse()
I've only just got around to trying your suggestion, but I'm having trouble using strip_tags.
I've changed my code so it looks like:

Code:

function contents($parser, $data){
    static $i = 1;
    global $contents;

    $contents[$i] = strip_tags($data);
    $i++;
}
But this still shows the tags if I print_r the array.

I've also tried running a loop outside of the contents function like so:

Code:

for ($i = 1; $i <= count($contents); $i++) {
    $stripped[$i] = strip_tags($contents[$i]);
}
But this still outputs the same results, with the tags.


Can you help me out further please?

Posted: Tue Oct 24, 2006 7:45 am
by volka
You are not guaranteed that the handler set by xml_set_character_data_handler is called only once for the whole "contents" of an element.
Edit: and the whitespace "between" the elements is character data as well.

But I really can't and won't help you until you stop using that silly counter and instead evaluate the tag names passed to your start_element handler and end_element handler, as mentioned before (days? weeks?). If you choose to keep ignoring that... godspeed.
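A small sketch illustrating the whitespace point: it records every chunk handed to a character data handler for a hypothetical, indented document and prints each chunk with var_export() so the whitespace-only ones become visible. The handler name collect_cdata is made up for the example. How the parser splits the text into chunks can vary, so only the concatenation of all chunks is predictable.

```php
<?php
// Every chunk the character data handler receives, in order.
$chunks = array();

function collect_cdata($parser, $data) {
    global $chunks;
    $chunks[] = $data;
}

$parser = xml_parser_create();
xml_set_character_data_handler($parser, 'collect_cdata');
xml_parse($parser, "<root>\n  <name>stephen</name>\n  <age>21</age>\n</root>", true);
xml_parser_free($parser);

// var_export makes newlines and spaces visible instead of printing "blank" lines.
foreach ($chunks as $chunk) {
    echo var_export($chunk, true), "\n";
}
```

The indentation and line breaks of the document arrive through the same handler as "stephen" and "21", which is where the apparently blank array entries come from.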

Posted: Tue Oct 24, 2006 11:57 am
by impulse()
Even if I do evaluate the start/end tag names, I can still see myself using an array with a counter to store the contents between the start/end tags.

I really did read your posts several times over and did try to follow them, but my attempts never got anywhere, so I had to figure out another way to do it. I've never ignored anything you've advised; I've just not been able to write such code.

Posted: Tue Oct 24, 2006 1:11 pm
by volka
Then why is there never anything even remotely connected to the tag name in your code or in your questions?

Code:

<?php
function start_tag($parser, $name, $attribs) {
	$name = strtolower($name);
	if ( 'mytag'===$name ) {
		echo "start: mytag<br />\n";
	}
	else {
		echo "  ignoring tag<br />\n";
	}
}

function end_tag($parser, $name) {
	$name = strtolower($name);
	if ( 'mytag'===$name ) {
		echo "end: mytag<br />\n";
	}
}

function tag_contents($parser, $data) {
}

$parser = xml_parser_create();
xml_set_element_handler($parser, 'start_tag', 'end_tag');
xml_set_character_data_handler($parser, 'tag_contents');


$data = '<root>
	<mytag>abc</mytag>
	<mytag>def</mytag>
	<something>abc</something>
	<else>def</else>
	<mytag>abc</mytag>
</root>';
xml_parse($parser, $data, true);
?>
impulse() wrote: Even if I do evaluate the start/end tag names, I can still see myself using an array with a counter to store the contents between the start/end tags.
Maybe, but there's no need for it at all ;)
Q: For which elements/tags must your script care? Under which conditions?
A: Only for name, age, height and sex. And only while being "within" a friend element.

Q: For the given (simple) xml document structure what information must be shared therefore between the three functions?
A: "Am I within a friend element?", "What's the current element?" -> "What's the current element's contents (so far)?"

Q: At which point must all data for one recordset have been provided?
A: When the parser reaches the end tag of a friend element (</friend>) no data can be added to this element anymore. Either all data has been gathered or the record can't be fixed.

Q: Where do you process the data of one friend element? / When can you insert a mysql recordset?
A: In my end_tag handler, each time the end of a friend element is signaled. I check the current data shared between the functions, build the mysql query and then re-initialize the shared data.
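A minimal sketch of the design these answers describe, assuming the element names from the Q&A (friend, name, age, height, sex) and a hypothetical sample document. Instead of running a MySQL INSERT, it collects each finished record in a $records array at the point where the INSERT would go; the handler names friend_start, friend_data and friend_end are made up for the example.

```php
<?php
// Fields the script cares about, and only inside a <friend> element.
$fields = array('name', 'age', 'height', 'sex');

// State shared between the three handlers (globals, as elsewhere in this thread).
$inFriend = false;   // "Am I within a friend element?"
$current  = null;    // "What's the current element?"
$record   = array(); // "What's the current element's contents (so far)?"
$records  = array(); // Finished records; a real script would build its INSERT here instead.

function friend_start($parser, $name, $attribs) {
    global $fields, $inFriend, $current, $record;
    $name = strtolower($name); // case folding upper-cases names by default
    $current = null;
    if ($name === 'friend') {
        $inFriend = true;
        $record = array_fill_keys($fields, '');
    } elseif ($inFriend && in_array($name, $fields, true)) {
        $current = $name;
    }
}

function friend_data($parser, $data) {
    global $inFriend, $current, $record;
    if ($inFriend && $current !== null) {
        // Append: this handler may be called several times for one element.
        $record[$current] .= $data;
    }
}

function friend_end($parser, $name) {
    global $inFriend, $current, $record, $records;
    $current = null; // text after any closing tag belongs to no field
    if (strtolower($name) === 'friend') {
        // No more data can arrive for this record: process it, then re-initialize.
        $records[] = $record;
        $record = array();
        $inFriend = false;
    }
}

// Hypothetical sample document; the real file from the thread is not shown.
$xml = '<friends>
  <friend><name>stephen</name><age>21</age><height>180</height><sex>m</sex></friend>
  <friend><name>king kong</name><age>1000</age></friend>
</friends>';

$parser = xml_parser_create();
xml_set_element_handler($parser, 'friend_start', 'friend_end');
xml_set_character_data_handler($parser, 'friend_data');
xml_parse($parser, $xml, true);
xml_parser_free($parser);

print_r($records);
```

Resetting $current on every closing tag is what keeps the whitespace between elements out of the fields, and the record is only handed on once </friend> has been seen.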

Posted: Tue Oct 24, 2006 1:21 pm
by impulse()
That really has helped me. I think I've cracked it, well, at least I'm 99% sure I'm on the right track now. I'm frantically writing some code at the moment in sheer excitement at thinking it's going to work :)

And guess what, no array and no counters :)

Posted: Tue Oct 24, 2006 1:27 pm
by volka
Uh, no array? That can work, of course, but I would be too lazy for that. We will see. ;)

Posted: Tue Oct 24, 2006 1:36 pm
by impulse()
Here she is:

Code:

<?php
$file = "data2.xml";

mysql_connect("x", "x", "x");
mysql_select_db("test");


  function startTag($parser, $data) {
    $dataStart = strtolower($data);

    global $current, $name, $age, $sex;

    switch($dataStart) {
      case "name":
        $name = array();
        $current = "name";
        break;
      case "age":
        $age = array();
        $current = "age";
        break;
      case "sex":
        $sex = array();
        $current = "sex";
        break;
      default:
        break;
    }
  }

  function contents($parser, $data) {
    global $current, $name, $age, $sex;

    switch($current) {
      case "name":
        $name = $data;
        break;
      case "age":
        $age = $data;
        break;
      case "sex":
        $sex = $data;
        break;
      default:
        break;
    }
  }
  function endTag($parser, $data) {
    $dataEnd = strtoupper($data);
    global $name, $age, $sex;


    switch($dataEnd) {
      case "PERSON":
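        // Escape the values before putting them into the SQL string.
        mysql_query("INSERT INTO test (name, age, sex) VALUES ('"
                    . mysql_real_escape_string($name) . "', '"
                    . mysql_real_escape_string($age) . "', '"
                    . mysql_real_escape_string($sex) . "')") or die ("Error ". mysql_error());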
        break;
      default:
        break;
    }

  }


  $xml_parser = xml_parser_create();

  xml_set_element_handler($xml_parser, "startTag", "endTag");

  xml_set_character_data_handler($xml_parser, "contents");

  $fp = fopen($file, "r");

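  // Parse in chunks; feof() tells xml_parse() when the final chunk has been read.
  while (!feof($fp)) {
    $data = fread($fp, 8192);
    if (!xml_parse($xml_parser, $data, feof($fp))) {
      die("Error on line ". xml_get_current_line_number($xml_parser));
    }
  }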

  xml_parser_free($xml_parser);

  fclose($fp);

?>

Posted: Tue Oct 24, 2006 2:34 pm
by volka
Much better. Keep in mind that contents($parser, $data) is not guaranteed to deliver the whole element data; it might be called more than once for the same element. Simple example: entities. The parser will pause before looking up entities:

Code:

<?php
function tag_contents($parser, $data) {
	echo '[', $data, ']';
}

$xml = '<root><element>ab&lt;cd</element></root>';

$parser = xml_parser_create();
xml_set_character_data_handler($parser, 'tag_contents'); 
xml_parse($parser, $xml, false);
?>
The output is
[ab][<][cd]
tag_contents has been called three times for a single element. There are other occasions where this might happen. Therefore you have to append $data to the current value.
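A minimal sketch of the append approach, using the same entity example. The global $value and the handler name append_contents are illustrative only. However the parser splits the text, concatenating the chunks reassembles the element text "ab<cd", whereas assigning $data would keep only the last chunk.

```php
<?php
// Accumulates the element text across multiple handler calls.
$value = '';

function append_contents($parser, $data) {
    global $value;
    $value .= $data; // append, never overwrite
}

$parser = xml_parser_create();
xml_set_character_data_handler($parser, 'append_contents');
xml_parse($parser, '<root><element>ab&lt;cd</element></root>', true);
xml_parser_free($parser);

echo $value, "\n"; // prints: ab<cd
```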

An array might be easier to handle (initialize, check, re-initialize and so on):

Code:

function startTag($parser, $data) {
	global $current, $person;
	$dataStart = strtolower($data);

	switch($dataStart) {
		case 'person':
			$person = array();
			break;
		case 'name':
		case 'age':
		case 'sex':
			$current = $dataStart;
			$person[$current] = '';
			break;
		default:
			$current = null;
			break;
	}
}