
[Solved] XML document saved with bad formatting

Posted: Wed Feb 07, 2007 11:52 pm
by abeall
I'm using DomDocument to open an XML document, append some nodes and attributes, and save it. There are two problems:
1) The XML file has no whitespace between nodes; they are all smashed together on one line. Below is an example of human-readable XML, followed by what PHP produces:

Code:

<root>
    <node>Some text</node>
    <node>More text</node>
</root>

Code:

<root><node>Some text</node><node>More text</node></root>
2) Line breaks seem to be converted to &#xD;. After a quick Google search, I still can't really tell what this is. Can it be prevented, so they are left as normal line breaks?

Any help in cleaning these two issues up would be appreciated.

Posted: Thu Feb 08, 2007 12:39 am
by Christopher
Take a look at the DomDocument settings. I think there is one like "preserve_whitespace".

Posted: Thu Feb 08, 2007 8:28 am
by abeall
There is, but the documentation says it is on by default anyway, and it appears to only affect loading XML. It preserves the whitespace in the loaded XML; the problem is that the nodes I append are not formatted when I save, just tacked on to the end. So, let's say I have this:

Code:

<root>
    <node>Node text</node>
    <node>Node text</node>
    <node>Node text</node>
    <node>Node text</node>
    <node>Node text</node>
</root>
I load the document into PHP, append two nodes (we'll call them "phpnodes"), and save. The resulting XML file looks like this:

Code:

<root>
    <node>Node text</node>
    <node>Node text</node>
    <node>Node text</node>
    <node>Node text</node>
    <node>Node text</node><phpnode>Node text</phpnode><phpnode>Node text</phpnode></root>
Notice there is no whitespace added with the phpnodes, and the root close tag has been moved up as well. These two issues, along with the strange &#xD; characters, are what I'm trying to resolve.

Posted: Thu Feb 08, 2007 8:44 am
by volka
try

Code:

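<?php
$dom = new DOMDocument;
// Drop whitespace-only text nodes on load, so the serializer is free to re-indent.
$dom->preserveWhiteSpace = false;
// Indent the output of save()/saveXML().
$dom->formatOutput = true;

if (file_exists('test.xml')) {
	$dom->load('test.xml');
}
else {
	$dom->loadXML('<root></root>');
}

// Append a new <node> holding the current time, then write it back.
$n = $dom->createElement('node', date('H:i:s'));
$dom->documentElement->appendChild($n);
$dom->save('test.xml');
echo $dom->saveXML();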
Note that preserveWhiteSpace = false (along with formatOutput = true) is set before load().
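A self-contained sketch of why the order matters, using an in-memory string instead of test.xml; appendAndSave() is a made-up helper name for this example. With preserveWhiteSpace switched off before loading, the original indentation is discarded and re-created on save. With it left on, <root> keeps its whitespace text nodes, and libxml2 (the library PHP's DOM extension is built on) does not re-indent an element that contains text, so the appended node ends up glued to the closing tag, much like the output posted above.

```php
<?php
// Hypothetical helper for this example: load $xml, append one <phpnode>,
// and return the serialized document.
function appendAndSave(string $xml, bool $preserveWhiteSpace): string
{
    $dom = new DOMDocument();
    // Affects parsing, so it only has an effect when set before loadXML().
    $dom->preserveWhiteSpace = $preserveWhiteSpace;
    // Affects serialization only (save()/saveXML()).
    $dom->formatOutput = true;
    $dom->loadXML($xml);

    $dom->documentElement->appendChild($dom->createElement('phpnode', 'Node text'));
    return $dom->saveXML();
}

$source = "<root>\n    <node>Node text</node>\n</root>";

// Whitespace dropped on load: the whole tree is re-indented on save.
echo appendAndSave($source, false);

// Whitespace kept: <root> contains text nodes, so it is not re-indented and
// the new element is glued to the closing </root> tag.
echo appendAndSave($source, true);
```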

Posted: Thu Feb 08, 2007 9:14 am
by abeall
Perfect. Thanks.

Any idea about the &#xD; characters replacing linebreaks?

Posted: Thu Feb 08, 2007 11:05 am
by volka
No idea. Can you provide an example script?

Posted: Thu Feb 08, 2007 7:18 pm
by abeall
My code is attached. The POST data is submitted by Flash, which is posting data input by the user:

Code:

<?php

// Accept only the four expected fields, each within its length limit.
if(count($_POST) == 4
	&& isset($_POST['name'], $_POST['favkind'], $_POST['fav'], $_POST['comment'])
	&& strlen($_POST['name']) <= 50
	&& strlen($_POST['favkind']) <= 50
	&& strlen($_POST['fav']) <= 50
	&& strlen($_POST['comment']) <= 100){
	
	// Undo magic_quotes_gpc escaping (only correct when magic quotes are enabled)
	foreach ($_POST AS $key => $val)
		$_POST[$key] = stripslashes($val);

	$dom = new DomDocument();
	// Both set before load() so the saved file gets re-indented
	$dom->preserveWhiteSpace = false;
	$dom->formatOutput = true;
	$dom->load("commentary.xml");
	$node = $dom->createElement("comment");
	//attr: name
	appendChildAttribute($dom,$node,"name",$_POST['name']);
	//attr: favkind
	appendChildAttribute($dom,$node,"favkind",$_POST['favkind']);
	//attr: fav
	appendChildAttribute($dom,$node,"fav",$_POST['fav']);
	
	//append new node
	$node->appendChild($dom->createTextNode($_POST['comment']));
	$dom->documentElement->appendChild($node);
	
	// Echo the output
	//header('Content-type: text/xml');
	//echo $dom->saveXML();
	
	// Save to file
	$dom->save("commentary.xml");

}else{
	echo "Invalid submission";
}

function appendChildAttribute($dom, $node, $attributeName, $attributeValue){
	$attr = $dom->createAttribute($attributeName); //create attribute
	$attr->appendChild($dom->createTextNode($attributeValue)); //assign attribute value
	$node->setAttributeNode($attr); //add the attribute to node
}
Flash is not converting the linebreaks to &#xD;. Interestingly, though, when loaded back into Flash they come through as line breaks just fine, so it's not a problem in that sense.

BTW, why must preserveWhiteSpace be turned off in order for formatOutput to work?

Posted: Thu Feb 08, 2007 9:00 pm
by Ambush Commander
I would assume that in certain contexts, adding whitespace actually modifies the resultant DOM (extra text nodes or something).

Posted: Thu Feb 08, 2007 9:07 pm
by abeall
Most likely that is the case, although I can't think of a situation where that would be true other than attributes (a line break would obviously blow that apart). The problem is, I'm using the line breaks in text node content, so it's obviously safe.

Still kinda stumped how to deal with it. My only guess would be to save the DomDocument, then open the file as a text file and str_replace the &#xD; with \n, but I don't think I could bring myself to do something that ugly. There has to be a way to control it....
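Rather than opening the saved file as text and running str_replace over it, the same replacement can be applied to the submitted text before it goes into the DOM. A minimal sketch, assuming the &#xD; entities come from carriage returns (\r or \r\n) in the POST data; Flash text fields are a common source of bare \r line endings, but check your own data. normalizeLineBreaks() is a hypothetical helper, and the literal string stands in for $_POST['comment'].

```php
<?php
// Hypothetical helper: turn Windows (\r\n) and bare carriage-return (\r)
// line endings into plain \n before the text is added to the DOM.
function normalizeLineBreaks(string $text): string
{
    // "\r\n" is replaced first so it becomes a single "\n", not two.
    return str_replace(array("\r\n", "\r"), "\n", $text);
}

$dom = new DOMDocument();
$dom->formatOutput = true;
$dom->loadXML('<commentary/>');

$comment = $dom->createElement('comment');
// Stand-in for $_POST['comment'] as it might arrive from the form.
$raw = "First line\rSecond line";
$comment->appendChild($dom->createTextNode(normalizeLineBreaks($raw)));
$dom->documentElement->appendChild($comment);

// The text now contains only \n, which is written out literally.
echo $dom->saveXML();
```

Note that this changes the stored data: the client will read \n back instead of \r, so it is worth checking that the Flash side handles that.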

Posted: Fri Feb 09, 2007 1:28 am
by volka
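abeall wrote: BTW, why must preserveWhiteSpace be turned off in order for formatOutput to work?
Only to give DOM the freedom to remove unnecessary whitespace: with preserveWhiteSpace off, whitespace-only text nodes are dropped on load, so the serializer can insert its own indentation.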
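To make that "freedom" concrete, here is a small sketch that loads the same indented snippet twice and counts the children of <root>; rootChildCount() is a made-up name for this example. With preserveWhiteSpace on, the indentation is loaded as real text nodes that the serializer has to keep; with it off, only the elements remain.

```php
<?php
// Hypothetical helper: load an indented snippet and count <root>'s children.
function rootChildCount(bool $preserveWhiteSpace): int
{
    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = $preserveWhiteSpace;
    $dom->loadXML("<root>\n  <node>a</node>\n  <node>b</node>\n</root>");
    return $dom->documentElement->childNodes->length;
}

// On: text, <node>, text, <node>, text (the indentation is kept as data).
echo rootChildCount(true), "\n";
// Off: the whitespace-only text nodes are dropped, leaving two elements.
echo rootChildCount(false), "\n";
```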

The line break is payload data? Then &#xD; is not such a mystery ;)
e.g.

Code:

<p>
  yadda
  yadda
  <br />
  yadda
</p>
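There's no line break between the first two yaddas in the rendered page, although there is a line break (\n) in the source code; it's only treated as whitespace and can be used to format the source.
XML is similar: a "normal" line break in the source is not necessarily the character in the payload data, because the parser normalizes every line break (\r\n or a lone \r) to \n. &#xD; is character code 13 (hexadecimal 0xD), the carriage return \r; \n is code 10 (0xA). So a real \r in your data can only survive a save/load cycle if it is written as the reference &#xD;.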
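A short sketch of the \r versus \n difference as PHP's DOM sees it. It assumes the DOM extension is available; older libxml2 builds write the carriage return as &#xD; (as seen in this thread), and the test below only checks for a character reference rather than its exact spelling.

```php
<?php
// A text node containing one carriage return (\r, 0xD) and one line feed (\n, 0xA).
$dom = new DOMDocument();
$root = $dom->appendChild($dom->createElement('root'));
$root->appendChild($dom->createTextNode("cr\rlf\nend"));

// The \n is written as-is; the \r is written as a character reference,
// because a literal \r would be turned into \n when the file is parsed again.
$xml = $dom->saveXML();
echo $xml;

// Reading the saved XML back restores the original \r exactly.
$copy = new DOMDocument();
$copy->loadXML($xml);
var_dump($copy->documentElement->textContent === "cr\rlf\nend");

// By contrast, a literal \r in the XML source is normalized to \n on load.
$raw = new DOMDocument();
$raw->loadXML("<root>cr\rend</root>");
var_dump($raw->documentElement->textContent === "cr\nend");
```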

Posted: Fri Feb 09, 2007 1:38 am
by abeall
I don't understand what "payload data" is. What I know is that:
1) If I add line breaks to the XML file with a text editor, they aren't converted to &#xD;.
2) The data that is passed to PHP via POST with line breaks (is that the "payload data"?) gets saved with the line breaks converted to &#xD;.
3) If I echo the POST data, though, linebreaks appear normally.

How do I ensure my XML document has normal linebreaks rather than &#xD; ?

Posted: Fri Feb 09, 2007 2:01 am
by volka
Hm, let's try it the other way round.
How would you ensure

Code:

yadda  yadda
is displayed correctly in a html document? Note the 2 spaces between the two yaddas.

Posted: Fri Feb 09, 2007 9:03 am
by abeall
In that case, if I wanted to display it in HTML, I'd have to convert successive spaces to &nbsp;. With linebreaks, I'd have to convert \n to <br/>, like you mentioned. The problem is:
1) &#xD; does not display as a linebreak in HTML
2) I am not trying to display my data in HTML

Posted: Fri Feb 09, 2007 9:42 am
by volka
abeall wrote: The problem is:
1) &#xD; does not display as a linebreak in HTML
2) I am not trying to display my data in HTML
No, that's not a problem.
1) &#xD; does not display as a linebreak in HTML
neither does \n, except within <pre>. And now guess what

Code:

<html>
	<head><title>...</title></head>
	<body>
		<pre>yadda&#xD;yadda</pre>
	</body>
</html>
does.
2) I am not trying to display my data in HTML
I just thought an example you already know would make things easier.

Posted: Fri Feb 09, 2007 10:17 am
by abeall
Sorry, I'm not disagreeing, I just don't understand; there's obviously something I'm missing. I guess I don't understand why \n is being converted to &#xD; when, as you said, neither will appear in HTML without the <pre> tag. In that sense they seem the same to me, except that \n actually displays correctly in text editors. So if PHP is getting \n, why is it converting it to &#xD;, if that doesn't help anyway?
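Whether PHP is really getting \n is easy to check: echoing the data to a browser hides the difference, because \r and \n both render as plain whitespace. A minimal sketch that counts each kind of line break; lineBreakReport() is a hypothetical name, and in the real script you would pass it $_POST['comment'] instead of the literal used here. A non-zero 'cr' count would point at carriage returns, the character DOM writes as &#xD;.

```php
<?php
// Hypothetical diagnostic: count CRLF pairs, lone CRs and lone LFs in a string.
function lineBreakReport(string $text): array
{
    $crlf = substr_count($text, "\r\n");
    return array(
        'crlf' => $crlf,                               // "\r\n" pairs
        'cr'   => substr_count($text, "\r") - $crlf,   // "\r" not followed by "\n"
        'lf'   => substr_count($text, "\n") - $crlf,   // "\n" not preceded by "\r"
    );
}

// Stand-in for $_POST['comment'].
print_r(lineBreakReport("one\rtwo\r\nthree\nfour"));
```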