[Solved]XML document saved with bad formatting

abeall
Forum Commoner
Posts: 41
Joined: Sun Feb 04, 2007 11:53 pm

Post by abeall »

I'm using DomDocument to open an XML document and append some nodes and attributes, and save it. There are two problems:
1) The saved XML file has no whitespace between nodes. They are all smashed together on one line. Below is an example of the human-readable form, followed by what PHP produces:

Code: Select all

<root>
    <node>Some text</node>
    <node>More text</node>
</root>

Code: Select all

<root><node>Some text</node><node>More text</node></root>
2) Line breaks seem to be converted to &#xD;. After a quick Google search, I still can't really tell what this is. Can it be prevented, so that a normal line break is left in place?

Any help in cleaning these two issues up would be appreciated.
Last edited by abeall on Sat Feb 10, 2007 9:36 pm, edited 1 time in total.
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Post by Christopher »

Take a look at the DOMDocument settings. I think there is one called "preserveWhiteSpace".
(#10850)
abeall
Forum Commoner
Posts: 41
Joined: Sun Feb 04, 2007 11:53 pm

Post by abeall »

There is, but the documentation states it is on by default anyway, and it appears to affect only the loading of XML. It preserves the whitespace in the loaded XML; the problem is that the nodes I append and save are not formatted, but rather tacked on to the end. So, let's say I have this:

Code: Select all

<root>
    <node>Node text</node>
    <node>Node text</node>
    <node>Node text</node>
    <node>Node text</node>
    <node>Node text</node>
</root>
And I load the document into PHP, append two nodes, which we'll call "phpnodes", and save. The resulting XML file looks like this:

Code: Select all

<root>
    <node>Node text</node>
    <node>Node text</node>
    <node>Node text</node>
    <node>Node text</node>
    <node>Node text</node><phpnode>Node text</phpnode><phpnode>Node text</phpnode></root>
Notice there is no whitespace added with the phpnodes, and the root close tag has been moved up as well. These two issues, along with the strange &#xD; characters, are what I'm trying to resolve.
volka
DevNet Evangelist
Posts: 8391
Joined: Tue May 07, 2002 9:48 am
Location: Berlin, ger

Post by volka »

try

Code: Select all

<?php
$dom = new DOMDocument;
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;

if (file_exists('test.xml')) {
	$dom->load('test.xml');
}
else {
	$dom->loadXML('<root></root>');
}

$n = $dom->createElement('node', date('H:i:s'));
$dom->documentElement->appendChild($n);
$dom->save('test.xml');
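echo $dom->saveXML();
Note that both properties are set before load(); preserveWhiteSpace = false in particular only takes effect if it is set before the document is loaded.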
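
To see the difference the two settings make, here is a minimal sketch that appends one node to an already indented document, once with whitespace preserved and once without. The helper name appendAndSave() and the sample XML are made up for illustration; the exact indentation comes from libxml and may differ between versions.

```php
<?php
// Hypothetical helper: append one <phpnode> to $xml and return the serialized document.
function appendAndSave($xml, $stripWhitespace)
{
    $dom = new DOMDocument();
    // Has to be set before loadXML(); it decides whether blank text nodes are kept.
    $dom->preserveWhiteSpace = !$stripWhitespace;
    $dom->formatOutput = true;
    $dom->loadXML($xml);
    $dom->documentElement->appendChild($dom->createElement('phpnode', 'b'));
    return $dom->saveXML();
}

$source = "<root>\n    <node>a</node>\n</root>";
echo appendAndSave($source, false); // old whitespace kept, new node tacked on before </root>
echo appendAndSave($source, true);  // blank nodes dropped, whole document re-indented
```

In the first call the root element still contains its original whitespace text nodes, so the serializer leaves the layout alone and the new node ends up directly in front of the closing root tag.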
abeall
Forum Commoner
Posts: 41
Joined: Sun Feb 04, 2007 11:53 pm

Post by abeall »

Perfect. Thanks.

Any idea about the &#xD; characters replacing linebreaks?
volka
DevNet Evangelist
Posts: 8391
Joined: Tue May 07, 2002 9:48 am
Location: Berlin, ger

Post by volka »

No idea. Can you provide an example script?
abeall
Forum Commoner
Posts: 41
Joined: Sun Feb 04, 2007 11:53 pm

Post by abeall »

My code is attached. The POST data is submitted by Flash, which is posting data input by the user:

Code: Select all

<?php

if(count($_POST) == 4
	&& strlen($_POST['name']) <= 50
	&& strlen($_POST['favkind']) <= 50
	&& strlen($_POST['fav']) <= 50
	&& strlen($_POST['comment']) <= 100){
	
	foreach ($_POST AS $key => $val)
		$_POST[$key] = stripslashes($_POST[$key]);

	$dom = new DomDocument(); 
	$dom->preserveWhiteSpace = false;
	$dom->formatOutput = true; 
	$dom->load("commentary.xml");
	$node = $dom->createElement("comment"); 
	//attr: name
	appendChildAttribute($dom,$node,"name",$_POST['name']);
	//attr: favkind
	appendChildAttribute($dom,$node,"favkind",$_POST['favkind']);
	//attr: fav
	appendChildAttribute($dom,$node,"fav",$_POST['fav']);
	
	//append new node
	$node->appendChild($dom->createTextNode($_POST['comment'])); 
	$dom->documentElement->appendChild($node); 
	
	// Echo the output
	//header('Content-type: text/xml');
	//echo $dom->saveXML();
	
	// Save to file
	$dom->save("commentary.xml");

}else{
	echo "Invalid submission";
}

function appendChildAttribute($dom, $node, $attributeName, $attributeValue){
	$attr = $dom->createAttribute($attributeName); //create attribute
	$attr->appendChild($dom->createTextNode($attributeValue)); //assign attribute value
	$node->appendChild($attr); //add the attribute to node
}
Flash is not converting the linebreaks to &#xD;. Interestingly, though, when loaded back into Flash they come through as line breaks just fine, so it's not a problem in that sense.

BTW, why must preserveWhiteSpace be turned off in order for formatOutput to work?
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

I would assume that in certain contexts, adding whitespace actually modifies the resultant DOM (extra text nodes or something).
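
The "extra text nodes" are easy to make visible. The sketch below counts the direct children of the root element with preserveWhiteSpace on and off; countRootChildren() and the sample document are hypothetical names chosen for this example.

```php
<?php
// Hypothetical helper: count the direct children of the root element.
function countRootChildren($xml, $preserveWhiteSpace)
{
    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = $preserveWhiteSpace; // must be set before loading
    $dom->loadXML($xml);
    return $dom->documentElement->childNodes->length;
}

$xml = "<root>\n  <node>a</node>\n  <node>b</node>\n</root>";
echo countRootChildren($xml, true), "\n";  // 2 elements plus 3 whitespace-only text nodes
echo countRootChildren($xml, false), "\n"; // only the 2 elements
```

Every line break and indent between elements is a text node of its own, so adding or removing indentation really does change the tree.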
abeall
Forum Commoner
Posts: 41
Joined: Sun Feb 04, 2007 11:53 pm

Post by abeall »

Most likely that is the case, although I can't think of a situation where that would be true, other than for attributes (a line break would obviously blow that apart). The thing is, I'm using the line breaks in text node content, so it's obviously safe.

Still kinda stumped how to deal with it. My only guess would be to save the DomDocument, then open the file as a text file and str_replace the &#xD; with \n, but I don't think I could bring myself to do something that ugly. There has to be a way to control it....
volka
DevNet Evangelist
Posts: 8391
Joined: Tue May 07, 2002 9:48 am
Location: Berlin, ger

Post by volka »

abeall wrote:BTW, why must preserveWhiteSpace be turned off in order for formatOutput to work?
Only to give DOM the freedom to drop the existing whitespace-only text nodes; while they are in the tree, formatOutput will not re-indent around them.

The line break is payload data? Then &#xD; is not such a mystery ;)
e.g.

Code: Select all

<p>
  yadda
  yadda
  <br />
  yadda
</p>
There's no explicit line break between the first two yaddas. There is a line break (\n) in the source code, but it is only treated as whitespace and can be used to format the source.
Same with XML: a "normal" line break in the source code is not a line break in the payload data. &#xD; is the character with code 13 (hexadecimal 0xD), which is the carriage return \r; \n would be 10 (0xA).
abeall
Forum Commoner
Posts: 41
Joined: Sun Feb 04, 2007 11:53 pm

Post by abeall »

I don't understand what "payload data" is. What I know is that:
1) If I add line breaks to the XML file with a text editor, they aren't converted to &#xD;.
2) The data that is passed to PHP via POST with line breaks (is that the "payload data"?) gets saved with the line breaks converted to &#xD;.
3) If I echo the POST data, though, linebreaks appear normally.

How do I ensure my XML document has normal linebreaks rather than &#xD; ?
volka
DevNet Evangelist
Posts: 8391
Joined: Tue May 07, 2002 9:48 am
Location: Berlin, ger

Post by volka »

Hm, let's try it the other way round.
How would you ensure

Code: Select all

yadda  yadda
is displayed correctly in an HTML document? Note the two spaces between the two yaddas.
abeall
Forum Commoner
Posts: 41
Joined: Sun Feb 04, 2007 11:53 pm

Post by abeall »

In that case, if I wanted to display it in HTML, I'd have to convert successive spaces to &nbsp;. With linebreaks, I'd have to convert \n to <br/>, like you mentioned. The problem is:
1) &#xD; does not display as a linebreak in HTML
2) I am not trying to display my data in HTML
volka
DevNet Evangelist
Posts: 8391
Joined: Tue May 07, 2002 9:48 am
Location: Berlin, ger

Post by volka »

abeall wrote:The problem is:
1) &#xD; does not display as a linebreak in HTML
2) I am not trying to display my data in HTML
No, that's not a problem.
1) &#xD; does not display as a linebreak in HTML
neither does \n, except within <pre>. And now guess what

Code: Select all

<html>
	<head><title>...</title></head>
	<body>
		<pre>yadda&#xD;yadda</pre>
	</body>
</html>
does.
2) I am not trying to display my data in HTML
I just thought an example you already know would make things easier.
abeall
Forum Commoner
Posts: 41
Joined: Sun Feb 04, 2007 11:53 pm

Post by abeall »

Sorry, I'm not disagreeing; I just don't understand, so there's obviously something I'm missing. I guess I don't understand why \n is being converted to &#xD; when, as you said, neither will appear in HTML without the <pre> tag. In that vein they seem the same to me, except that \n actually displays correctly in text editors. So if PHP is getting \n, why is it converting it to &#xD;, if that doesn't help anyway?
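
One likely explanation, offered as an assumption since the thread never shows the raw POST bytes: the client is sending carriage returns (\r, code 13) rather than \n. An XML parser turns a literal \r or \r\n into \n when it loads a file, so the only way a serializer can keep a real carriage return in the data is to write it as a character reference. Normalizing the line endings before building the text node avoids that. A minimal sketch, with normalizeLineBreaks() and commentXml() as hypothetical helper names:

```php
<?php
// Hypothetical helper: convert CRLF and lone CR line endings to LF.
function normalizeLineBreaks($text)
{
    return str_replace(array("\r\n", "\r"), "\n", $text);
}

// Hypothetical helper: serialize a <comment> element holding $text.
function commentXml($text)
{
    $dom = new DOMDocument();
    $node = $dom->createElement('comment');
    $node->appendChild($dom->createTextNode($text));
    $dom->appendChild($node);
    return $dom->saveXML($node); // serialize only this element, without the XML declaration
}

$posted = "line one\rline two"; // assumed: the client uses CR as its line break
echo commentXml($posted), "\n";                      // CR is written as a character reference
echo commentXml(normalizeLineBreaks($posted)), "\n"; // plain line break in the output
```

In the script from earlier in the thread, this would mean passing $_POST['comment'] through such a normalization step before createTextNode(), rather than patching the saved file afterwards with str_replace.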