[Solved] XML: How to convert plain text into XML using PHP?

Posted: Fri Jan 12, 2007 2:40 am
by phpwalker
I have no idea how to accomplish this task. I googled around but had no luck.

Can you guys give me some ideas or clues on how to start?

Or maybe a tutorial would be good. I really couldn't find one.

Let's say I have this text:

To: Adrian
From: Gorillaz
Date:12 January 2007
Message: Hi, adrian, I need your help.
How do I convert it into XML?

Posted: Fri Jan 12, 2007 3:43 am
by Chris Corbyn
What structure of XML would you like that to generate?

Code: Select all

<record>
    <to>Adrian</to>
    <from>Gorillaz</from>
    <date>12 January 2007</date>
    <message>Hi, adrian, I need your help.</message> 
</record>
You can use SimpleXML (a PHP extension, you'll find it on php.net) to turn an array into XML, so if you explode each line of what you showed at ":" you should get off to a good start ;)
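
For reference, here is a minimal sketch of the array-to-XML half of that suggestion. It assumes the lines have already been split into a field => value array; the helper name arrayToRecordXml() and the lowercase field names are made up for illustration, not something from the thread.

```php
<?php
// Hypothetical helper: turn a flat field => value array into a <record> element.
function arrayToRecordXml(array $fields)
{
    $record = new SimpleXMLElement('<record/>');
    foreach ($fields as $name => $value) {
        // Escape &, < and > up front so they are safe as element content.
        $record->addChild($name, htmlspecialchars($value, ENT_XML1 | ENT_NOQUOTES));
    }
    // asXML() with no argument returns the whole document as a string.
    return (string) $record->asXML();
}

$xml = arrayToRecordXml([
    'to'      => 'Adrian',
    'from'    => 'Gorillaz',
    'date'    => '12 January 2007',
    'message' => 'Hi, adrian, I need your help.',
]);
echo $xml;
```

Passing a filename to asXML() instead would write the document to disk rather than return it.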

Posted: Fri Jan 12, 2007 4:31 am
by Kieran Huggins
SimpleXML is a good extension. Here's another way using preg_replace_callback():

Code: Select all

$str = <<<EOF

To: Adrian
From: Gorillaz
Date:12 January 2007
Message: Hi, adrian, I need your help. 

EOF;

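// Turn each "Name: value" line into <Name>value</Name>.
echo preg_replace_callback('#(.*?):(.*?)\n#', 'callback', $str);

// $m[1] is the text before the first colon, $m[2] the text after it.
function callback($m) {
	$tag = trim($m[1]);
	return '<' . $tag . '>' . trim($m[2]) . '</' . $tag . '>' . "\n";
}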
It's my favourite function today because it earned me some spam!

Posted: Fri Jan 12, 2007 5:59 am
by Ollie Saunders
It's my favourite function today because it earned me some spam!
That it did! The meaty kind not the unwanted email kind.
Although to be honest I would just explode by "\n" and then by ':' limited to 2 explosions.
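
A rough sketch of that double explode, to show why the limit of 2 matters. The helper name parseFields() and the "Time" line are invented for the example; the other lines come from the original post.

```php
<?php
// Hypothetical helper: parse "Name: value" lines into a field => value array.
function parseFields($text)
{
    $fields = [];
    foreach (explode("\n", $text) as $line) {
        // Limit of 2: only the first ':' splits, so colons inside the value survive.
        $parts = explode(':', $line, 2);
        if (count($parts) < 2) {
            continue; // blank line, or a line without a colon
        }
        // trim() also removes a stray "\r" from Windows line endings.
        $fields[trim($parts[0])] = trim($parts[1]);
    }
    return $fields;
}

$fields = parseFields("To: Adrian\nDate:12 January 2007\nTime: 2:40 am\n");
print_r($fields);
```

Without the limit, the "Time" line would be cut into three pieces and the value would lose everything after the second colon.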

Posted: Fri Jan 12, 2007 7:03 am
by Kieran Huggins
ole wrote:
It's my favourite function today because it earned me some spam!
That it did! The meaty kind not the unwanted email kind.
Although to be honest I would just explode by "\n" and then by ':' limited to 2 explosions.
ole's is faster and easier than mine... is it spam worthy? I think so. Send this man a can of spam right away.

Posted: Fri Jan 12, 2007 9:55 am
by Ollie Saunders
Send this man a can of spam right away.
OOohh yes please! It would have gone nicely with the pasta I just had.

Image

Posted: Fri Jan 12, 2007 4:09 pm
by Chris Corbyn
That's a nicely labelled diagram... I so totally never realised there was more to a tin of spam than just pink processed meat.

Posted: Fri Jan 12, 2007 5:07 pm
by Kieran Huggins
As you can tell from my sexy spam-coloured velour uniform, I'm a big fan!

Posted: Thu Jan 18, 2007 11:24 pm
by phpwalker
Oh my god, why are so many people spamming here?...

reply@d11wtq: I just want to convert simple text data into XML. Is there a tutorial that teaches and explains every step of the code?

Okay, let's say I have the data in the following text file:
data1: abc
data2: 456
data3: 789
I have to create a conversion page that can browse the local hard disk and
accept this text.txt as input. Clicking the Conversion button should
produce the output in XML format.

The output should be a properly formatted XML file (text.xml).

After that, I need to create another conversion page to convert the XML back to a text file.

I have no clue. Everything I find when searching is about RDF or RSS, but I want plain XML. I need some guidance on this.

Posted: Fri Jan 19, 2007 9:19 am
by Kieran Huggins
Two options:

1. use preg_replace() and/or explode() and/or str_replace() to build an XML document as text. Then load it with one of the many PHP XML extensions and save it as a file (to make sure it is well-formed).

2. use the DOM or DOM-XML functions to build a DOM of the data, then save that as XML

I'd go with #2
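
A minimal sketch of option 2, assuming PHP 5's DOM extension (DOMDocument) and data that is already in a field => value array. The helper name buildRecordXml() and the <record> wrapper are assumptions for the example. Note that XML element names cannot start with a digit, so createElement('1') would throw a DOMException; names like data1 are fine.

```php
<?php
// Hypothetical helper: build a <record> document from field => value pairs.
function buildRecordXml(array $fields)
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true; // indent the output so it is easy to read

    $record = $doc->createElement('record');
    $doc->appendChild($record);

    foreach ($fields as $name => $value) {
        $element = $doc->createElement($name);
        // Text nodes are escaped automatically when the document is serialised.
        $element->appendChild($doc->createTextNode($value));
        $record->appendChild($element);
    }
    return $doc->saveXML();
}

$xml = buildRecordXml(['data1' => 'abc', 'data2' => '456', 'data3' => '789']);
echo $xml;
// To write the file instead: file_put_contents('text.xml', $xml);
```

Going back from XML to text could use the same extension in reverse: load the file, loop over the child elements of <record>, and print each name and value joined by ": ".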

Posted: Fri Jan 19, 2007 11:07 pm
by phpwalker
Thanks Kieran Huggins!!!! I've been thinking for a while about how to convert plain text into XML using PHP. My sudden spark is a piece of PHP code that scans through the plain text, searches for the colon, and looks at what comes before and after it.

Code: Select all

1: ABC
2: DEF
3: GHI


Before the colon, wrap the "1" in "<" and ">" to make the opening tag. After the colon comes the value; before moving on to the next line, add the closing tag, again with the "1" inside it.

However, I don't really know how to scan the text and find the colon... can you guide me, Kieran Huggins?

Posted: Sat Jan 20, 2007 7:50 am
by Kieran Huggins
It seems to me that all the stuff you want to find and replace matches a pattern.

You could use preg_replace() to define that pattern and what you want to replace it with.

like: when there's a (line break) followed by a (number), (a colon), then (some characters) and another (line break), replace it with: <(number)>(some characters)</(number)>


You could also try breaking the file into individual lines to begin with using file(), then explode() each line on the ':'

Try it!

You can use print_r() to check your progress.
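
As a rough sketch of the pattern idea above, assuming the text is already in a string: the "item" prefix is an assumption added here because an XML element name cannot start with a digit, and the values are not escaped, so this only suits simple data.

```php
<?php
$text = "1: ABC\n2: DEF\n3: GHI\n";

// With the /m modifier, ^ and $ match at the start and end of every line.
// (\d+) captures the number, (.*?) captures the characters after the colon.
$xml = preg_replace(
    '/^(\d+):[ \t]*(.*?)[ \t]*$/m',
    '<item${1}>${2}</item${1}>',
    $text
);

print_r($xml);
```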

Posted: Sun Jan 21, 2007 9:23 pm
by phpwalker
Thanks again Kieran. I did a search on Google and this forum, and found all the metacharacters in d11wtq's tutorial.

Now I've written something, at least something.

Code: Select all

<?php

$str = "aab : ad2 ";

//output is it matches.
if (preg_match("/^\w+\s\W\s\w+/", $str)) {
    echo "it matches.";
} else {
    echo "it doesn't match.";
}

?>
Now the thing is I have to read the text file in from somewhere else, so I've rewritten the code.

Code: Select all

<?php

// set file to read
$file = 'data.txt';

// read file into array
$data = file($file) or die('Could not read file!');

// loop through array and print each line
foreach ($data as $line) {
    print_r($line."<br/>");
    if (preg_match("/^\w+\W\w+/", $line)) {
        echo "it matches.<br/>";
        list($field, $content) = explode(":", $line);

        $replacement = '<'.$field.'>';
        //$replacement2 = '</'.$field.'>';
        $field = preg_replace('/\w+/i', $replacement, $field);
        //$field2 = preg_replace('/\w+/i', $replacement2, $field);
        echo $field.$content.$field."<br/>";

    } else {
        echo "it doesn't match.<br/>";
    }

}

?>
and the data.txt is some simple raw data:

Code: Select all

1:abc
2:def
3:ghi
Output is:
1:abc
it matches.
<1>abc <1>
2:def
it matches.
<2>def <2>
3:ghi
it matches.
<3>ghi<3>
I have no idea how to add the closing tag for the field now. Could you give me some more hints, Kieran?

Posted: Sun Jan 21, 2007 9:30 pm
by Ollie Saunders
Why do you need to do a replacement? Just output what you need.

Code: Select all

echo "<$field>$content</$field>";
Also, if you use subpatterns (also known as capturing) you don't need the explode at all. Have a read about preg_match() in the manual, in particular the third parameter and the stuff relating to it. It's very cool, and you'll want to know about it even if it's just for future reference.
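
A small sketch of what the third parameter gives you, assuming lines in the "data1: abc" form from earlier in the thread. The exact pattern is just one way to write it.

```php
<?php
$line = "data1: abc\n";
$xml = '';

// The third argument is filled with the captures:
// $matches[0] is the whole match, [1] the field name, [2] the content.
if (preg_match('/^(\w+)\s*:\s*(.*?)\s*$/', $line, $matches)) {
    $field   = $matches[1];
    // Escape &, < and > so the value cannot break the markup.
    $content = htmlspecialchars($matches[2], ENT_XML1 | ENT_NOQUOTES);
    $xml     = "<$field>$content</$field>";
}

print_r($matches);
```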

Posted: Sun Jan 21, 2007 11:23 pm
by phpwalker
Oops, I've figured out the mistake I made.

I just store the results in separate variables: field1 holds the start tag and field2 holds the end tag.

Code: Select all

<?php

// set file to read
$file = 'data.txt';

// read file into array
$data = file($file) or die('Could not read file!');

// loop through array and print each line
foreach ($data as $line) {
     print_r($line."<br/>");
     if (preg_match("/^\w+\W\w+/", $line)) {
             echo "it matches.<br/>";
             list($field, $content) = explode(":", $line);

             $replacement = '<'.$field.'>';
             $replacement2 = '</'.$field.'>';
             $field1 = preg_replace('/\w+/i', $replacement, $field);
             $field2 = preg_replace('/\w+/i', $replacement2, $field);
             $str.= $field1.$content.$field2."<br/>";

         } else {
             echo "it doesn't match.<br/>";
         }

}

echo $str;

?>
However, the output isn't what I intended, because the browser doesn't show me
<1>abc</1>
but only
abc
When I view the source of the HTML, it's
<1>abc</1>
@Ole: I want to convert it into an XML document. I only echo it out to test it; echoing the statement isn't the goal. Besides that, thanks for the preg_match tip.
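
The "abc only" symptom is what you would expect if the browser treats the unknown tags as HTML and hides them, which fits the view-source result. A minimal sketch for checking the markup on screen, assuming a page served as HTML (the data1 name is just an example):

```php
<?php
$xml = "<data1>abc</data1>";

// Escape the markup so the browser prints the tags instead of interpreting them.
$display = '<pre>' . htmlspecialchars($xml) . '</pre>';
echo $display;

// Alternatives, left commented out here:
// header('Content-Type: text/xml'); echo $xml;   // send real XML to the browser
// file_put_contents('text.xml', $xml);           // or write it straight to a file
```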