[Solved] XML: How to convert plain text into XML using PHP?


phpwalker
Forum Commoner
Posts: 81
Joined: Sun Apr 23, 2006 12:18 pm

Post by phpwalker »

I have no idea how to accomplish this task. I googled around but had no luck.

Can you give me some ideas or clues on how to start?

Or maybe point me to a tutorial; I really couldn't find one.

Let's say I have this text:

To: Adrian
From: Gorillaz
Date:12 January 2007
Message: Hi, adrian, I need your help.
How do I convert it into XML?
Last edited by phpwalker on Thu Jan 25, 2007 12:02 am, edited 1 time in total.
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

What XML structure would you like it to generate? Something like this?

Code:

<record>
    <to>Adrian</to>
    <from>Gorillaz</from>
    <date>12 January 2007</date>
    <message>Hi, adrian, I need your help.</message> 
</record>
You can use SimpleXML (a PHP extension; you'll find it documented on php.net) to build XML from an array (for example with SimpleXMLElement::addChild()), so if you explode() each line of what you showed at ":", you should get off to a good start ;)
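A minimal sketch of that suggestion: explode each line at the first ":" into an array, then build the `<record>` structure above with SimpleXML. It assumes "\n" line endings, one "Key: value" pair per line, and keys that are valid XML element names once lowercased; the variable names are illustrative.

```php
<?php
// Sample input from the first post; assumes "\n" line endings.
$text = "To: Adrian\nFrom: Gorillaz\nDate:12 January 2007\nMessage: Hi, adrian, I need your help.";

// Step 1: explode each line at the first ":" into a key => value array.
$fields = array();
foreach (explode("\n", $text) as $line) {
    $parts = explode(':', $line, 2); // limit 2 keeps later colons in the value
    if (count($parts) === 2) {
        $fields[strtolower(trim($parts[0]))] = trim($parts[1]);
    }
}

// Step 2: turn the array into XML with SimpleXML.
$record = new SimpleXMLElement('<record/>');
foreach ($fields as $name => $value) {
    // addChild() treats "&" in the value as the start of an entity reference,
    // so escape the value first.
    $record->addChild($name, htmlspecialchars($value));
}

$xml = $record->asXML();
echo $xml;
```

Building the array first keeps parsing and XML generation separate, so the same `$fields` array could feed a different output format later.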
Kieran Huggins
DevNet Master
Posts: 3635
Joined: Wed Dec 06, 2006 4:14 pm
Location: Toronto, Canada

Post by Kieran Huggins »

SimpleXML is a good extension. Here's another way using preg_replace_callback():

Code:

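<?php
// Sample input: one "Key: value" pair per line.
$str = <<<EOF
To: Adrian
From: Gorillaz
Date:12 January 2007
Message: Hi, adrian, I need your help.

EOF;

// Wrap each "Key: value" line as <Key>value</Key>.
// $m[1] is the text before the colon, $m[2] the text after it.
function callback($m) {
    $tag = trim($m[1]);
    return '<'.$tag.'>'.trim($m[2]).'</'.$tag.'>'."\n";
}

// "." does not match a newline, so each match stays on a single line.
echo preg_replace_callback('#(.*?):(.*?)\n#', 'callback', $str);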
It's my favourite function today because it earned me some spam!
Ollie Saunders
DevNet Master
Posts: 3179
Joined: Tue May 24, 2005 6:01 pm
Location: UK

Post by Ollie Saunders »

It's my favourite function today because it earned me some spam!
That it did! The meaty kind, not the unwanted email kind.
Although, to be honest, I would just explode() by "\n" and then by ':' with a limit of 2.
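A rough sketch of the explode() route, building the markup as a plain string. Assumptions: "\n" line endings, keys that are valid XML names once lowercased, and the `<record>` root from Chris's example. Since nothing parses the string, values go through htmlspecialchars() so a stray `&` or `<` can't break the output.

```php
<?php
// Sample input; assumes "\n" line endings and one "Key: value" pair per line.
$text = "To: Adrian\nFrom: Gorillaz\nDate:12 January 2007\nMessage: Hi, adrian, I need your help.";

$xml = "<record>\n";
foreach (explode("\n", $text) as $line) {
    // A limit of 2 keeps any later colons (e.g. "10:30") inside the value.
    $parts = explode(':', $line, 2);
    if (count($parts) !== 2) {
        continue; // no colon on this line, so skip it
    }
    $tag = strtolower(trim($parts[0]));
    $value = htmlspecialchars(trim($parts[1]));
    $xml .= "  <$tag>$value</$tag>\n";
}
$xml .= "</record>\n";

echo $xml;
```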

Post by Kieran Huggins »

ole wrote:
It's my favourite function today because it earned me some spam!
That it did! The meaty kind, not the unwanted email kind.
Although, to be honest, I would just explode() by "\n" and then by ':' with a limit of 2.
ole's is faster and easier than mine... is it spam worthy? I think so. Send this man a can of spam right away.

Post by Ollie Saunders »

Send this man a can of spam right away.
Ooh, yes please! It would have gone nicely with the pasta I just had.

[Image: a labelled diagram of a tin of Spam]

Post by Chris Corbyn »

That's a nicely labelled diagram... I so totally never realised there was more to a tin of spam than just pink processed meat.

Post by Kieran Huggins »

As you can tell from my sexy spam-coloured velour uniform, I'm a big fan!

Post by phpwalker »

Oh my god, why are so many people spamming here?...

@d11wtq: I just want to convert some simple text data into XML. Is there a tutorial that teaches and explains every step of the code?

Okay, let's say I have the data in the following text file:
data1: abc
data2: 456
data3: 789
I have to create a conversion page that can browse the local hard disk and accept this text.txt as input. Clicking the Conversion button should produce the output in XML format.

The output should be a properly formatted XML file (text.xml).
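One possible shape for that page, sketched under assumptions: the form field name `datafile` and the helper `convert_text_to_xml()` are hypothetical, the converter reuses the SimpleXML approach from earlier in the thread, and the result is sent back as a download named text.xml. The file input gives the user the usual browse dialog for the local disk; `enctype="multipart/form-data"` is required for the upload to reach `$_FILES`.

```php
<?php
// Hypothetical helper: convert "key: value" lines into
// <record><key>value</key>...</record>.
function convert_text_to_xml($text)
{
    $record = new SimpleXMLElement('<record/>');
    foreach (preg_split('/\r\n|\r|\n/', $text) as $line) {
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            $record->addChild(trim($parts[0]), htmlspecialchars(trim($parts[1])));
        }
    }
    return $record->asXML();
}

// When the form was submitted with a file, convert it and send it back
// as a download named text.xml.
if (isset($_FILES['datafile']) && $_FILES['datafile']['error'] === UPLOAD_ERR_OK) {
    $text = file_get_contents($_FILES['datafile']['tmp_name']);
    header('Content-Type: text/xml; charset=UTF-8');
    header('Content-Disposition: attachment; filename="text.xml"');
    echo convert_text_to_xml($text);
    exit;
}

// Otherwise show the upload form (the browser provides the "Browse" dialog).
echo <<<HTML
<form method="post" enctype="multipart/form-data">
  <input type="file" name="datafile">
  <input type="submit" value="Convert">
</form>
HTML;
```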

After that, I need to create another conversion page to convert the XML back to a text file.

I have no clue; whatever I search for, I always get RDF or RSS XML, but I want plain XML. I need some guidance on this.
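For the reverse direction, a minimal sketch with simplexml_load_string(), assuming the XML has the flat `<record><key>value</key>...</record>` shape used in this thread. Each child element becomes one "name: value" line again; writing `$text` to disk with file_put_contents() would complete that second conversion page.

```php
<?php
// Assumed input: a flat record like the ones produced above.
$xml = '<record><data1>abc</data1><data2>456</data2><data3>789</data3></record>';

$record = simplexml_load_string($xml);
// Strict comparison: an element with no children is also falsy.
if ($record === false) {
    die('Could not parse the XML!');
}

// Each child element becomes one "name: value" line again.
$lines = array();
foreach ($record->children() as $child) {
    $lines[] = $child->getName() . ': ' . (string) $child;
}
$text = implode("\n", $lines) . "\n";

echo $text;
```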

Post by Kieran Huggins »

Two options:

1. Use preg_replace() and/or explode() and/or str_replace() to build an XML document as text. Then import it into one of PHP's many XML extensions and save it as a file (which ensures it is well-formed).

2. Use the DOM or DOM XML functions to build a DOM tree of the data, then save that as XML.
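A sketch of option 1, assuming the `data1: abc` input from the previous post: the XML is built textually with preg_replace(), then handed to DOMDocument::loadXML(), which only succeeds if the string is well-formed. Values are not escaped here, so an `&` in the data would make the check fail, which is exactly what the check is for.

```php
<?php
$text = "data1: abc\ndata2: 456\ndata3: 789";

// Build the markup textually; "m" makes ^ and $ match at each line.
// [ \t]* (not \s*) so an empty value can't swallow the next line.
$body = preg_replace('/^(\w+):[ \t]*(.*)$/m', '<$1>$2</$1>', $text);
$xmlString = "<record>\n" . $body . "\n</record>";

$doc = new DOMDocument();
// loadXML() returns false (and emits warnings) if the string is not well-formed.
if (!$doc->loadXML($xmlString)) {
    die('The generated XML is not well-formed!');
}

// saveXML() returns a string; $doc->save('text.xml') would write a file instead.
$xml = $doc->saveXML();
echo $xml;
```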

I'd go with #2.
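Option 2 might look roughly like this, again assuming `data1: abc` style lines whose keys are valid element names; the `record` root is an illustrative choice. DOM handles escaping of text content, and `$doc->save('text.xml')` would write the same output to a file.

```php
<?php
$lines = array('data1: abc', 'data2: 456', 'data3: 789');

$doc = new DOMDocument('1.0', 'UTF-8');
$doc->formatOutput = true; // indent the serialized XML

$root = $doc->createElement('record');
$doc->appendChild($root);

foreach ($lines as $line) {
    if (strpos($line, ':') === false) {
        continue; // skip lines without a colon
    }
    list($key, $value) = array_map('trim', explode(':', $line, 2));
    $element = $doc->createElement($key);
    // A text node is escaped correctly when the document is saved.
    $element->appendChild($doc->createTextNode($value));
    $root->appendChild($element);
}

$xml = $doc->saveXML();
echo $xml;
```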

Post by phpwalker »

Thanks, Kieran Huggins! I've been thinking for a while about how to convert plain text into XML using PHP. My sudden idea is a piece of PHP code that scans through the plain text, searches for the colon, and takes what comes before and after it.

Code:

1: ABC
2: DEF
3: GHI


Before the colon, put "<" and ">" around the "1". After the colon comes the value; before moving on to the next line, add "<" and ">" again with the "1" inside.

However, I don't really know how to scan for the colon... Can you guide me, Kieran Huggins?
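Scanning for the colon can be done with strpos() (to find it) and substr() (to cut before and after it). A minimal sketch on the three sample lines above; because `<1>` is not a legal XML element name (names can't start with a digit), this version puts the number in an attribute of a hypothetical `<item>` element instead.

```php
<?php
$text = "1: ABC\n2: DEF\n3: GHI";

$out = '';
foreach (explode("\n", $text) as $line) {
    $pos = strpos($line, ':'); // position of the first colon, or false
    if ($pos === false) {
        continue; // no colon on this line
    }
    $key   = trim(substr($line, 0, $pos));  // everything before the colon
    $value = trim(substr($line, $pos + 1)); // everything after it
    // "item" and "id" are illustrative names, not from the thread.
    $out .= '<item id="' . htmlspecialchars($key) . '">' . htmlspecialchars($value) . "</item>\n";
}

echo $out;
```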

Post by Kieran Huggins »

It seems to me that all the stuff you want to find and replace matches a pattern.

You could use preg_replace() to define that pattern and what you want to replace it with.

For example: when there's a (line break) followed by a (number), (a colon), then (some characters) and another (line break), replace it with: <(number)>(some characters)</(number)>
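That pattern translates fairly directly into preg_replace(). In this sketch the `m` modifier lets `^` and `$` match at each line break, standing in for the "(line break)" parts. Because XML names can't start with a digit, the replacement uses a hypothetical `<item id="...">` element rather than `<1>`; values are not escaped.

```php
<?php
$text = "1: ABC\n2: DEF\n3: GHI";

// (\d+) = the number, [ \t]* = optional spaces after the colon,
// (.*) = the rest of the line ("." never matches the newline itself).
$xml = preg_replace('/^(\d+):[ \t]*(.*)$/m', '<item id="$1">$2</item>', $text);

echo $xml;
```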


You could also start by breaking the file into individual lines with file(), then explode() each line at the ":".
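A minimal sketch of the file() plus explode() approach, checked with print_r(). To stay self-contained it first writes a sample data.txt into the system temp directory; a real page would read your own file. The FILE_IGNORE_NEW_LINES flag also keeps the trailing newline out of each value.

```php
<?php
// Write a sample file so the example runs on its own.
$file = sys_get_temp_dir() . '/data.txt';
file_put_contents($file, "1:abc\n2:def\n3:ghi\n");

// FILE_IGNORE_NEW_LINES drops the trailing "\n" from each line;
// FILE_SKIP_EMPTY_LINES drops blank lines.
$lines = file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
if ($lines === false) {
    die('Could not read file!');
}

$pairs = array();
foreach ($lines as $line) {
    // Split at the first colon only.
    $pairs[] = explode(':', $line, 2);
}

// print_r() shows the structure so you can check your progress.
print_r($pairs);
```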

Try it!

You can use print_r() to check your progress.

Post by phpwalker »

Thanks again, Kieran. I did a search on Google and this forum, and found all the metacharacters in d11wtq's tutorial.

Now I've written something, at least.

Code:

<?php

$str = "aab : ad2 ";

// Output: it matches.
if (preg_match("/^\w+\s\W\s\w+/", $str)) {
    echo "it matches.";
} else {
    echo "it doesn't match.";
}

?>
Now the thing is, I have to read the input from a text file somewhere else, so I've rewritten the code.

Code:

<?php

// set file to read
$file = 'data.txt';

// read file into array
$data = file($file) or die('Could not read file!');

// loop through array and print each line
foreach ($data as $line) {
    print_r($line."<br/>");
    if (preg_match("/^\w+\W\w+/", $line)) {
        echo "it matches.<br/>";
        list($field, $content) = explode(":", $line);

        $replacement = '<'.$field.'>';
        //$replacement2 = '</'.$field.'>';
        $field = preg_replace('/\w+/i', $replacement, $field);
        //$field2 = preg_replace('/\w+/i', $replacement2, $field);
        echo $field.$content.$field."<br/>";

    } else {
        echo "it doesn't match.<br/>";
    }

}

?>
And data.txt contains some simple raw data:

Code:

1:abc
2:def
3:ghi
Output is:
1:abc
it matches.
<1>abc <1>
2:def
it matches.
<2>def <2>
3:ghi
it matches.
<3>ghi<3>
I have no idea how to close the tag for the field now. Could you give me another hint, Kieran?

Post by Ollie Saunders »

Why do you need to do a replacement? Just output what you need.

Code:

echo "<$field>$content</$field>";
Also, if you use subpatterns (also known as capturing groups), you don't need the explode() at all. Have a read about preg_match() in the manual, in particular the third parameter and everything related to it. It's very cool; you'll want to know about it even if it's just for future reference.
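A short sketch of what that third parameter does: each parenthesised subpattern is captured into `$matches`, so the key and value come straight out of the match. The sample line `data1:abc` is an assumption (a key like `1` would produce `<1>`, which isn't valid XML).

```php
<?php
$line = "data1:abc\n"; // a line as returned by file(), newline included

// Without the D modifier, "$" also matches just before a final newline.
if (preg_match('/^(\w+):(.*)$/', $line, $matches)) {
    // $matches[0] is the whole match, $matches[1] the key, $matches[2] the value.
    $field   = $matches[1];
    $content = trim($matches[2]);
    $element = "<$field>$content</$field>";
    echo $element;
}
```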

Post by phpwalker »

Oops, I've figured out the mistake I made.

I stored the replaced field in $field1, which gives me the start tag, and $field2 represents the end tag.

Code:

<?php

// set file to read
$file = 'data.txt';

// read file into array
$data = file($file) or die('Could not read file!');

// accumulate the converted lines here (avoids an undefined-variable notice)
$str = '';

// loop through array and print each line
foreach ($data as $line) {
    print_r($line."<br/>");
    if (preg_match("/^\w+\W\w+/", $line)) {
        echo "it matches.<br/>";
        list($field, $content) = explode(":", $line);

        $replacement = '<'.$field.'>';
        $replacement2 = '</'.$field.'>';
        $field1 = preg_replace('/\w+/i', $replacement, $field);  // start tag
        $field2 = preg_replace('/\w+/i', $replacement2, $field); // end tag
        $str .= $field1.$content.$field2."<br/>";
    } else {
        echo "it doesn't match.<br/>";
    }
}

echo $str;

?>
However, the output isn't what I intended, because it doesn't show me
<1>abc</1>
but only
abc
When I view the HTML source, it's
<1>abc</1>
@Ole: I want to convert it into an XML document; I only echo it out to test it, not as the end result. Besides that, thanks for the preg_match() tip.
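The browser hides `<1>` because it parses it as an (unknown) HTML tag, which is why the tags only appear in the page source. A minimal sketch of the two usual fixes, assuming element names that don't start with a digit: escape the markup with htmlspecialchars() when you just want to look at it, or wrap the lines in one root element and write a real XML file. The temp-directory path and the `record` root are only for the example. Once the output is well-formed, sending `header('Content-Type: text/xml')` before echoing it is another option.

```php
<?php
// Assumes $str holds the converted lines; hard-coded here for the example.
$str = "<data1>abc</data1>\n<data2>def</data2>\n<data3>ghi</data3>\n";

// 1) To see the tags in a browser, escape them so they are shown as text
//    instead of being parsed as HTML tags.
$forBrowser = '<pre>' . htmlspecialchars($str) . '</pre>';
echo $forBrowser;

// 2) To produce an XML document, wrap the lines in a single root element
//    and write them to a file.
$xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n<record>\n" . $str . "</record>\n";
$path = sys_get_temp_dir() . '/text.xml';
file_put_contents($path, $xml);
```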