Convert text to XML
Posted: Tue Sep 23, 2003 2:42 pm
by jbatty
Hi there,
I am a newbie PHP programmer and I am trying to convert structured text into XML using PHP. The text file contains multiple-choice questions and needs to be converted into XML for easier manipulation.
I will be grateful for any suggestions / pointers on how best to do this.
Regards
JamesB
Posted: Tue Sep 23, 2003 6:33 pm
by Cruzado_Mainfrm
Can we at least see a sample of the 'structured text'?
Posted: Tue Sep 23, 2003 6:57 pm
by jbatty
The text file looks like this:
Code:
1a. What is the capital city of England?
A. London
B. Cardiff
C. Edinburgh
D. Dublin
2a. Which of these countries do not belong to the Scandanavia?
A. Sweden
B. Poland
C. Norway
D. Denmark
and I am expecting to generate XML similar to this:
Code:
<item>
<question>What is the capital city of England?</question>
<questionId>1</questionId>
<category>a</category>
<choice>London</choice>
<choice>Cardiff</choice>
<choice>Edinburgh</choice>
<choice>Dublin</choice>
</item>
<item>
<question>Which of these countries do not belong to the Scandanavia?</question>
<questionId>2</questionId>
<category>a</category>
<choice>Sweden</choice>
<choice>Poland</choice>
<choice>Norway</choice>
<choice>Denmark</choice>
</item>
Any suggestions will be appreciated.
Posted: Tue Sep 23, 2003 7:06 pm
by Unipus
I can't see any consistent, logical way that a computer would be able to do that.
Posted: Tue Sep 23, 2003 8:24 pm
by Cruzado_Mainfrm
Well, it's logical to a human reader, but for a computer it's more complex. I think it can be done if you're experienced, but as always it won't be perfect: there will be flaws unless you anticipate what the data inputs are going to look like.
Also, I suggest you change the layout of the text file so it can be read using regular expressions.
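For what it's worth, the sample layout is already regular enough for regular expressions if every question line looks like "1a. ..." and every choice line like "A. ...". A minimal PHP sketch under that assumption; the constant names, the function name classifyLine() and the group names are hypothetical:

```php
<?php
// Sketch: one regular expression per line type, with named capture groups.
// Assumes question lines are "<digits><lowercase letters>. <text>"
// and choice lines are "<capital letter>. <text>", as in the sample.
const QUESTION_RE = '/^(?<id>\d+)(?<category>[a-z]+)\.\s+(?<text>.+)$/';
const CHOICE_RE   = '/^(?<letter>[A-Z])\.\s+(?<text>.+)$/';

function classifyLine(string $line): ?array
{
    if (preg_match(QUESTION_RE, $line, $m)) {
        return ['type' => 'question', 'id' => $m['id'],
                'category' => $m['category'], 'text' => $m['text']];
    }
    if (preg_match(CHOICE_RE, $line, $m)) {
        return ['type' => 'choice', 'letter' => $m['letter'], 'text' => $m['text']];
    }
    return null; // neither pattern matched: flag the line for manual checking
}

print_r(classifyLine('1a. What is the capital city of England?'));
print_r(classifyLine('B. Cardiff'));
```

Anything that returns null is exactly the "input the script can't handle" that still has to be fixed by hand.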
Posted: Tue Sep 23, 2003 8:45 pm
by Unipus
Hehe, well, that's what he's trying to do in the first place!
My suggestion: if this isn't a massive document, just do it by hand. You're going to end up fixing a whole lot of mess by hand anyway if you try to script it, I suspect. There's just not really enough data there for the computer to work with reliably.
Posted: Tue Sep 23, 2003 9:05 pm
by Cruzado_Mainfrm
Yeah, but what I meant is that a file like that will be hard to read because there's no clear pattern; if the pattern were different, the conversion might be easier...
Posted: Tue Sep 23, 2003 10:33 pm
by Leviathan
The only really easy way makes a (potentially big) assumption that all your data will be formatted similarly (i.e. questions start with numbers, choices start with letters, and there's a blank line between questions and choices). Here's some pseudocode (since I've never done normal file I/O in PHP):
Repeat until end of file:
    Read in a line.
    Does the line start with a number?
        If yes, strip off the number, output an item tag, and output a question tag. Read until we hit a blank line; output the contents, then output a close question tag.
        If no, we're on whitespace; repeat until we do get a question.
    Read in a line. Does the line start with a letter and a period?
        If no, close the item tag and go to the top of the loop.
        If yes, it's a choice: output a choice tag, read to the end of the line, output the text, then output a close choice tag. Repeat this step for the next choice.
At end of file, close any item tag that is still open.
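A rough PHP rendering of that pseudocode, as a sketch under the same assumptions (question lines start with digits, choice lines with a letter and a period). Unlike the pseudocode, it does not require a blank line after the question (the sample has none), it treats each question as a single line, and it keeps unrecognised lines in an <unparsed> element instead of dropping them. The function name textToXml() and the <questions> and <unparsed> element names are hypothetical:

```php
<?php
// Sketch of the pseudocode as a PHP function. Assumptions (not from the post):
// one line per question, blank lines ignored, and unrecognised lines kept in
// a hypothetical <unparsed> element so they can be fixed by hand.
function textToXml(string $text): string
{
    $out = "<questions>\n";   // hypothetical root element
    $inItem = false;          // is an <item> currently open?
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;         // whitespace: keep reading until a question turns up
        }
        if (preg_match('/^(\d+)([a-z]*)\.\s*(.*)$/', $line, $m)) {
            // Question line: close the previous item, then open a new one.
            if ($inItem) {
                $out .= "</item>\n";
            }
            $inItem = true;
            $out .= "<item>\n"
                . '  <question>' . htmlspecialchars($m[3], ENT_XML1 | ENT_QUOTES) . "</question>\n"
                . "  <questionId>{$m[1]}</questionId>\n"
                . "  <category>{$m[2]}</category>\n";
        } elseif ($inItem && preg_match('/^[A-Za-z]\.\s*(.*)$/', $line, $m)) {
            // Choice line: a single letter and a period, then the choice text.
            $out .= '  <choice>' . htmlspecialchars($m[1], ENT_XML1 | ENT_QUOTES) . "</choice>\n";
        } else {
            $out .= '<unparsed>' . htmlspecialchars($line, ENT_XML1 | ENT_QUOTES) . "</unparsed>\n";
        }
    }
    if ($inItem) {
        $out .= "</item>\n";  // close the last item at end of input
    }
    return $out . "</questions>\n";
}

echo textToXml("1a. What is the capital city of England?\nA. London\nB. Cardiff\nC. Edinburgh\nD. Dublin");
```

To run it on a real file, the text could be read with file_get_contents() first; the per-line logic stays the same.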
I agree with those who suggest you do this by hand; there's no point spending hours writing a script for something that takes minutes to do. Obviously, if there are one or more large files, a script is better, PROVIDED they have a standard format.
Another suggestion. You say you're new to PHP; why not use your programming language of choice (assuming you are a programmer in some other language) to do this for you? XML files are plain text that get interpreted in a special way; you don't need PHP to do anything XML-specific to create your XML files.
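One caveat when writing XML as plain text: characters such as & and < in the question or choice text must still be escaped, or the file won't be well-formed XML. A minimal PHP sketch (the helper name xmlElement() is hypothetical, and the same idea applies in any language):

```php
<?php
// XML written as plain text still needs &, < and > escaped in element content.
// htmlspecialchars() with ENT_XML1 (PHP 5.4+) produces XML-style entities.
function xmlElement(string $name, string $text): string
{
    // $name is assumed to be a valid XML element name; only the text is escaped.
    return "<$name>" . htmlspecialchars($text, ENT_XML1 | ENT_QUOTES) . "</$name>";
}

echo xmlElement('choice', 'Fish & chips'), "\n";   // <choice>Fish &amp; chips</choice>
echo xmlElement('question', 'Is 3 < 5?'), "\n";     // <question>Is 3 &lt; 5?</question>
```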
Posted: Wed Sep 24, 2003 2:43 am
by volka
I like using simple scripts that take a couple of minutes to write.
They don't have to be perfect, and the output has to be checked, but at least they give you the basic structure and save you from repetitive work, which only leads to worse mistakes (oops, now there are only three answers... :-S )
Code:
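<questioncatalog>
<?php
// Question lines look like "1a. What is ...": number, category letter(s), period, text
$patternQuestion = '!^\d+[a-z]+\.\s+(.+)!';
// Choice lines look like "A. London": capital letter(s), period, text
$patternChoice = '!^[A-Z]+\.\s+(.+)!';
$bInItem = false; // <item> is the only element that spans several input lines

$fp = fopen('data.txt', 'r');
if ($fp === false)
    die("cannot open data.txt\n");

// fgets() returns false at end of file, which ends the loop
while (($row = fgets($fp, 2048)) !== false)
{
    $row = trim($row);
    if (strlen($row) > 0)
    {
        if (preg_match($patternQuestion, $row, $matches))
        {
            // a new question closes the previous item, if any
            if ($bInItem)
                echo "</item>\n";
            else
                $bInItem = true;
            echo "<item>\n <question>$matches[1]</question>\n";
        }
        else if (preg_match($patternChoice, $row, $matches))
            echo " <choice>$matches[1]</choice>\n";
        else
            echo '# ', $row, "#\n"; // flag input the script can't handle
    }
}
fclose($fp);

if ($bInItem)
    echo "</item>\n";
?>
</questioncatalog>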
It took about 2 minutes. The XML conversion is crude, but the result is easy to check:
- redirect the output to a file
- check for # lines (input the script can't handle)
- open it in a browser (as an XML file) and let the browser check its structure
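The structure check in the last step can also be done from PHP. A minimal sketch, assuming the DOM extension is available (it is enabled in most PHP builds); xmlErrors() is a hypothetical helper name:

```php
<?php
// Sketch: let libxml report well-formedness errors instead of using a browser.
// Returns an empty array when the document is well-formed.
function xmlErrors(string $xml): array
{
    $previous = libxml_use_internal_errors(true); // collect errors instead of warnings
    libxml_clear_errors();
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = "line {$error->line}: " . trim($error->message);
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the previous setting
    return $messages;
}
```

For example, xmlErrors(file_get_contents('catalog.xml')), where catalog.xml stands for whatever file the output was redirected to, returns an empty array if the file is well-formed. Note that the # lines are plain text content, so they still have to be checked separately.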
Quote: "needs to be converted into XML for easier manipulation"
Now let the tool you're using for that manipulation take care of the content.
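As an illustration of that kind of manipulation, here is a sketch using SimpleXML (enabled in most PHP builds) that lists items whose number of choices differs from the expected count, which would catch the "only three answers" slip mentioned above. The function name and the default of four choices are assumptions based on this thread:

```php
<?php
// Sketch: once the catalog is well-formed XML, checks like "does every item
// have four choices?" become short. Element names follow the catalog format
// used in this thread; the expected count of 4 is an assumption.
function itemsWithWrongChoiceCount(string $xml, int $expected = 4): array
{
    $catalog = simplexml_load_string($xml);
    if ($catalog === false) {
        throw new InvalidArgumentException('Input is not well-formed XML');
    }
    $bad = [];
    foreach ($catalog->item as $item) {
        $n = count($item->choice);  // number of <choice> children of this item
        if ($n !== $expected) {
            $bad[] = (string) $item->question . " ($n choices)";
        }
    }
    return $bad;
}
```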

Posted: Wed Sep 24, 2003 4:32 am
by jbatty
Thank you to everyone who has contributed to this thread.
The structure of the text file does not have to stay as it is at the moment. The main thing is that there will be a numbered question followed by a set of 4 options. I initially wanted to use comma-separated values (CSV) for the text file, but some of the questions / options might have commas in them. CSV would also mean putting a question and all its options on the same line, which I am trying to avoid.
In all there will be about 100 questions in the file.
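On the CSV concern: a field containing commas is fine as long as it is wrapped in double quotes, and PHP's str_getcsv() (or fgetcsv() when reading a file) handles the quoting. If one question per line turns out to be acceptable after all, a sketch might look like this; the column order (id, category, question, four choices) and the function name parseQuestionRow() are hypothetical:

```php
<?php
// Sketch: a quoted CSV field may contain commas. The column order
// (id, category, question, four choices) is a hypothetical layout.
function parseQuestionRow(string $line): array
{
    // Separator, enclosure and escape character passed explicitly.
    $fields = str_getcsv($line, ',', '"', '\\');
    if (count($fields) !== 7) {
        throw new UnexpectedValueException('Expected 7 fields, got ' . count($fields));
    }
    [$id, $category, $question] = $fields;
    return [
        'questionId' => $id,
        'category'   => $category,
        'question'   => $question,
        'choices'    => array_slice($fields, 3), // the four options
    ];
}

print_r(parseQuestionRow('1,a,"Which city, if any, is the capital of England?",London,Cardiff,Edinburgh,Dublin'));
```

The field-count check also catches a question with a missing option.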
Regards