Convert text to XML
Moderator: General Moderators
Hi there,
I am a newbie PHP programmer and I am trying to convert structured text into XML using PHP. The text file contains multiple-choice questions and needs to be converted into XML for easier manipulation.
I will be grateful for any suggestions / pointers on how best to do this.
Regards
JamesB
-
Cruzado_Mainfrm
- Forum Contributor
- Posts: 346
- Joined: Sun Jun 15, 2003 11:22 pm
- Location: Miami, FL
The text file is like this:
Code: Select all
1a. What is the capital city of England?
A. London
B. Cardiff
C. Edinburgh
D. Dublin
2a. Which of these countries do not belong to the Scandanavia?
A. Sweden
B. Poland
C. Norway
D. Denmark
and I am expecting to generate XML similar to this:
Code: Select all
<item>
<question>What is the capital city of England?</question>
<questionId>1</questionId>
<category>a</category>
<choice>London</choice>
<choice>Cardiff</choice>
<choice>Edinburgh</choice>
<choice>Dublin</choice>
</item>
<item>
<question>Which of these countries do not belong to the Scandanavia?</question>
<questionId>2</questionId>
<category>b</category>
<choice>Sweden</choice>
<choice>Poland</choice>
<choice>Norway</choice>
<choice>Denmark</choice>
</item>
Any suggestions will be appreciated.
Well, it is logical to a human reader, but for the computer it is fairly complex. I think it can be done if you are experienced, but as always it will not be perfect: there will be flaws unless you anticipate, or make assumptions about, what the input data is going to look like.
Also, I suggest you change the layout of the text file so it can be read using regular expressions.
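As a rough illustration of the regular-expression approach: assuming each question line looks like "1a. Text" and each choice line looks like "A. Text" (the layout posted earlier in the thread), two patterns are enough to pick the lines apart. The function names are made up for this sketch, and the nullable return types assume PHP 7.1 or later.

```php
<?php
// Hypothetical helper: splits a question line such as
// "1a. What is the capital city of England?" into its parts.
function parseQuestionLine(string $line): ?array
{
    // digits = question number, letters = category, rest = question text
    if (preg_match('/^(?<id>\d+)(?<category>[a-z]+)\.\s+(?<text>.+)$/i', trim($line), $m)) {
        return ['id' => (int) $m['id'], 'category' => $m['category'], 'text' => $m['text']];
    }
    return null; // not a question line
}

// Hypothetical helper: splits a choice line such as "A. London".
function parseChoiceLine(string $line): ?array
{
    // one capital letter, a period, whitespace, then the choice text
    if (preg_match('/^(?<label>[A-Z])\.\s+(?<text>.+)$/', trim($line), $m)) {
        return ['label' => $m['label'], 'text' => $m['text']];
    }
    return null; // not a choice line
}
```

Any line that matches neither pattern is input the layout does not cover, so it is worth reporting rather than silently dropping.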
Hehe, well, that's what he's trying to do in the first place!
My suggestion: if this isn't a massive document, just do it by hand. You're going to end up fixing a whole lot of mess by hand anyway if you try to script it, I suspect. There's just not really enough data there for the computer to work with reliably.
- Leviathan
- Forum Commoner
- Posts: 36
- Joined: Tue Sep 23, 2003 7:00 pm
- Location: Waterloo, ON (Currently in Vancouver, BC)
The only really easy way makes a (potentially big) assumption: that all your data will be formatted similarly (i.e. questions start with numbers, choices start with letters, and there's a blank line between questions and choices). Here's some pseudocode (since I've never done normal file I/O in PHP):
Repeat until end of file:
    Read in a line
    Does the line start with a number?
        If yes, strip off the number, output an item tag, and output a question tag. Read until we hit a blank line; output the contents, then output a close question tag.
        If no, we're on whitespace; repeat until we do get a question.
    Read in a line. Does the line start with a letter and a period?
        If no, close the item tag and go to the top of the loop.
        If yes, it's a choice, output a choice tag
            Read to the end of the line, and output the text, then a close choice tag
I agree with those who suggest you do this by hand; there's no point spending hours writing a script for something that takes minutes to do. Obviously, if there's one or many large files, a script is better PROVIDED they have a standard format.
Another suggestion. You say you're new to PHP; why not use your programming language of choice (assuming you are a programmer in some other language) to do this for you? XML files are plain text that get interpreted in a special way; you don't need PHP to do anything XML-specific to create your XML files.
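For anyone who does want to stay in PHP, the pseudocode above could be sketched roughly as follows. It is only a sketch under the same formatting assumption (questions start with a number, choices with a single letter and a period); the function names are invented for illustration. Since XML is plain text, the output is built by string concatenation, with htmlspecialchars() escaping characters such as & and <.

```php
<?php
// Sketch of the loop described above. Assumes questions start with a number
// and choices start with a single letter followed by a period.
function parseQuestions(array $lines): array
{
    $items = [];
    $current = null; // the item being collected, or null before the first question

    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // skip blank lines between questions and choices
        }
        if (preg_match('/^\d+[a-z]*\.\s+(.+)$/i', $line, $m)) {
            // A new question closes the previous item, if any.
            if ($current !== null) {
                $items[] = $current;
            }
            $current = ['question' => $m[1], 'choices' => []];
        } elseif ($current !== null && preg_match('/^[A-Za-z]\.\s+(.+)$/', $line, $m)) {
            $current['choices'][] = $m[1];
        }
        // Anything else is ignored here; a real script should report it.
    }
    if ($current !== null) {
        $items[] = $current;
    }
    return $items;
}

// XML is plain text, so plain string concatenation is enough,
// as long as the special characters are escaped.
function itemsToXml(array $items): string
{
    $xml = "<questioncatalog>\n";
    foreach ($items as $item) {
        $xml .= "<item>\n";
        $xml .= '<question>' . htmlspecialchars($item['question'], ENT_XML1) . "</question>\n";
        foreach ($item['choices'] as $choice) {
            $xml .= '<choice>' . htmlspecialchars($choice, ENT_XML1) . "</choice>\n";
        }
        $xml .= "</item>\n";
    }
    return $xml . "</questioncatalog>\n";
}
```

Something like itemsToXml(parseQuestions(file('data.txt'))) would then produce the whole document, where data.txt is just a placeholder file name.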
I like using simple scripts that take a couple of minutes to write.
They don't have to be perfect and the output has to be checked, but at least it gives you the basic structure and keeps you from doing repetitive work, which only leads to worse mistakes (oops, now there are only three answers.... :-S )
Code: Select all
<questioncatalog>
<?php
// A question line looks like "1a. Question text", a choice line like "A. Choice text".
$patternQuestion = '!^\d+[a-z]+\.\s+(.+)!';
$patternChoice = '!^[[:alpha:]]+\.\s+(.+)!';
$bInItem = false; // the only element not to be handled in one line
$fp = fopen('data.txt', 'r');
if ($fp === false)
  die('cannot open data.txt');
while (($line = fgets($fp, 2048)) !== false)
{
  $row = trim($line);
  if (strlen($row) > 0)
  {
    if (preg_match($patternQuestion, $row, $matches))
    {
      // a new question closes the previous item, if there is one
      if ($bInItem)
        echo "</item>\n";
      else
        $bInItem = true;
      // htmlspecialchars() escapes &, < and > so the output stays well-formed
      echo "<item>\n <question>", htmlspecialchars($matches[1]), "</question>\n";
    }
    else if (preg_match($patternChoice, $row, $matches))
      echo " <choice>", htmlspecialchars($matches[1]), "</choice>\n";
    else
      echo '# ', $row, "#\n"; // input the script can't handle
  }
}
fclose($fp);
if ($bInItem)
  echo '</item>';
?>
</questioncatalog>
Took about 2 minutes. The XML conversion is horrible, but it's easy to check:
- redirect the output to a file
- check for # lines (input the script can't handle)
- open it with a browser (as an XML file) and let it test its structure
Now let the tool you're using for easier manipulation take care of the content.
"needs to be converted into XML for easier manipulation"
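The script above does not emit the questionId and category elements from the expected output. A possible variation, sketched here under the assumption that the DOM extension is available, captures both from the question prefix and lets DOMDocument take care of escaping and well-formedness. The function names buildCatalog and appendText are invented for this example.

```php
<?php
// Sketch only: builds the catalog with DOMDocument so that escaping and
// well-formedness are handled by the library rather than by hand.
function buildCatalog(string $text): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;
    $root = $doc->appendChild($doc->createElement('questioncatalog'));
    $item = null; // current <item> element, or null before the first question

    foreach (preg_split('/\R/', $text) as $row) {
        $row = trim($row);
        if ($row === '') {
            continue; // skip blank lines
        }
        if (preg_match('/^(\d+)([a-z]+)\.\s+(.+)$/', $row, $m)) {
            // "1a. Text" gives questionId 1, category a
            $item = $root->appendChild($doc->createElement('item'));
            appendText($doc, $item, 'question', $m[3]);
            appendText($doc, $item, 'questionId', $m[1]);
            appendText($doc, $item, 'category', $m[2]);
        } elseif ($item !== null && preg_match('/^[A-Z]\.\s+(.+)$/', $row, $m)) {
            appendText($doc, $item, 'choice', $m[1]);
        } else {
            // Keep unhandled input visible, like the "#" lines in the script above.
            $safe = str_replace('--', '- -', $row); // "--" is not allowed inside XML comments
            $root->appendChild($doc->createComment(' unhandled: ' . $safe . ' '));
        }
    }
    return $doc->saveXML();
}

// Hypothetical helper: appends <name>text</name> with the text escaped.
function appendText(DOMDocument $doc, DOMNode $parent, string $name, string $text): void
{
    $child = $parent->appendChild($doc->createElement($name));
    $child->appendChild($doc->createTextNode($text));
}
```

Calling buildCatalog(file_get_contents('data.txt')) returns the document as a string, and loading that string back with simplexml_load_string() is a quick structure check that does not need a browser.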
Thank you to all who have contributed to this thread.
The structure of the text file does not have to stay as it is at the moment. The main thing is that there will be a numbered question followed by a set of 4 options. I initially wanted to use comma-separated values (CSV) for the text file, but some of the questions / options might have commas in them. CSV would also mean putting a question and all its options on the same line, which I am trying to avoid.
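On the comma concern only: a CSV field wrapped in double quotes may itself contain commas, and PHP's str_getcsv() (or fgetcsv() for files) understands that. A minimal sketch, in which the sample line and the one-question-per-line layout are assumptions made up for the example:

```php
<?php
// Sketch: a CSV field wrapped in double quotes may contain commas.
// The layout (question first, then four options) is an assumption.
$line = '"Which city is the capital of England, UK?",London,Cardiff,Edinburgh,Dublin';

// Separator, enclosure and escape character are passed explicitly.
$fields = str_getcsv($line, ',', '"', '\\');

$question = array_shift($fields); // first field is the question
$choices = $fields;               // remaining fields are the options
```

This does not address the wish to keep each option on its own line, so the line-based layout may still be the better fit.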
In all there will be about 100 questions in the file.
Regards