PHP XML Parsing
Posted: Mon Sep 04, 2006 3:59 am
by Meni
Using php 4, I would like to take a large XML file and parse it.
Looking specifically for large portions of text between tags that are called
<fulltext>INTERESTING TEXT HERE</fulltext>
I would then like to count the number of times a certain word repeats itself throughout ALL the parsed text. Not only through one <fulltext> part.
Is this possible? And if so - how?
Thanks!
Posted: Mon Sep 04, 2006 4:07 am
by CoderGoblin
Possible Yes...
Good starting point is probably...
PHP XML
Posted: Mon Sep 04, 2006 4:33 am
by Meni
Been there
I am not a coder at all, so this is part Chinese, part Japanese to me (neither of which, by the way, I speak).
Care to provide some code?
Posted: Mon Sep 04, 2006 4:56 am
by CoderGoblin
Here is a possible starting point if you want to work it out yourself.
Code:
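<?php
$file = "data.xml";
$needle = "texttofind"; // word to count; matched case-insensitively
$process = 0;           // 1 while the parser is inside a <fulltext> element
$buffer = "";           // text collected from the current <fulltext> element
$counter = 0;           // total matches across all <fulltext> elements

function startElement($parser, $name, $attrs)
{
    global $process, $buffer;
    // Case folding is on, so tag names arrive in upper case.
    if ($name == "FULLTEXT") {
        $process = 1;
        $buffer = "";
    }
}

function endElement($parser, $name)
{
    global $process, $buffer, $counter, $needle;
    if ($name == "FULLTEXT") {
        // Count once per element, on the complete text.
        $counter += substr_count(strtolower($buffer), strtolower($needle));
        $process = 0;
    }
}

function characterData($parser, $data)
{
    global $process, $buffer;
    // The parser may deliver one element's text in several pieces,
    // so collect it here and count when the element closes.
    if ($process) {
        $buffer .= $data;
    }
}

$xml_parser = xml_parser_create();
// with case folding on, every tag name is reported in upper case,
// which is why the handlers compare against "FULLTEXT"
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, true);
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");

if (!($fp = fopen($file, "r"))) {
    die("could not open XML input");
}

// feed the file to the parser in 4 KB chunks so a large file never has to fit in memory
while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)));
    }
}
fclose($fp);
xml_parser_free($xml_parser);

echo("Found {$counter} occurrences");
?>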
This is a fairly simple example (based on an example on the linked page with only minor modifications). If you want a complete solution you will most likely have to either find a ready-made script (on hotscripts, for example) or pay someone to write one. It is worth learning how it works, however.
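One thing to watch for: a plain substring search such as stristr() or substr_count() also matches inside longer words, so looking for "cat" would count "category" as well. If you only want whole words, a regular expression with word boundaries is one way to do it. The sketch below is only an illustration; countWord() is a made-up helper name, not part of PHP, and it assumes the word consists of plain ASCII letters.

```php
<?php
// Hypothetical helper (not a PHP built-in): counts whole-word,
// case-insensitive occurrences of $word in $text.
function countWord($text, $word)
{
    // preg_quote() escapes any regex metacharacters in the word;
    // \b matches a word boundary, and the i flag ignores case.
    $pattern = '/\b' . preg_quote($word, '/') . '\b/i';

    // preg_match_all() returns the number of full matches found.
    // $matches is passed because older PHP versions require it.
    return preg_match_all($pattern, $text, $matches);
}

// "cat" and "Cat" are counted, "category" is not.
echo countWord("The cat sat. Cat food, category.", "cat"); // prints 2
?>
```

To use it with the parser code, you would call this on the collected text of each <fulltext> element and add the result to the running total.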