PHP XML Parsing

Posted: Mon Sep 04, 2006 3:59 am
by Meni
Using PHP 4, I would like to parse a large XML file.
I am looking specifically for the large portions of text between tags called
<fulltext>INTERESTING TEXT HERE</fulltext>

I would then like to count the number of times a certain word occurs throughout ALL the parsed text, not only within one <fulltext> part.


Is this possible? And if so - how?
Thanks!

Posted: Mon Sep 04, 2006 4:07 am
by CoderGoblin
Possible? Yes...

A good starting point is probably... PHP XML

Posted: Mon Sep 04, 2006 4:33 am
by Meni
Been there.
I am not a coder at all, so this is part Chinese, part Japanese to me (neither of which, by the way, I speak).

Care to provide some code?

Posted: Mon Sep 04, 2006 4:56 am
by CoderGoblin
Here is a possible starting point if you want to work it out yourself.

Code:

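<?php
$file = "data.xml";
$search = "texttofind";   // text to count (matched case-insensitively)
$process = 0;             // 1 while the parser is inside <fulltext>
$buffer = "";             // text collected for the current <fulltext>
$counter = 0;             // total matches across all <fulltext> elements

function startElement($parser, $name, $attrs)
{
   global $process, $buffer;
   // case folding is switched on below, so tag names arrive in upper case
   if ($name == "FULLTEXT") {
      $process = 1;
      $buffer = "";
   }
}

function endElement($parser, $name)
{
   global $process, $buffer, $counter, $search;
   if ($name == "FULLTEXT") {
      // count once the whole element has been read, so a match split
      // across two characterData calls is not missed
      $counter += substr_count(strtolower($buffer), strtolower($search));
      $process = 0;
   }
}

function characterData($parser, $data)
{
   global $process, $buffer;
   // the parser may deliver one text node in several pieces,
   // so only collect the text here and count it in endElement
   if ($process) $buffer .= $data;
}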

$xml_parser = xml_parser_create();
// use case-folding so tag names are always reported in upper case
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, true);
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");
if (!($fp = fopen($file, "r"))) {
   die("could not open XML input");
}

while ($data = fread($fp, 4096)) {
   if (!xml_parse($xml_parser, $data, feof($fp))) {
       die(sprintf("XML error: %s at line %d",
                   xml_error_string(xml_get_error_code($xml_parser)),
                   xml_get_current_line_number($xml_parser)));
   }
}
xml_parser_free($xml_parser);
echo("Found {$counter} occurrences");
?>
This is a fairly simple example, based on one from the linked page with only minor modifications. If you want a complete solution, you will most likely have to either find a ready-made script (hotscripts, for example) or pay someone to write one. It is worth learning how it works, however.
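
One thing to be aware of: the script above counts every place the search string appears, including inside longer words. If you only want whole words, something along these lines might help. This is just a sketch and not part of the script above; countWholeWord() and the sample sentence are made up for illustration, and it assumes the PCRE functions are available in your PHP build.

```php
<?php
// Hypothetical helper: counts whole-word, case-insensitive
// matches of $word in $text.
function countWholeWord($text, $word)
{
    // preg_quote() escapes any regex metacharacters in the search word;
    // \b anchors the match at word boundaries, the i flag ignores case.
    $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
    // preg_match_all() returns the number of full pattern matches.
    return preg_match_all($pattern, $text, $matches);
}

// Made-up sample text to compare the two ways of counting.
$sample = "Cat, cats and a catalogue. The CAT sat.";
echo "substring matches: " . substr_count(strtolower($sample), "cat") . "\n";
echo "whole-word matches: " . countWholeWord($sample, "cat") . "\n";
?>
```

To use it with the parser, you would call countWholeWord() on the text collected for each <fulltext> element and add the result to the running total.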