PHP XML Parsing

Meni
Forum Newbie
Posts: 12
Joined: Wed Apr 19, 2006 3:55 pm

Post by Meni »

Using PHP 4, I would like to take a large XML file and parse it.
I am looking specifically for the large portions of text between tags called
<fulltext>INTERESTING TEXT HERE</fulltext>

I would then like to count the number of times a certain word appears throughout ALL the parsed text, not just within one <fulltext> part.


Is this possible? And if so - how?
Thanks!
CoderGoblin
DevNet Resident
Posts: 1425
Joined: Tue Mar 16, 2004 10:03 am
Location: Aachen, Germany

Post by CoderGoblin »

Possible? Yes...

A good starting point is probably the PHP XML documentation.
Meni
Forum Newbie
Posts: 12
Joined: Wed Apr 19, 2006 3:55 pm

Post by Meni »

Been there.
I am not a coder at all, so this is part Chinese, part Japanese to me (neither of which, by the way, I speak).

Care to provide some code?
CoderGoblin
DevNet Resident
Posts: 1425
Joined: Tue Mar 16, 2004 10:03 am
Location: Aachen, Germany

Post by CoderGoblin »

Here is a possible starting point if you want to work it out yourself.

Code:

<?php
$file = "data.xml";
$needle = "texttofind";   // word to count, matched case-insensitively
$process = 0;             // 1 while the parser is inside a <fulltext> element
$buffer = "";             // text collected from the current <fulltext> element
$counter = 0;             // total matches across all <fulltext> elements

function startElement($parser, $name, $attrs)
{
   global $process, $buffer;
   // case-folding is on, so element names arrive in upper case
   if ($name == "FULLTEXT") {
      $process = 1;
      $buffer = "";
   }
}

function endElement($parser, $name)
{
   global $process, $buffer, $counter, $needle;
   if ($name == "FULLTEXT") {
      // count once per element, so a word split across two callbacks is not missed
      $counter += substr_count(strtolower($buffer), strtolower($needle));
      $process = 0;
   }
}

function characterData($parser, $data)
{
   global $process, $buffer;
   // the text of one element may arrive in several pieces
   if ($process) $buffer .= $data;
}

$xml_parser = xml_parser_create();
// use case-folding so the tag is found however it is capitalised in the file
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, true);
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");
if (!($fp = fopen($file, "r"))) {
   die("could not open XML input");
}

// feed the file to the parser in 4 KB chunks so a large file is never held in memory
while ($data = fread($fp, 4096)) {
   if (!xml_parse($xml_parser, $data, feof($fp))) {
       die(sprintf("XML error: %s at line %d",
                   xml_error_string(xml_get_error_code($xml_parser)),
                   xml_get_current_line_number($xml_parser)));
   }
}
fclose($fp);
xml_parser_free($xml_parser);
echo("Found {$counter} occurrences");
?>
This is a fairly simple example (based on an example on the linked page with only minor modifications). If you want a complete solution, you will most likely have to either look for a ready-made script (on hotscripts, for example) or pay someone to write one. It is worth learning how it works, however.
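
If you want something small to experiment with first, here is a rough sketch of the same idea that parses an XML string in memory instead of a file, and counts whole words only (so "cat" is not counted inside "category"). The function names, the sample XML and the search word are all made up for illustration. It also assumes the collected text is small enough to keep in one string, which may not hold for a very large file.

```php
<?php
// Hypothetical names and sample data, for illustration only.
$inFulltext = false;  // true while inside a <fulltext> element
$collected  = "";     // all text seen inside <fulltext> elements

function sketchStart($parser, $name, $attrs)
{
    global $inFulltext;
    if ($name == "fulltext") $inFulltext = true;
}

function sketchEnd($parser, $name)
{
    global $inFulltext, $collected;
    if ($name == "fulltext") {
        $inFulltext = false;
        $collected .= " ";   // keep words from separate elements apart
    }
}

function sketchText($parser, $data)
{
    global $inFulltext, $collected;
    // text outside <fulltext> (titles and so on) is ignored
    if ($inFulltext) $collected .= $data;
}

// Count whole-word, case-insensitive matches of $word in all <fulltext> text.
// Returns false if the XML cannot be parsed.
function countWordInFulltext($xml, $word)
{
    global $inFulltext, $collected;
    $inFulltext = false;
    $collected  = "";

    $parser = xml_parser_create();
    // case folding off: tag names are compared exactly as written in the XML
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler($parser, "sketchStart", "sketchEnd");
    xml_set_character_data_handler($parser, "sketchText");
    if (!xml_parse($parser, $xml, true)) {
        return false;
    }

    // \b marks a word boundary; the trailing i makes the match case-insensitive
    $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
    return preg_match_all($pattern, $collected, $matches);
}

$sample = '<doc><item><title>cat</title><fulltext>The cat sat.</fulltext></item>'
        . '<item><fulltext>A category of Cat.</fulltext></item></doc>';
echo countWordInFulltext($sample, "cat");   // prints 2
```

The "cat" in the title is skipped because it is not inside a fulltext element, and "category" is skipped because of the word boundaries. If you would rather count every substring match, substr_count on lower-cased text does that instead.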