CDATA Issue

Posted: Wed Aug 12, 2009 6:50 am
by habib009pk
Dear Friends,

I am facing a problem. I have an XML file with various tags, one of which is named parsed_data and contains a CDATA section, e.g.:

<parsed_data> <![CDATA[ <Full model name>[ Fuso, Fighter FK71GJ-760343]</Full model name> <I-auc comments>[ at preliminary inspection truck vehicle . cabin raising/ upper thing operation verification. is not possible]</I-auc comments> <NOx>[ agreement]</NOx> <air-conditioning>[AC]</air-conditioning> <bidding deadline>[2009 year08 month12 day 11 time09 minute]</bidding deadline> <capacity>[0 person]</capacity> <car history>[ renta car]</car history> <color substitution>[ ]</color substitution> <colorNO>[ ]</colorNO> <condition>[[ acceptance ended]]</condition> <drive>[ ]</drive> <equipment>[PS, PW, AB]</equipment> <exhibition>[ large size car block]</exhibition> <exterior color>[ white]</exterior color> <form>[02TR]</form> <fuel>[D]</fuel> <hall>[ BAY AUC [ Osaka (metropolitan area) Osaka city]]</hall> <holding frequency>[1522]</holding frequency> <interior color>[ ]</interior color> <loading>[2800kg]</loading> <model>[ ]</model> <special mention>[PS.PW..... airbag. dirty interior6.2 meter cut2.93 ton3 step radio controlled hook in.. inside sizeL5510...URV343.... seat tear. steering wheel worn. tappet. making a sound. carrier boarding scars, concaves.... flap scratch dent processing hole.. lower part other one part rust. scratch dent.S guard bending..... sticker peel trace.. rightF distortion of pillar.F accident.Nox agreement.....2800kg....R presence of ticket9310 jpy distance58037km]</special mention> <symbol>[KK]</symbol> <vehicle inspection "shaken">[ ]</vehicle inspection "shaken"> <year>[H14/12]</year> ]]> </parsed_data>


Could anyone please help me fetch these records by tag name?

Thanks and regards

Habib Ahmad

Re: CDATA Issue

Posted: Wed Aug 12, 2009 9:35 am
by tr0gd0rr
That is a very strange piece of data. A pseudo-XML document inside an XML document! I say pseudo because the inner document has a few syntax errors (spaces and quotes in tag names, and no root element). I just tried the following, which seems to work. I don't think it will work in every case, because who knows what other syntax the pseudo-XML might contain.

Parse the inner document:

Code: Select all

<?php
 
$xml = '<parsed_data> <![CDATA[ ... ]]> </parsed_data>';
 
$doc = new DOMDocument();
$doc->loadXML($xml);
$data = $doc->getElementsByTagName('parsed_data')->item(0)->nodeValue;
 
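// data is not a valid XML document, so clean it up with a hackish regex replace
// spaces are not allowed in XML tag names
// (the /e modifier was removed in PHP 7, so use preg_replace_callback instead)
$xml = preg_replace_callback('/<[^>]+>/', function ($m) { return str_replace(' ', '_', $m[0]); }, $data);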
// xml docs need a root element
$xml = "<root>$xml</root>";
// remove weird data <vehicle_inspection_"shaken">
$xml = str_replace('_"shaken"', '', $xml);
// create new doc
$doc = new DOMDocument();
$doc->loadXML($xml);
// example fetch one item
echo $doc->getElementsByTagName('Full_model_name')->item(0)->nodeValue;
// outputs [ Fuso, Fighter FK71GJ-760343]
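
Since the goal is to fetch every record by tag name rather than a single item, the same idea can be extended to walk all child elements of the root. This is only a minimal sketch: the `$data` string is a hypothetical, shortened stand-in for the real CDATA content, and the `$records` array is a name made up for illustration.

```php
<?php
// Hypothetical, shortened sample of the inner pseudo-XML (not the full record).
$data = ' <Full model name>[ Fuso, Fighter FK71GJ-760343]</Full model name>'
      . ' <fuel>[D]</fuel>'
      . ' <vehicle inspection "shaken">[ ]</vehicle inspection "shaken"> ';

// Same clean-up as above: underscores for spaces inside tags, drop the quoted part.
$inner = preg_replace_callback('/<[^>]+>/', function ($m) {
    return str_replace(' ', '_', $m[0]);
}, $data);
$inner = str_replace('_"shaken"', '', $inner);

// Wrap in a root element so the result is a well-formed document.
$doc = new DOMDocument();
$doc->loadXML("<root>$inner</root>");

// Collect every child element of <root> as tag name => trimmed text.
$records = array();
foreach ($doc->documentElement->childNodes as $node) {
    if ($node->nodeType === XML_ELEMENT_NODE) {
        $records[$node->nodeName] = trim($node->nodeValue);
    }
}

print_r($records);
```

Note that the keys come back with underscores (for example Full_model_name), because the tag names had to be rewritten before the parser would accept them.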
Here is another possibility using a regex that does not handle "attributes":

Code: Select all

<?php
 
$xml = '<parsed_data> <![CDATA[ ... ]]> </parsed_data>';
 
$doc = new DOMDocument();
$doc->loadXML($xml);
$data = $doc->getElementsByTagName('parsed_data')->item(0)->nodeValue;
 
preg_match_all('~<([^>]+)>([^>]+)</[^>]+>~', $data, $matches, PREG_SET_ORDER);
 
$hash = array();
foreach ($matches as $match) {
  $hash[$match[1]] = $match[2];
}
echo '<pre>';
print_r($hash);
die();
 
// outputs
// Array
// (
//     [Full model name] => [ Fuso, Fighter FK71GJ-760343]
//     ...
//     [year] => [H14/12]
// )
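
The regex above accepts any closing tag, so a slightly stricter variant may be worth considering. The sketch below is an assumption-laden illustration, not a tested drop-in: it uses a backreference so the closing tag must repeat the opening tag text, and it strips one layer of the surrounding square brackets from each value. The `$data` string is a hypothetical, shortened sample.

```php
<?php
// Hypothetical, shortened sample of the inner pseudo-XML.
$data = ' <Full model name>[ Fuso, Fighter FK71GJ-760343]</Full model name>'
      . ' <condition>[[ acceptance ended]]</condition>'
      . ' <color substitution>[ ]</color substitution> ';

// \1 forces the closing tag to repeat the opening tag text exactly.
preg_match_all('~<([^<>]+)>(.*?)</\1>~s', $data, $matches, PREG_SET_ORDER);

$hash = array();
foreach ($matches as $match) {
    // Strip one layer of surrounding [ ] and whitespace from the value.
    $value = trim($match[2]);
    if (substr($value, 0, 1) === '[' && substr($value, -1) === ']') {
        $value = trim(substr($value, 1, -1));
    }
    $hash[$match[1]] = $value;
}

print_r($hash);
```

Here the keys keep their original spelling, spaces included, so a lookup such as $hash['Full model name'] works directly. Empty fields like [ ] end up as empty strings.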

CDATA Issue

Posted: Thu Aug 13, 2009 2:57 am
by habib009pk
Dear tr0gd0rr

Thank you very much. Your second code sample suits my needs very well.

I am very thankful to you.

Regards
Habib Ahmad