CDATA Issue

habib009pk
Forum Commoner
Posts: 43
Joined: Sun Jul 05, 2009 11:28 pm

CDATA Issue

Post by habib009pk »

Dear Friends,

I am facing a problem. I have an XML file with various tags, one of which is named parsed_data and contains a CDATA section, e.g.:

<parsed_data> <![CDATA[ <Full model name>[ Fuso, Fighter FK71GJ-760343]</Full model name> <I-auc comments>[ at preliminary inspection truck vehicle . cabin raising/ upper thing operation verification. is not possible]</I-auc comments> <NOx>[ agreement]</NOx> <air-conditioning>[AC]</air-conditioning> <bidding deadline>[2009 year08 month12 day 11 time09 minute]</bidding deadline> <capacity>[0 person]</capacity> <car history>[ renta car]</car history> <color substitution>[ ]</color substitution> <colorNO>[ ]</colorNO> <condition>[[ acceptance ended]]</condition> <drive>[ ]</drive> <equipment>[PS, PW, AB]</equipment> <exhibition>[ large size car block]</exhibition> <exterior color>[ white]</exterior color> <form>[02TR]</form> <fuel>[D]</fuel> <hall>[ BAY AUC [ Osaka (metropolitan area) Osaka city]]</hall> <holding frequency>[1522]</holding frequency> <interior color>[ ]</interior color> <loading>[2800kg]</loading> <model>[ ]</model> <special mention>[PS.PW..... airbag. dirty interior6.2 meter cut2.93 ton3 step radio controlled hook in.. inside sizeL5510...URV343.... seat tear. steering wheel worn. tappet. making a sound. carrier boarding scars, concaves.... flap scratch dent processing hole.. lower part other one part rust. scratch dent.S guard bending..... sticker peel trace.. rightF distortion of pillar.F accident.Nox agreement.....2800kg....R presence of ticket9310 jpy distance58037km]</special mention> <symbol>[KK]</symbol> <vehicle inspection "shaken">[ ]</vehicle inspection "shaken"> <year>[H14/12]</year> ]]> </parsed_data>


Could anyone please help me fetch these records by tag name?

Thanks and regards

Habib Ahmad
tr0gd0rr
Forum Contributor
Posts: 305
Joined: Thu May 11, 2006 8:58 pm
Location: Utah, USA

Re: CDATA Issue

Post by tr0gd0rr »

That is a very strange piece of data 8O . It's a pseudo XML document inside an XML document! I say pseudo because the inner document has a few syntax errors. I just tried the following, which seems to work on your sample. I don't think it will work for every case, though, because who knows what other syntax the pseudo XML might contain.

Parse the inner document:

Code:

<?php
 
$xml = '<parsed_data> <![CDATA[ ... ]]> </parsed_data>';
 
$doc = new DOMDocument();
$doc->loadXML($xml);
$data = $doc->getElementsByTagName('parsed_data')->item(0)->nodeValue;
 
// data is not a valid xml document, but let's try some hackish preg_replace
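// spaces are not allowed in xml tag names, so swap them for underscores
// (preg_replace_callback instead of the /e modifier, which was removed in PHP 7)
$xml = preg_replace_callback('/<[^>]+>/', function ($m) {
    return str_replace(' ', '_', $m[0]);
}, $data);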
// xml docs need a root element
$xml = "<root>$xml</root>";
// remove weird data <vehicle_inspection_"shaken">
$xml = str_replace('_"shaken"', '', $xml);
// create new doc
$doc = new DOMDocument();
$doc->loadXML($xml);
// example fetch one item
echo $doc->getElementsByTagName('Full_model_name')->item(0)->nodeValue;
// outputs [ Fuso, Fighter FK71GJ-760343]

Here is another possibility using a regex that does not handle "attributes":

Code:

<?php
 
$xml = '<parsed_data> <![CDATA[ ... ]]> </parsed_data>';
 
$doc = new DOMDocument();
$doc->loadXML($xml);
$data = $doc->getElementsByTagName('parsed_data')->item(0)->nodeValue;
 
preg_match_all('~<([^>]+)>([^>]+)</[^>]+>~', $data, $matches, PREG_SET_ORDER);
 
$hash = array();
foreach ($matches as $match) {
  $hash[$match[1]] = $match[2];
}
echo '<pre>';
print_r($hash);
die();
 
// outputs
Array
(
    [Full model name] => [ Fuso, Fighter FK71GJ-760343]
    ...
    [year] => [H14/12]
)
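
If you want the regex to be a bit stricter, here is a rough sketch of a variant. It is not what I ran above: the function name extractPseudoTags is made up for illustration, the backreference \1 forces the closing tag to repeat the opening tag's text exactly, and stripping the outer [ ] assumes every value is wrapped in one pair of brackets, as in the sample. The $data string is a shortened copy of the posted data.

```php
<?php
// Hypothetical helper (not from the original post): extract tag => value
// pairs from the pseudo-XML text found inside the CDATA section.
function extractPseudoTags($data)
{
    // \1 requires the closing tag to repeat the opening tag's text exactly;
    // [^<]* stops the value from running into the next tag.
    preg_match_all('~<([^<>/][^<>]*)>([^<]*)</\1>~', $data, $matches, PREG_SET_ORDER);

    $hash = array();
    foreach ($matches as $match) {
        $value = trim($match[2]);
        // Assumption: each value is wrapped in one outer pair of [ ].
        if (strlen($value) >= 2 && $value[0] === '[' && substr($value, -1) === ']') {
            $value = substr($value, 1, -1);
        }
        $hash[$match[1]] = trim($value);
    }
    return $hash;
}

// Shortened sample taken from the posted data.
$data = ' <Full model name>[ Fuso, Fighter FK71GJ-760343]</Full model name>'
      . ' <hall>[ BAY AUC [ Osaka (metropolitan area) Osaka city]]</hall>'
      . ' <vehicle inspection "shaken">[ ]</vehicle inspection "shaken">'
      . ' <year>[H14/12]</year> ';

$result = extractPseudoTags($data);
print_r($result);
```

The keys keep the original tag text, spaces and quotes included, so look items up with e.g. $result['Full model name'].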
habib009pk
Forum Commoner
Posts: 43
Joined: Sun Jul 05, 2009 11:28 pm

CDATA Issue

Post by habib009pk »

Dear tr0gd0rr,

Thank you very much. Your second code sample suits my needs well.

I am very thankful to you.

Regards
Habib Ahmad