
I cannot get xpath working

Posted: Fri May 08, 2009 3:37 pm
by pierref
Paradox: why does the first part of this snippet work, but not the second part, which uses XPath?

Code:

$xmlstr = html_entity_decode(file_get_contents($items_url), ENT_NOQUOTES, 'UTF-8');

$xml = simplexml_load_string($xmlstr);

if ($xml) {
    echo "\n<p>";

    foreach ($xml->body as $s) {
        echo "got body node: " . $s->div[0]->div[0]->h2;
    } /* WORKS */

    foreach ($xml->xpath('body') as $s) {
        echo "got body node with xpath: " . $s->div[0]->div[0]->h2;
    } /* DOESN'T WORK: returned $s is FALSE */

    echo "\n</p>";
}
 
Which guru can explain to me what's going on? The file being parsed is XHTML, and I assume it is well formed because the first part works correctly. The input file is http://users.telenet.be/cr27933/nieuws.html.

Re: I cannot get xpath working

Posted: Fri May 08, 2009 3:54 pm
by infolock
try doing a print_r on each level of the array (and each sub-key/sub-array) and the answer will be pretty evident.

Re: I cannot get xpath working

Posted: Fri May 08, 2009 3:56 pm
by jayshields
Unrelated to Xpath - you've got a stray double quote on line 13.

Re: I cannot get xpath working

Posted: Fri May 08, 2009 4:02 pm
by pierref
jayshields wrote:Unrelated to Xpath - you've got a stray double quote on line 13.
You are right, Jay, but that's only a typo I introduced when I copy/pasted. I corrected the code but the error is still there.

Re: I cannot get xpath working

Posted: Fri May 08, 2009 4:11 pm
by pierref
infolock wrote:try doing a print_r on each level of the array (and each sub-key/sub-array) and the answer will be pretty evident.
Thank you, infolock, but I did that and I see nothing strange. Don't forget that the first snippet works and the second doesn't, even though they should be equivalent. That's what's strange.

Re: I cannot get xpath working

Posted: Fri May 08, 2009 4:14 pm
by infolock
show us what the printout of the array is.

if it's huge, just give us the first 2 keys.

Re: I cannot get xpath working

Posted: Fri May 08, 2009 4:20 pm
by pierref
infolock wrote:show us what the printout of the array is.

if it's huge, just give us the first 2 keys.
This is the beginning of the array:

Code:

SimpleXMLElement Object
(
    [head] => SimpleXMLElement Object
        (
            [meta] => SimpleXMLElement Object
                (
                    [@attributes] => Array
                        (
                            [http-equiv] => Content-Type
                            [content] => text/html; charset=utf-8
                        )
 
                )
 
            [title] => Studentenhuis Arenberg > Nieuws
            [link] => SimpleXMLElement Object
                (
                    [@attributes] => Array
                        (
                            [href] => style.css
                            [rel] => stylesheet
                            [type] => text/css
                        )
 
                )
 
        )
 
    [body] => SimpleXMLElement Object
        (
            [div] => SimpleXMLElement Object
                (
                    [@attributes] => Array
                        (
                            [id] => container
                        )
 
                    [div] => Array
                        (
                            [0] => SimpleXMLElement Object
                                (
                                    [@attributes] => Array
                                        (
                                            [id] => header
                                        )
 
                                    [img] => SimpleXMLElement Object
                                        (
                                            [@attributes] => Array
                                                (
                                                    [src] => img/logo_mini.gif
                                                    [alt] => Logo Arenberg
                                                )
 
                                        )
 
                                    [h1] => Studentenhuis Arenberg > Nieuws
                                    [h2] => Maak van je studentenjaren de beste van je leven
                                    [p] => Schapenstraat 29 — 3000 Leuven 
                    — tel. 016-23 21 52 
-------------------------------------------- I cut the rest -----------------------
 

Re: I cannot get xpath working

Posted: Sat May 09, 2009 10:35 am
by pierref
When I removed the xmlns attribute from the root element of my XHTML file, it started working as I expected. I found this by stripping my input file down bit by bit, removing anything that could cause trouble. When the file was nearly empty, I found it was the tag

Code:

<html xmlns="http://www.w3.org/1999/xhtml">
that had to be changed to

Code:

<html>
:P

When I read about XHTML, I see that the xmlns attribute on the root html element is mandatory. If anyone knows why, please let me know.

As a workaround, I wanted to remove that attribute from the root tag, but I cannot access the root element itself. If anyone can help, please do!
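
A likely explanation, sketched here with two made-up inline documents rather than the real nieuws.html: the xmlns attribute puts every element in the XHTML namespace, and an unprefixed name in an XPath 1.0 expression only matches elements that are in no namespace. Property access such as $xml->body appears to match unprefixed child elements even when a default namespace is declared, which would explain why the first loop worked and the second did not. The helper name countBodyMatches is hypothetical.

```php
<?php
// Hypothetical minimal documents standing in for the real page.
$withNs    = '<html xmlns="http://www.w3.org/1999/xhtml"><body><h2>Hi</h2></body></html>';
$withoutNs = '<html><body><h2>Hi</h2></body></html>';

// Count how many nodes the unprefixed XPath expression 'body' selects.
function countBodyMatches(string $xmlString): int
{
    $xml = simplexml_load_string($xmlString);
    $result = $xml->xpath('body');
    // xpath() may return a non-array value on error, so guard before counting.
    return is_array($result) ? count($result) : 0;
}

echo countBodyMatches($withNs), "\n";    // prints 0: body is in the XHTML namespace
echo countBodyMatches($withoutNs), "\n"; // prints 1: body is in no namespace
```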

Re: I cannot get xpath working

Posted: Sat May 09, 2009 4:18 pm
by pierref
pierref wrote:As a workaround, I wanted to remove that attribute [xmlns] from the root tag,
but I didn't find any function for removing an attribute from the xml tree in the simplexml library, so I did it as follows:

Code:

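$items_url = "http://users.telenet.be/cr27933/nieuws.html";
$html = file_get_contents($items_url);
$html_without_entities = html_entity_decode($html, ENT_NOQUOTES, 'UTF-8');
// Strip the xmlns attribute from the root tag so unprefixed XPath names match.
// preg_replace() is used because ereg_replace() is deprecated since PHP 5.3 and removed in PHP 7.
$pattern = '/<html xmlns="[^"]*"/';
$replacement = '<html';
$html_root_stripped = preg_replace($pattern, $replacement, $html_without_entities, 1);
$xml = simplexml_load_string($html_root_stripped);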
This is a rather ugly solution. If you know something more elegant, please tell me. Thanks in advance.
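
One possibly cleaner option is to leave the xmlns attribute in place and bind a prefix to the XHTML namespace with SimpleXMLElement::registerXPathNamespace(), then use that prefix in the XPath expression. The sketch below uses a made-up inline document instead of the real page, and the prefix "x" is an arbitrary choice.

```php
<?php
// Hypothetical inline document standing in for the real page.
$xhtml = '<html xmlns="http://www.w3.org/1999/xhtml">'
       . '<body><div><div><h2>Maak van je studentenjaren de beste van je leven</h2></div></div></body>'
       . '</html>';

$xml = simplexml_load_string($xhtml);

// Bind the prefix "x" (arbitrary) to the XHTML namespace URI for XPath queries on $xml.
$xml->registerXPathNamespace('x', 'http://www.w3.org/1999/xhtml');

// Collect the h2 text from every body node selected by the prefixed expression.
$headings = [];
foreach ($xml->xpath('x:body') as $body) {
    $headings[] = (string) $body->div[0]->div[0]->h2;
}

print_r($headings);
```

As far as I know, the prefix is registered only on the object it was called on, so a node returned by xpath() would need the same registerXPathNamespace() call before running a further prefixed query on it.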