I cannot get xpath working
Posted: Fri May 08, 2009 3:37 pm
by pierref
Paradox: why does the first part of this snippet work, but not the second part, which uses xpath?
Code:
$xmlstr = html_entity_decode(file_get_contents($items_url), ENT_NOQUOTES, 'UTF-8');
$xml = simplexml_load_string($xmlstr);
if ($xml) {
    echo "\n<p>";
    foreach ($xml->body as $s) {
        echo "got body node: " . $s->div[0]->div[0]->h2;
    } /* WORKS */
    foreach ($xml->xpath('body') as $s) {
        echo "got body node with xpath: " . $s->div[0]->div[0]->h2;
    } /* DOESN'T WORK: returned $s is FALSE */
    echo "\n</p>";
}
Which guru can explain to me what's going on? The file being parsed is XHTML, and I suppose it is well formed because the first part works correctly. The input file is
http://users.telenet.be/cr27933/nieuws.html.
Re: I cannot get xpath working
Posted: Fri May 08, 2009 3:54 pm
by infolock
try doing a print_r on each level of the array (and each sub-key/sub-array) and the answer will be pretty evident.
Re: I cannot get xpath working
Posted: Fri May 08, 2009 3:56 pm
by jayshields
Unrelated to Xpath - you've got a stray double quote on line 13.
Re: I cannot get xpath working
Posted: Fri May 08, 2009 4:02 pm
by pierref
jayshields wrote:Unrelated to Xpath - you've got a stray double quote on line 13.
You are right, Jay, but that's only a typo I introduced when I copy/pasted. I corrected the code but the error is still there.
Re: I cannot get xpath working
Posted: Fri May 08, 2009 4:11 pm
by pierref
infolock wrote:try doing a print_r on each level of the array (and each sub-key/sub-array) and the answer will be pretty evident.
Thank you, infolock, but I did that and I see nothing strange. Don't forget that the first snippet of code works and the second doesn't, while the two should be equivalent. That's what is strange.
Re: I cannot get xpath working
Posted: Fri May 08, 2009 4:14 pm
by infolock
show us what the printout of the array is.
if it's huge, just give us the first 2 keys.
Re: I cannot get xpath working
Posted: Fri May 08, 2009 4:20 pm
by pierref
infolock wrote:show us what the printout of the array is.
if it's huge, just give us the first 2 keys.
This is the beginning of the array:
Code:
SimpleXMLElement Object
(
    [head] => SimpleXMLElement Object
        (
            [meta] => SimpleXMLElement Object
                (
                    [@attributes] => Array
                        (
                            [http-equiv] => Content-Type
                            [content] => text/html; charset=utf-8
                        )
                )
            [title] => Studentenhuis Arenberg > Nieuws
            [link] => SimpleXMLElement Object
                (
                    [@attributes] => Array
                        (
                            [href] => style.css
                            [rel] => stylesheet
                            [type] => text/css
                        )
                )
        )
    [body] => SimpleXMLElement Object
        (
            [div] => SimpleXMLElement Object
                (
                    [@attributes] => Array
                        (
                            [id] => container
                        )
                    [div] => Array
                        (
                            [0] => SimpleXMLElement Object
                                (
                                    [@attributes] => Array
                                        (
                                            [id] => header
                                        )
                                    [img] => SimpleXMLElement Object
                                        (
                                            [@attributes] => Array
                                                (
                                                    [src] => img/logo_mini.gif
                                                    [alt] => Logo Arenberg
                                                )
                                        )
                                    [h1] => Studentenhuis Arenberg > Nieuws
                                    [h2] => Maak van je studentenjaren de beste van je leven
[p] => Schapenstraat 29 — 3000 Leuven
— tel. 016-23 21 52
-------------------------------------------- I cut the rest -----------------------
Re: I cannot get xpath working
Posted: Sat May 09, 2009 10:35 am
by pierref
When I removed the xmlns attribute from the root element of my XHTML file, it started working as I expected. I discovered this by stripping my input file down little by little, removing anything that could cause trouble. When the file was nearly empty, I found it was the tag
Code:
<html xmlns="http://www.w3.org/1999/xhtml">
that had to be changed to <html>.
When I read about XHTML, I see that the xmlns attribute on the root html element is mandatory. If anyone knows why, please let me know.
As a workaround, I wanted to remove that attribute from the root tag, but I cannot access the root element itself, so any help is welcome!
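A likely explanation, offered as a sketch rather than a definitive answer: the xmlns attribute places every element of the page in the XHTML namespace. SimpleXML property access ($xml->body) still finds such elements, but in XPath 1.0 an unprefixed name such as 'body' matches only elements that are in no namespace at all. The two inline documents below are hypothetical stand-ins for the real page, and the variable names are illustrative.

```php
<?php
// Hypothetical miniature documents: identical except for the xmlns attribute.
$withNs    = '<html xmlns="http://www.w3.org/1999/xhtml"><body><h2>Hello</h2></body></html>';
$withoutNs = '<html><body><h2>Hello</h2></body></html>';

$a = simplexml_load_string($withNs);
$b = simplexml_load_string($withoutNs);

// Property access works even when a default namespace is declared.
$viaProperty = (string) $a->body->h2;

// The unprefixed XPath name 'body' finds nothing in the namespaced document...
$xpathWithNs = $a->xpath('body');

// ...but finds the element once the namespace declaration is gone.
$xpathWithoutNs = $b->xpath('body');

echo "property access: ", $viaProperty, "\n";
echo "xpath with xmlns: ", empty($xpathWithNs) ? "no match" : "match", "\n";
echo "xpath without xmlns: ", empty($xpathWithoutNs) ? "no match" : "match", "\n";
```

This would be consistent with the behaviour reported in the thread: the foreach over $xml->body runs, while the foreach over $xml->xpath('body') has nothing to iterate.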
Re: I cannot get xpath working
Posted: Sat May 09, 2009 4:18 pm
by pierref
pierref wrote:As a workaround, I wanted to remove that attribute [xmlns] from the root tag,
but I didn't find any function for removing an attribute from the xml tree in the simplexml library, so I did it as follows:
Code:
$items_url = "http://users.telenet.be/cr27933/nieuws.html";
$html = file_get_contents($items_url);
$html_without_entities = html_entity_decode($html, ENT_NOQUOTES, 'UTF-8');
// Strip the default namespace declaration from the root tag.
// preg_replace is used because ereg_replace was removed in PHP 7.
$pattern = '/<html xmlns="[^"]*"/';
$replacement = '<html';
$html_root_stripped = preg_replace($pattern, $replacement, $html_without_entities, 1);
$xml = simplexml_load_string($html_root_stripped);
This is a rather ugly solution. If you know something more elegant, please tell me. Thanks in advance.
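One possibly more elegant option, sketched under assumptions: leave the xmlns attribute in place and bind a prefix to the XHTML namespace with SimpleXMLElement::registerXPathNamespace(), then use that prefix in the XPath expression. The prefix 'x' is an arbitrary choice, and the inline document is a hypothetical stand-in that only mimics the div/div/h2 nesting shown in the print_r output above.

```php
<?php
// Hypothetical stand-in for the real page, with the namespace left intact.
$xhtml = '<html xmlns="http://www.w3.org/1999/xhtml">'
       . '<body><div id="container"><div id="header"><h2>Hello</h2></div></div></body>'
       . '</html>';

$xml = simplexml_load_string($xhtml);

// Bind the (arbitrary) prefix "x" to the XHTML namespace URI for XPath queries.
$xml->registerXPathNamespace('x', 'http://www.w3.org/1999/xhtml');

$titles = array();
foreach ($xml->xpath('x:body') as $s) {
    // Property access on the matched node needs no prefix.
    $titles[] = (string) $s->div[0]->div[0]->h2;
}

echo "got body node with xpath: ", implode(', ', $titles), "\n";
```

With this approach no regular expression is run over the markup, so the document is parsed exactly as it was served.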