XPath question: load, loadXML, loadHTML
Posted: Thu Oct 25, 2012 10:39 am
I have what I think is a properly formatted XML file (I've removed the bulk of the text for this example):
[text]<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title> </title>
<meta content="http://www.w3.org/1999/xhtml; charset=utf-8" http-equiv="Content-Type"/>
<link href="stylesheet.css" type="text/css" rel="stylesheet"/>
<style type="text/css">
@page { margin-bottom: 5.000000pt; margin-top: 5.000000pt; }</style></head>
<body class="calibre">
<h1 class="title"><span id="anchor6" class="S-T4">ABOUT THIS BOOK</span></h1>
<p class="P-Standard">This book is intended to provide the reader with </p>
<h2 class="title"><span id="anchor7" class="S-T11">GPS Waypoints and Depth</span></h2>
<p class="P-Standard">GPS Waypoints are given in World Geodetic System 1984 (WGS84) i.</p>
<h2 class="title"><span id="anchor8" class="S-T11">Internet</span></h2>
<p class="P-Standard">If you have Internet access, you c</p>
<h2 class="title"><span id="anchor9" class="S-T11">Reporting Information</span></h2>
<p class="P-Standard">If you come across new i:</p>
<p class="P-P8"><span class="S-T11">1. </span>Your Name and Date of Observation</p>
<p class="P-P8"><span class="S-T11">2. </span>Detailed Description</p>
</body></html>[/text]
If I load it into a DOMDocument with either load($filename) or loadXML($string_contents), I have trouble querying it with XPath. For example, query("//p") returns no nodes. If I load it with loadHTML or loadHTMLFile instead, the same query("//p") works fine.
Do XPath queries work differently on XML documents, or is something else going on with the DOM structure?
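One likely cause, offered as a sketch rather than a confirmed diagnosis: the root element declares a default namespace (xmlns="http://www.w3.org/1999/xhtml"). When the file is parsed as XML, every element belongs to that namespace, and an unprefixed name such as p in XPath 1.0 only matches elements in no namespace. The HTML loaders do not put elements in a namespace, which would explain why the same query works there. The snippet below uses a shortened stand-in for the file; the prefix "x" and the variable names are arbitrary choices for illustration.

```php
<?php
// Shortened stand-in for the file above; it keeps the same default XHTML namespace.
$xml = <<<XML
<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title> </title></head>
<body class="calibre">
<p class="P-Standard">This book is intended to provide the reader with </p>
<p class="P-Standard">If you have Internet access, you c</p>
</body></html>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);

// Unprefixed names only match elements that are in no namespace,
// so this finds nothing in a namespaced XHTML document.
$plain = $xpath->query('//p');

// Bind a prefix to the XHTML namespace URI and use it in the query.
// "x" is an arbitrary prefix chosen for this sketch.
$xpath->registerNamespace('x', 'http://www.w3.org/1999/xhtml');
$prefixed = $xpath->query('//x:p');

// Alternative that ignores namespaces entirely by matching on the local name.
$byLocalName = $xpath->query('//*[local-name()="p"]');

printf("%d %d %d\n", $plain->length, $prefixed->length, $byLocalName->length);
// prints "0 2 2"
```

Registering a prefix keeps the queries readable if there are many of them; the local-name() form is handy for a one-off query but will also match p elements from any other namespace.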