PHP reading/parsing HTML
Posted: Tue Aug 25, 2009 10:44 am
I've learned enough XPath to read and edit XML configuration files, and am now tasked with doing the same for an HTML file. I'm trying to do this with XPath as well, but have noticed something peculiar: instead of returning the nodes underneath each matched element, my query seems to return only the text content, with the HTML tags removed. The structure of the HTML (first code block below) is a series of repeating test questions and answer choices.
I need to extract the questions and answers and assign them to a multidimensional array so that I can create an app that lets the user edit them. Unfortunately, I'm limited to this structure since I'm working with existing files. What I have so far, in part-real, part-pseudo code, is in the second code block below.
I'm sure there's a way to reference the children of my $query->item($i), but I'm not sure of the syntax. PHP offers so many different ways to deal with XML that I don't know which one to use here.
Code:
<div class="iDevice_inner">
<div class="question">
<div id="taquestion0b1" class="block" style="display:block">True or False. Cribbing blocks can be used under outriggers to level a bucket truck.</div><br />
<table><tr>
<td><input type="radio" name="key0b1" value="0" /></td>
<td><div id="taoptionAnswer0q0b1" class="block" style="display:block">True</div></td>
</tr><tr>
<td><input type="radio" name="key0b1" value="1" /></td>
<td><div id="taoptionAnswer1q0b1" class="block" style="display:block">False</div></td></tr>
</table>
</div><br />
...
...
</div>
Code:
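// Parse the course file as XML (this assumes the file is well-formed XHTML).
$course_dom = new DOMDocument;
$course_dom->load($course_file);
$xpath = new DOMXPath($course_dom);
// Bind the XHTML namespace to the prefix "m" so the path below can match.
$xpath->registerNamespace("m", "http://www.w3.org/1999/xhtml");
$query = $xpath->query("/m:html/m:body/m:div/m:div[@id='main']/m:div/m:form/m:div/m:div/m:div");
// nodeValue is the concatenated text of a node and all its descendants, so no markup is printed.
for ($i = 0; $i < $query->length; $i++) {
    echo "<br />VALUE: " . $query->item($i)->nodeValue;
}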
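
For reference, one way the children of each matched node could be reached is to pass that node as the context node to a second, relative XPath query. The following is only a minimal sketch under several assumptions that are not in the original code: the sample markup is embedded as a string instead of being read from $course_file, it is parsed with loadHTML() so that no namespace prefix is needed, and the names $questions, $questionNode and $answerNode are made up for illustration.

```php
<?php
// Sample markup copied from the post; a real script would read the course file instead.
$html = <<<'HTML'
<div class="iDevice_inner">
<div class="question">
<div id="taquestion0b1" class="block" style="display:block">True or False. Cribbing blocks can be used under outriggers to level a bucket truck.</div><br />
<table><tr>
<td><input type="radio" name="key0b1" value="0" /></td>
<td><div id="taoptionAnswer0q0b1" class="block" style="display:block">True</div></td>
</tr><tr>
<td><input type="radio" name="key0b1" value="1" /></td>
<td><div id="taoptionAnswer1q0b1" class="block" style="display:block">False</div></td></tr>
</table>
</div><br />
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html); // assumption: HTML parser, so elements are not namespaced
$xpath = new DOMXPath($dom);

$questions = array();
foreach ($xpath->query("//div[@class='question']") as $questionNode) {
    // A leading "." makes the expression relative to the context node (2nd argument).
    $textNode = $xpath->query("./div[starts-with(@id, 'taquestion')]", $questionNode)->item(0);

    $answers = array();
    foreach ($xpath->query(".//div[starts-with(@id, 'taoptionAnswer')]", $questionNode) as $answerNode) {
        $answers[] = trim($answerNode->textContent);
    }

    $questions[] = array(
        'question' => $textNode ? trim($textNode->textContent) : '',
        'answers'  => $answers,
    );
}

print_r($questions);
```

If the markup inside a node is needed instead of its text, $dom->saveXML($node) serializes that node together with its tags. With the original load() and the "m" prefix, the relative expressions would presumably need the prefix too (for example "./m:div[...]").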