PHP reading/parsing HTML

kepardue
Forum Newbie
Posts: 8
Joined: Sun Aug 07, 2005 8:59 pm

PHP reading/parsing HTML

Post by kepardue »

I've learned enough XPath to read and edit XML configuration files, but I've now been tasked with doing the same with an HTML file. I'm trying to do this with XPath as well, but have noticed something peculiar: instead of returning the nodes underneath the matched element, the query seems to return its text content with the HTML tags removed. The HTML is structured as a series of repeating test questions and answer choices:

Code:

 
<div class="iDevice_inner">
<div class="question">
<div id="taquestion0b1" class="block" style="display:block">True or False. Cribbing blocks can be used under outriggers to level a bucket truck.</div><br />
<table><tr>
<td><input type="radio" name="key0b1" value="0" /></td>
<td><div id="taoptionAnswer0q0b1" class="block" style="display:block">True</div></td>
</tr><tr>
<td><input type="radio" name="key0b1" value="1" /></td>
<td><div id="taoptionAnswer1q0b1" class="block" style="display:block">False</div></td></tr>
</table>
</div><br />
...
...
</div>
 
I need to extract the questions and answers and assign them to a multidimensional array so that I can create an app that will allow the user to edit them. Unfortunately, I'm limited to this structure since I'm working with existing files. In part real code, part pseudocode, this is what I have:

Code:

 
// $course_file holds the path to the existing XHTML course file.
$course_dom = new DOMDocument;
$course_dom->load($course_file);
$xpath = new DOMXPath($course_dom);
$xpath->registerNamespace("m", "http://www.w3.org/1999/xhtml");
$query = $xpath->query("/m:html/m:body/m:div/m:div[@id='main']/m:div/m:form/m:div/m:div/m:div");

// nodeValue gives only each node's text content, with the markup stripped.
foreach ($query as $node) {
    echo "<br />VALUE: " . $node->nodeValue;
}
 
I'm sure there's got to be a way to reference the children of my $query->item($i), but I'm not sure of the syntax. Since there are so many different ways for PHP to deal with XML, I'm not sure which one to use.
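
For reference, here is a minimal sketch of one way to reach the children of a matched node. It is not necessarily the approach suggested in the thread. It assumes the markup is well-formed and, unlike the real course file, has no XHTML namespace, so the "m:" prefix is left out. The $questions array layout is made up for illustration. Each node also exposes a childNodes list, but passing the node as the second argument to query() keeps everything in XPath:

```php
<?php
// Sample markup copied from the structure above; a real script would load the course file.
$html = <<<'HTML'
<div class="iDevice_inner">
<div class="question">
<div id="taquestion0b1" class="block" style="display:block">True or False. Cribbing blocks can be used under outriggers to level a bucket truck.</div><br />
<table><tr>
<td><input type="radio" name="key0b1" value="0" /></td>
<td><div id="taoptionAnswer0q0b1" class="block" style="display:block">True</div></td>
</tr><tr>
<td><input type="radio" name="key0b1" value="1" /></td>
<td><div id="taoptionAnswer1q0b1" class="block" style="display:block">False</div></td></tr>
</table>
</div><br />
</div>
HTML;

$dom = new DOMDocument;
$dom->loadXML($html);   // the sample is well-formed, so the XML parser accepts it
$xpath = new DOMXPath($dom);

$questions = [];
// Each div with class "question" holds one question and its answer choices.
foreach ($xpath->query("//div[@class='question']") as $block) {
    // Passing $block as the second argument makes the query relative to that node.
    $text = $xpath->query("div[starts-with(@id, 'taquestion')]", $block)->item(0);

    $answers = [];
    foreach ($xpath->query(".//div[starts-with(@id, 'taoptionAnswer')]", $block) as $answer) {
        $answers[] = $answer->nodeValue;
    }

    $questions[] = [
        'id'       => $text->getAttribute('id'),
        'question' => $text->nodeValue,
        'answers'  => $answers,
        'markup'   => $dom->saveXML($text),   // the node serialized with its tags intact
    ];
}

print_r($questions);
```

If the tags themselves are needed rather than the text, saveXML() with a node argument returns that node's markup as a string.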

Re: PHP reading/parsing HTML

Post by kepardue »

I knew there had to be a simple solution to the problem. Thanks very much for that pointer; I've got it parsing what I need smoothly now, with one exception. In the HTML file, there's a <script> tag that contains several JavaScript variables wrapped in <!-- //<![CDATA[ //]]> -->. I can't seem to get this to return as a string so that I can use PHP to parse out the variables I need from it. Any advice on that?

Thanks!
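
To illustrate the situation, here is a rough sketch under the assumption that the page is parsed as XML. The page content below is a hypothetical stand-in, not the real course file. Because the script body sits inside <!-- ... -->, the parser stores it as a comment node, and comment nodes are not part of an element's text content. The XPath comment() node test selects it directly:

```php
<?php
// Hypothetical stand-in for the course page; the variable values are made up.
$page = <<<'XHTML'
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<script type="text/javascript">
<!-- //<![CDATA[
var key0 = 1;
var key1 = 0;
//]]> -->
</script>
</head>
<body></body>
</html>
XHTML;

$dom = new DOMDocument;
$dom->loadXML($page);
$xpath = new DOMXPath($dom);
$xpath->registerNamespace("m", "http://www.w3.org/1999/xhtml");

// The script element's own text is only whitespace: the comment is skipped.
$script = $xpath->query("//m:script")->item(0);
$elementText = trim($script->nodeValue);

// comment() selects the comment node; its nodeValue is the raw script source.
$comment = $xpath->query("//m:script/comment()")->item(0);
$source = $comment->nodeValue;

echo $source;
```

One caveat with this layout: an XML comment may not contain "--", so a script using the decrement operator inside such a comment would not parse as XML at all.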

Re: PHP reading/parsing HTML

Post by kepardue »

Good sir, I owe you a drink. I've been struggling with this for a week... who knew that the solution could be so simple. That works perfectly, and getting the data as a string is exactly what I need to work with.

Thanks so much!

Re: PHP reading/parsing HTML

Post by kepardue »

For some reason, I can't seem to find documentation on XPath's comment() node test. The script in the CDATA is rather lengthy, and the query is only returning what appears to be the last 20,000 characters of it. Unfortunately, what I need is at the beginning of the script. Is there some sort of substring function to pull the first 5,000 characters? I apologize for asking what must seem to be dumb questions. There just doesn't seem to be a lot of documentation on this floating around out there.
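
On the substring question, a small sketch of two options, using a made-up script body as a stand-in for the real one. PHP's substr() works on the string once it has been fetched. XPath 1.0 also has a substring() function, which is one-based and has to be run through DOMXPath::evaluate() because it returns a string rather than a node list:

```php
<?php
// Hypothetical script text standing in for the lengthy comment node value.
$source = "var key0 = 1;\nvar key1 = 0;\n" . str_repeat("// filler\n", 2000);

// PHP side: substr() takes a zero-based offset and a length.
$head = substr($source, 0, 5000);

// XPath side: wrap the same text in a comment node, then take the first 5,000 characters.
$dom = new DOMDocument;
$dom->loadXML('<script><!--' . $source . '--></script>');
$xpath = new DOMXPath($dom);
$xhead = $xpath->evaluate("substring(string(/script/comment()), 1, 5000)");

// Both approaches should give the same 5,000-character prefix for this ASCII input.
var_dump(strlen($head), $head === $xhead);
```

Note that substr() counts bytes, so the two can differ on multibyte text.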

Re: PHP reading/parsing HTML

Post by kepardue »

Nope, it's all in the same block. Here's the part of the script where the data I need begins. Specifically, the var key* = * assignments in the code below are what I need to get.

Code:

 
... 
                var key18 = 1;
                var key19 = 1;
                function getAnswer()
                {
                doLMSSetValue("cmi.interactions.0.id","key0b1");
                doLMSSetValue("cmi.interactions.0.type","choice");
                doLMSSetValue("cmi.interactions.0.correct_responses.0.pattern",
                          "0");
...
 
The result returned begins with:

Code:

" //< 2; i++) { if (document.getElementById("quizForm1").key0b1[i].checked) { question0...."
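
Once the script is available as a string, pulling out the key variables shown in the script excerpt above could look something like the sketch below. It assumes every assignment has the form "var keyN = M;" with integer values. The $keys array is a made-up name for illustration:

```php
<?php
// Fragment shaped like the script excerpt; a real run would use the comment node's value.
$source = <<<'JS'
                var key18 = 1;
                var key19 = 1;
                function getAnswer()
                {
                doLMSSetValue("cmi.interactions.0.id","key0b1");
JS;

// Capture the numeric suffix and the assigned value of every "var keyN = M;" statement.
preg_match_all('/var\s+key(\d+)\s*=\s*(\d+)\s*;/', $source, $matches, PREG_SET_ORDER);

$keys = [];
foreach ($matches as $m) {
    $keys[(int) $m[1]] = (int) $m[2];   // question number => correct answer index
}

print_r($keys);
```

The pattern ignores names such as "key0b1" inside the doLMSSetValue() calls, since those are not preceded by "var".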

Re: PHP reading/parsing HTML

Post by kepardue »

It seems to have been an issue with the JavaScript: something about the "<" symbols caused all of the preceding code to be replaced with "//". Odd that it didn't even show the proper text in the source.

Wrapping the variable in htmlentities() worked just fine. Now I think I'm back in familiar territory. Thanks so much for the help and advice!
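
For anyone who hits the same thing, a short sketch of the htmlentities() fix. The script fragment is hypothetical. When the string is echoed raw, a browser treats "<" as the start of a tag, so part of the output can disappear from the rendered page. Escaping it first makes the text display literally:

```php
<?php
// Hypothetical fragment containing a "<" that a browser would read as the start of a tag.
$source = 'for (var i = 0; i < 2; i++) { check(i); }';

// Convert markup characters to entities so the text is shown rather than interpreted.
$safe = htmlentities($source, ENT_QUOTES, 'UTF-8');

echo "<pre>" . $safe . "</pre>";
```

htmlspecialchars() with the same arguments is a lighter alternative that only touches the characters with special meaning in HTML.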