
DOMDocument parsing: searching for a very easy-to-find element

Posted: Sat Dec 25, 2010 7:17 am
by lin
Hello dear community,

Once again, first of all: I want to wish you all a Merry Christmas!

I'm trying to debug a small piece of DOMDocument code in PHP. Ideally, I'd like DOMDocument to give me the data in an array-like format so that I can store it in a database.
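A minimal sketch of that array-like output, assuming (hypothetically, since the relevant markup isn't shown in this thread) that each piece of data sits in a two-column table row of the form `<tr><td>label</td><td>value</td></tr>`. An inline HTML string stands in for the page so the example runs offline; the second row is placeholder data.

```php
<?php
// Inline stand-in for the page. The two-column layout is an assumption,
// and the second row is placeholder data.
$html = '<table>'
    . '<tr><td>Schule:</td><td>Siegfried-Pickert Schule</td></tr>'
    . '<tr><td>Hypothetical label:</td><td>hypothetical value</td></tr>'
    . '</table>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// Build a label => value array from every row that has exactly two cells.
$data = [];
foreach ($dom->getElementsByTagName('tr') as $row) {
    $cells = $row->getElementsByTagName('td');
    if ($cells->length !== 2) {
        continue; // skip rows that don't fit the label/value pattern
    }
    // Strip surrounding whitespace and a trailing colon from the label.
    $label = rtrim(trim($cells->item(0)->textContent), ':');
    $data[$label] = trim($cells->item(1)->textContent);
}

print_r($data);
```

A `label => value` array like this maps fairly directly onto column/value pairs, so it could be passed on to a prepared INSERT statement (for example with PDO) once the labels are mapped to real column names.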

My example is the following page: Bildungsserver Hessen - Siegfried-Pickert Schule

I looked at the page source. Note: I want to extract the data inside the following element: <div class="floatbox">
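Because `floatbox` is a class rather than an id, one way to select that div is an XPath query through `DOMXPath`. The sketch below uses an inline HTML string instead of the real page, and the extra `clearfix` class is a hypothetical addition to show why the query matches whole class tokens instead of comparing `@class` for equality.

```php
<?php
// Inline stand-in for the page; the real markup may differ.
// "clearfix" is a hypothetical second class.
$html = '<div class="navbar">menu</div>'
    . '<h3>Siegfried-Pickert Schule</h3>'
    . '<div class="floatbox clearfix"><p>school data</p></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Match "floatbox" as a whole class token: class="floatbox clearfix"
// is found, while a class such as "floatbox-wide" would not be.
$nodes = $xpath->query(
    "//div[contains(concat(' ', normalize-space(@class), ' '), ' floatbox ')]"
);

foreach ($nodes as $div) {
    echo trim($div->textContent), "\n";
}
```

If the class attribute is known to be exactly `floatbox`, the simpler `//div[@class='floatbox']` also works, but it stops matching as soon as a second class is added.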

See the source code:

Code:

<span class="grey"> <span style="font-size:x-small;">></span></span>
<a class="navLink" href="http://dms-schule.bildung.hessen.de/suchen/index.html" title="Suchformulare zum hessischen schulischen Bildungssystem">suche</a>
              </div>
            </div>
          <!-- begin of text -->
           <h3>Siegfried-Pickert Schule</h3>
 <div class="floatbox">

Here is my approach, which is meant to return the labels and values in a formatted array ready for insertion into MySQL:

[syntax]
<?php

$dom = new DOMDocument();
// Load the remote page; @ suppresses libxml warnings about malformed HTML
@$dom->loadHTMLFile('http://dms-schule.bildung.hessen.de/suc ... chool=8880');
// Look up the container div by its id attribute
$divElement = $dom->getElementById('floatbox');

$innerHTML = '';
$children = $divElement->childNodes;
foreach ($children as $child) {
    // Serialize the current child node (this overwrites $innerHTML on every pass)
    $innerHTML = $child->ownerDocument->saveXML($child);

    // Parse the fragment into a second document
    $doc = new DOMDocument();
    $doc->loadHTML($innerHTML);
    //$divElementNew = $dom->getElementsByTagName('td');
    // Note: this queries $dom (the whole page), not the fragment in $doc
    $divElementNew = $dom->getElementsByTagname('td');

    /*** the array to return ***/
    $out = array();
    foreach ($divElementNew as $item) {
        /*** add node value to the out array ***/
        $out[] = $item->nodeValue;
    }

    echo '<pre>';
    print_r($out);
    echo '</pre>';
}
[/syntax]
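In the loop above, each pass assigns `$innerHTML = ...`, so the variable only ever holds the current child rather than the whole content of the div. A minimal sketch of collecting a node's inner HTML by appending each child's serialization with `.=`; the function name `innerHtml` and the sample markup are hypothetical, and `saveHTML()` is used here in place of `saveXML()` since the input is HTML.

```php
<?php
// Hypothetical helper: returns the markup inside $element (its "inner HTML"),
// built by appending (.=) each child's serialization instead of overwriting it.
function innerHtml(DOMElement $element): string
{
    $html = '';
    foreach ($element->childNodes as $child) {
        $html .= $element->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Inline sample markup in place of the real page.
$dom = new DOMDocument();
$dom->loadHTML('<div id="box"><b>Schule:</b> Siegfried-Pickert Schule</div>');

$inner = innerHtml($dom->getElementById('box'));
echo $inner, "\n";
```

Since the resulting string is already part of `$dom`, there is usually no need to re-parse it into a second DOMDocument; querying the original element directly avoids the mix-up between `$dom` and `$doc`.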

Well, duh: this outputs a lot of garbage, and the code spits out a lot of HTML anyway.
What can I do to get cleaner output?
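Much of the clutter in output like this is likely the whitespace and empty cells carried over from the page's source formatting (an assumption, since the actual output isn't shown here). A minimal sketch that normalizes each cell's text and drops empty cells, run on a hypothetical sample table; `cleanText` is an illustrative helper name.

```php
<?php
// Hypothetical helper: collapse runs of whitespace (newlines, tabs,
// source indentation) into single spaces and trim both ends.
function cleanText(string $text): string
{
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Hypothetical sample with messy whitespace and an empty layout cell.
$dom = new DOMDocument();
$dom->loadHTML("<table><tr><td>\n   Schule:\n</td><td>  Siegfried-Pickert\n   Schule </td><td> </td></tr></table>");

$out = [];
foreach ($dom->getElementsByTagName('td') as $td) {
    $text = cleanText($td->textContent);
    if ($text !== '') { // skip cells that contain only whitespace
        $out[] = $text;
    }
}

print_r($out);
```

Using `textContent` (or `nodeValue`) keeps only the text, so no markup ends up in the array; the cleanup step then removes the layout whitespace.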

Question: What is wrong with the idea of using this attribute:

[syntax] $dom->getElementById('floatbox'); [/syntax]


Any ideas? Could somebody have a look at the code and review it? I need to debug it a bit.

Any and all help will be greatly appreciated.

Season's greetings,
db1 ;)

Re: DOMDocument parsing: search for an very easy to find ele

Posted: Sat Dec 25, 2010 11:47 am
by Weirdan
Question: What is wrong with the idea of using this attribute:

Code:

 $dom->getElementById('floatbox');  
In the original HTML it's not an id, it's a class.
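A small demonstration of this point with two inline sample documents: `getElementById()` only matches `id` attributes, so it returns `null` for a div that merely has `class="floatbox"`.

```php
<?php
// Sample document where "floatbox" is only a class.
$byClass = new DOMDocument();
$byClass->loadHTML('<div class="floatbox">data</div>');

// Sample document where "floatbox" really is an id.
$byIdDoc = new DOMDocument();
$byIdDoc->loadHTML('<div id="floatbox">data</div>');

// getElementById() ignores class attributes, so nothing is found here.
$missing = $byClass->getElementById('floatbox');
var_dump($missing); // NULL

// The same lookup succeeds when the attribute is an id.
$found = $byIdDoc->getElementById('floatbox');
echo $found->textContent, "\n";
```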

Re: DOMDocument parsing: search for an very easy to find ele

Posted: Sat Dec 25, 2010 12:15 pm
by lin
Good evening, dear Weirdan,

many thanks for the reply!

Weirdan wrote:
Question: What is wrong with the idea of using this attribute:

Code:

 $dom->getElementById('floatbox');  
In the original HTML it's not an id, it's a class.
So I have to rewrite it like this:

$divElement = $dom->getElementByClass('floatbox');

which gives this code:

Code:

<?php

$dom = new DOMDocument();
@$dom->loadHTMLFile('http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html?show_school=8880');
$divElement = $dom->getElementByClass('floatbox');

$innerHTML = '';
$children = $divElement->childNodes;
foreach ($children as $child) {
    $innerHTML = $child->ownerDocument->saveXML($child);

    $doc = new DOMDocument();
    $doc->loadHTML($innerHTML);
    //$divElementNew = $dom->getElementsByTagName('td');
    $divElementNew = $dom->getElementsByTagname('td');

    /*** the array to return ***/
    $out = array();
    foreach ($divElementNew as $item) {
        /*** add node value to the out array ***/
        $out[] = $item->nodeValue;
    }

    echo '<pre>';
    print_r($out);
    echo '</pre>';
}
I will try this out and come back to report my findings.

Many thanks for the hints.

Greetings,
db1

Re: DOMDocument parsing: search for an very easy to find ele

Posted: Sun Dec 26, 2010 3:11 am
by cpetercarter
The DOMDocument object does not have a method getElementByClass(). If you only have one 'floatbox' div on your page, why not change it to:

Code:

<div id='floatbox'>
Then your original code should work.
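Changing the markup only works if you control the page. If it is someone else's site, as the Bildungsserver page appears to be, one alternative is to emulate the missing method by checking each element's `class` attribute. `elementsByClass` below is a hypothetical helper, not part of PHP's DOM extension, and the markup is an inline sample.

```php
<?php
// Hypothetical helper: DOMDocument has no getElementByClass(), so walk all
// elements and keep those whose class attribute contains the given token.
function elementsByClass(DOMDocument $dom, string $class): array
{
    $matches = [];
    foreach ($dom->getElementsByTagName('*') as $element) {
        // Split class="a b c" into tokens; a missing attribute yields [''].
        $tokens = preg_split('/\s+/', trim($element->getAttribute('class')));
        if (in_array($class, $tokens, true)) {
            $matches[] = $element;
        }
    }
    return $matches;
}

// Inline sample markup in place of the real page.
$dom = new DOMDocument();
$dom->loadHTML('<h3>Siegfried-Pickert Schule</h3><div class="floatbox">school data</div>');

$boxes = elementsByClass($dom, 'floatbox');
echo count($boxes), "\n";
echo $boxes[0]->textContent, "\n";
```

Walking every element is fine for a single page; for larger documents an XPath query on the class attribute does the same filtering inside libxml.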

Re: DOMDocument parsing: search for an very easy to find ele

Posted: Sun Dec 26, 2010 6:05 am
by lin
Hello dear cpetercarter, many thanks for the reply!

cpetercarter wrote:
The DOMDocument object does not have a method getElementByClass(). If you only have one 'floatbox' div on your page, why not change it to:

Code:

<div id='floatbox'>
Then your original code should work.
Okay, I made the following change:

In line five I changed the lookup to this new line:

Code:

  $divElement = $dom->getElementById('floatbox');

which gives the following:

Code:

<?php

$dom = new DOMDocument();
@$dom->loadHTMLFile('http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html?show_school=8880');
$divElement = $dom->getElementById('floatbox');

$innerHTML = '';
$children = $divElement->childNodes;
foreach ($children as $child) {
    $innerHTML = $child->ownerDocument->saveXML($child);

    $doc = new DOMDocument();
    $doc->loadHTML($innerHTML);
    //$divElementNew = $dom->getElementsByTagName('td');
    $divElementNew = $dom->getElementsByTagname('td');

    /*** the array to return ***/
    $out = array();
    foreach ($divElementNew as $item) {
        /*** add node value to the out array ***/
        $out[] = $item->nodeValue;
    }

    echo '<pre>';
    print_r($out);
    echo '</pre>';
}

See: http://dms-schule.bildung.hessen.de/suc ... chool=8880

Since the target page has only one floatbox, I can do this.
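Whether this works depends on the fetched page actually containing `id="floatbox"`. If the live page still only uses `class="floatbox"`, `getElementById()` returns `null`, and reading `->childNodes` on it produces a PHP warning or notice (depending on the PHP version) instead of the child nodes. A minimal sketch of guarding against that case, using inline sample markup in place of the real page:

```php
<?php
// Inline sample where "floatbox" is only a class, as in the original page.
$dom = new DOMDocument();
$dom->loadHTML('<div class="floatbox">school data</div>');

$divElement = $dom->getElementById('floatbox');

// getElementById() returns null when no element has that id, so check
// before touching ->childNodes.
if ($divElement === null) {
    $message = 'No element with id "floatbox" found';
} else {
    $message = 'Found ' . $divElement->childNodes->length . ' child node(s)';
}

echo $message, "\n";
```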

I'll try out the code later today and report all my findings here.

Greetings,
dilbertone