how to get a subset of an html string/file

Posted: Thu Mar 29, 2012 9:03 pm
by kc11
Hi,

I have an HTML string/page stored in the variable $html. I would like to extract part of this page's HTML: the page has a body with class=foo containing multiple divs, including div[id=bar]. So I am trying to get:

Code: Select all

$subset=$html->body[class=foo]->div[id=bar];
How can I do this?

Thank you in advance,

KC

Re: how to get a subset of an html string/file

Posted: Fri Mar 30, 2012 6:54 am
by maneetpuri
Hi,

You can do this using regular expressions, but what is the purpose?

Cheers,

~M

Re: how to get a subset of an html string/file

Posted: Fri Mar 30, 2012 6:56 am
by social_experiment
maneetpuri wrote: but what is the purpose.
To use the information would be a good guess.

Re: how to get a subset of an html string/file

Posted: Fri Mar 30, 2012 8:14 am
by Tiancris
Maybe DOM could help?

http://ar.php.net/manual/en/class.domdocument.php

Code: Select all

$html = "<html><body>Test<br></body></html>";
$doc = new DOMDocument();
$doc->loadHTML($html);
And you have functions like:

Code: Select all

$doc->getElementById()
$doc->getElementsByTagName()
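
A minimal sketch of how those two calls differ in what they return. The sample markup, the ids and the class name below are hypothetical, chosen only to mirror the question; libxml_use_internal_errors() is used here to keep parser warnings quiet.

```php
<?php
// Hypothetical sample markup; the ids and class are illustrative only.
$html = '<html><body class="foo"><div id="other">Skip</div>'
      . '<div id="bar"><p>Keep me</p></div></body></html>';

$doc = new DOMDocument();

// Collect parser warnings internally instead of printing them.
$previous = libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// getElementById() returns a single DOMElement, or null when nothing matches.
$bar = $doc->getElementById('bar');

// getElementsByTagName() returns a DOMNodeList, even for a single match.
$divs = $doc->getElementsByTagName('div');

echo $divs->length, "\n";
if ($bar !== null) {
    echo $bar->textContent, "\n";
}
```

So a lookup by id gives you one node to work with directly, while a lookup by tag name gives you a list that you index with item() or loop over.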

Re: how to get a subset of an html string/file

Posted: Fri Mar 30, 2012 9:07 am
by kc11
Thanks for looking at this, guys.

So far I have:

Code: Select all


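$dom = new DOMDocument();
@$dom->loadHTML($html); // @ suppresses warnings from malformed HTML

// Create the XPath object
$xpath = new DOMXPath($dom);

// Start with an empty string so $body is defined even when nothing matches
$body = '';

// Append the text of every text node found below div#body
foreach ($xpath->evaluate("//div[@id='body']//*/text()") as $elt) {
    var_dump($elt);
    $body .= trim($elt->wholeText);
}

echo 'body ' . $body;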

The problem with this (and also with getElementsByTagName) is that it returns a DOMNodeList.

I just want to get the //div[@id='body'] node and its subtree, as HTML.

KC

Re: how to get a subset of an html string/file

Posted: Fri Mar 30, 2012 11:01 am
by kc11
I think I have it:

I first get the node I want with XPath (you could probably also use getElementById or getElementsByTagName) and then import it into a new DOMDocument.

Code: Select all


$dom = new DOMDocument();
@$dom->loadHTML($html); // @ suppresses warnings from malformed HTML

// Create the XPath object
$xpath = new DOMXPath($dom);

// item(0) gives a single DOMNode rather than a DOMNodeList;
// importNode() needs a DOMNode (item(0) is null if nothing matched)
$body = $xpath->evaluate("//div[@id='body']")->item(0);

$subtree = new DOMDocument();
$node2 = $subtree->importNode($body, true); // true imports the node's whole subtree, not just the node itself

$subtree->appendChild($node2); // append the imported node to the new, empty document

print($subtree->saveHTML());
    

I'd be interested if there is a better way.

Thanks,

KC
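
On the "better way" question, one possible shortcut, assuming PHP 5.3.6 or later: saveHTML() accepts an optional node argument and then serializes only that node and its subtree, so the second document and the importNode() step may not be needed. The sketch below uses hypothetical sample markup and the body[class=foo] / div[id=bar] selectors from the first post. Note that @class='foo' only matches when the class attribute is exactly "foo".

```php
<?php
// Hypothetical sample markup mirroring the body.foo / div#bar question.
$html = '<html><body class="foo"><div id="x">Skip</div>'
      . '<div id="bar"><p>Keep me</p></div></body></html>';

$dom = new DOMDocument();

// Collect parser warnings internally instead of suppressing them with @.
$previous = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);

// query() returns a DOMNodeList; item(0) is null when nothing matches.
$node = $xpath->query("//body[@class='foo']//div[@id='bar']")->item(0);

// Passing a node to saveHTML() serializes only that node and its subtree.
$subset = ($node !== null) ? $dom->saveHTML($node) : '';

echo $subset, "\n";
```

The null check matters because item(0) returns null for an empty result, and the import-based version would fail at importNode() in that case.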