How to get a subset of an HTML string/file

Post by kc11 »

Hi,

I have an HTML page stored as a string in the variable $html. I only need part of it: the page has a body with class="foo" containing several divs, and I want the div with id="bar". In pseudocode, I am trying to get:

```php
// pseudocode (not valid PHP): the part of the page I want
$subset = $html->body[class=foo]->div[id=bar];
```
How can I do this?
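For reference, a minimal sketch (not from the thread) of how that pseudocode path maps onto an XPath query run with PHP's DOMXPath. The sample markup is invented to stand in for $html. The class test uses the common contains(concat(...)) idiom so that "foo" is matched as one whole token of a space-separated class attribute.

```php
<?php
// Hypothetical sample input standing in for the poster's $html.
$html = '<html><body class="foo main"><div id="intro">Intro</div>'
      . '<div id="bar"><p>Wanted part</p></div></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// body[class=foo]->div[id=bar] written as XPath. The concat/normalize-space
// idiom matches "foo" as a whole token of the class list, so
// class="foo main" matches but class="foobar" would not.
$query = "//body[contains(concat(' ', normalize-space(@class), ' '), ' foo ')]"
       . "//div[@id='bar']";

$matches = $xpath->query($query); // a DOMNodeList
$bar = $matches->item(0);         // first match as a DOMElement, or null

echo $matches->length, ' match(es): ', $bar ? $bar->textContent : '(none)', "\n";
```

Turning the matched element back into markup is a separate step; the rest of the thread is about exactly that.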

Thank you in advance,

KC
Post by maneetpuri »

Hi,

You can do this using regular expressions, but what is the purpose?

Cheers,

~M
Post by social_experiment »

maneetpuri wrote: but what is the purpose.
To use the information would be a good guess.
Post by Tiancris »

Maybe DOMDocument could help?

http://ar.php.net/manual/en/class.domdocument.php

```php
$html = "<html><body>Test<br></body></html>";
$doc = new DOMDocument();
$doc->loadHTML($html);
```
You then have methods such as:

```php
$doc->getElementById('bar');        // a single DOMElement, or null
$doc->getElementsByTagName('div');  // a DOMNodeList
```
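A slightly fuller sketch of those calls, with invented sample markup, showing what each one returns. It also collects the parser's complaints through libxml_use_internal_errors() rather than hiding them with @, since real-world pages often make loadHTML() warn (here, the made-up `<custom>` tag).

```php
<?php
// Hypothetical sample markup; <custom> is an unknown tag that libxml may report.
$html = '<html><body class="foo"><div id="intro">Intro</div>'
      . '<div id="bar"><custom>Bar content</custom></div></body></html>';

// Collect libxml parse errors instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$errors = libxml_get_errors();         // array of LibXMLError objects (may be empty)
libxml_clear_errors();
libxml_use_internal_errors($previous); // restore the earlier setting

// getElementById() returns a single DOMElement, or null when nothing matches.
$bar = $doc->getElementById('bar');
echo get_class($bar), ': ', $bar->textContent, "\n";

// getElementsByTagName() returns a DOMNodeList, even for a single match;
// iterate it, or pick one entry with item().
$divs = $doc->getElementsByTagName('div');
echo 'Found ', $divs->length, " div elements:\n";
foreach ($divs as $div) {
    echo ' - #', $div->getAttribute('id'), "\n";
}
echo count($errors), " parser message(s) collected\n";
```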
Post by kc11 »

Thanks for looking at this, guys.

So far I have:

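```php
$dom = new DOMDocument();
@$dom->loadHTML($html); // @ hides parser warnings about imperfect HTML

// CREATE XPATH OBJECT
$xpath = new DOMXPath($dom);

$body = '';
// evaluate() returns a DOMNodeList of text nodes here
foreach ($xpath->evaluate("//div[@id='body']//*/text()") as $elt) {
    var_dump($elt);
    $body = trim($elt->wholeText); // note: overwritten on every pass
}

echo 'body ' . $body;
```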

The problem with this (and with getElementsByTagName) is that it returns a DOMNodeList; getElementById returns a single DOMElement, but that is still a node object, not HTML.

I just want the //div[@id='body'] node and its subtree, as HTML.
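Two side notes on the attempt above, as a sketch with made-up markup: the loop overwrites $body on every pass, so only the last text node survives, and `//*/text()` skips text sitting directly inside the div. Collecting the nodes into an array fixes the first point; if plain text were all that's needed, the element's textContent returns the whole subtree's text in one call. query() is used here; evaluate() returns the same DOMNodeList for these expressions.

```php
<?php
// Hypothetical sample markup standing in for the real $html.
$html = '<html><body><div id="body"><h1>Title</h1>'
      . '<p>First <b>bold</b> line</p></div></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Accumulate every matched text node instead of overwriting one variable.
$parts = [];
foreach ($xpath->query("//div[@id='body']//text()") as $textNode) {
    $parts[] = trim($textNode->wholeText);
}
echo implode(' ', $parts), "\n";

// Or take the element itself (first item of the DOMNodeList) and read
// textContent, which concatenates all descendant text in document order.
$div = $xpath->query("//div[@id='body']")->item(0);
echo $div->textContent, "\n";
```

Neither of these yields markup, which is why the next step is serializing the node itself.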

KC
Post by kc11 »

I think I have it:

I first get the node I want with XPath (you could probably also use getElementById or getElementsByTagName), then import it into a new DOMDocument:

```php
$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect warnings about imperfect HTML instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

// CREATE XPATH OBJECT
$xpath = new DOMXPath($dom);

// item(0) gives a single DOMNode rather than a DOMNodeList;
// importNode() only accepts a node.
$body = $xpath->evaluate("//div[@id='body']")->item(0);
if ($body === null) {
    exit("No div with id=\"body\" found\n");
}

$subtree = new DOMDocument();
$node2 = $subtree->importNode($body, true); // true = deep copy: import the node's whole subtree, not just the element

$subtree->appendChild($node2); // make the imported div the new document's root element

print($subtree->saveHTML());
```
    

I'd be interested to know if there is a better way.
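One possibly simpler route, assuming PHP 5.3.6 or later: DOMDocument::saveHTML() accepts an optional node and serializes just that node (its "outer HTML"), so the second document and importNode() aren't needed. The sketch below uses invented sample markup and also shows an inner-HTML variant that serializes only the children; innerHtml() is a hypothetical helper name, not a DOM method.

```php
<?php
// Hypothetical sample markup standing in for the real $html.
$html = '<html><body class="foo"><div id="body"><p>Hello <b>world</b></p></div></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$node = (new DOMXPath($dom))->query("//div[@id='body']")->item(0);

// Outer HTML: pass the node to saveHTML() on the original document.
$outer = $dom->saveHTML($node);

// Inner HTML (children only): serialize each child node and join them.
// innerHtml() is a hypothetical helper, not part of the DOM extension.
function innerHtml(DOMNode $element): string
{
    $out = '';
    foreach ($element->childNodes as $child) {
        $out .= $element->ownerDocument->saveHTML($child);
    }
    return $out;
}

echo $outer, "\n";
echo innerHtml($node), "\n";
```

Depending on the PHP and libxml versions, saveHTML() may insert newlines around block elements, so compare the output loosely if you test it.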

Thanks,

KC