jQuery parse an HTML document COMPLETELY

Posted: Tue Jun 01, 2010 11:16 pm
by JellyFish
Hey guys. In jQuery you can do things like:

Code:

$(html).find("#some-element");
and jQuery will create a set of elements and then you can do a bunch of things with them. However, lately I have been frustrated because this:

Code:

$.get("somepage.html", function(html) {
    document.title = $(html).find("title").text();
});
doesn't seem to work. I tried logging the html to the console and found that jQuery seems to create only the elements that exist in the body of the HTML document returned from the get request. I think this might be because jQuery, with large HTML strings, creates only the top-most HTML element in the set and then uses the DOM's innerHTML property to create the rest.

So, how can I easily create a set of elements from an HTML string that contains the ENTIRE HTML document? This way I can search for things in the document returned by an XHR request, such as the title of the document.

Thanks for reading. I appreciate anyone's input on this.

Re: jQuery parse an HTML document COMPLETELY

Posted: Wed Jun 02, 2010 12:09 pm
by kaszu
$(html) contains all the HEAD and BODY child elements at the top level of the set, so you can use

Code:

$(html).filter('title').text()
The problem is that jQuery creates a DIV, sets its innerHTML, and then takes the DIV's children. Since BODY and HEAD are not valid children of a DIV, the browser does not create them.
As long as you don't use "html", "head" or "body" in the selector, it should work.

The only solution I can come up with is to replace HTML, HEAD and BODY with something like HTMLA, HEADA and BODYA in the html string, but that feels ugly:

Code:

//Replace tags
var html = '<html><head><title>aaaa</title></head><body class="qqq">sss</body></html>';
html = html.replace(/(html|head|body)/ig, '$1a');

//Now html is
//'<htmla><heada><title>aaaa</title></heada><bodya class="qqq">sss</bodya></htmla>';

//Example
$(html).find('bodya').attr('class');    // => qqq
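
Note that the pattern above also rewrites the words "html", "head" and "body" wherever they appear in text or attribute values, not just in tag names. A minimal sketch of a narrower pattern that only touches the tags themselves could look like this; renameRootTags is a hypothetical helper name used for illustration, not part of jQuery:

```javascript
// Hypothetical helper: rename only the <html>, <head> and <body> TAGS,
// leaving text and attribute values that contain those words untouched.
function renameRootTags(html) {
    // "<" or "</", then the tag name, then a word boundary
    // so that tags such as <header> are not matched
    return html.replace(/<(\/?)(html|head|body)\b/gi, '<$1$2a');
}

var source = '<html><head><title>html body</title></head><body class="qqq">head</body></html>';
var renamed = renameRootTags(source);

console.log(renamed);
// => <htmla><heada><title>html body</title></heada><bodya class="qqq">head</bodya></htmla>
```

The title text and the body text keep their original wording, which the broader pattern would have changed.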

Re: jQuery parse an HTML document COMPLETELY

Posted: Wed Jun 02, 2010 2:08 pm
by JellyFish
But that doesn't work! I have a whole friggin' document that is supposed to be able to be viewed alone without JavaScript. However, I am using jQuery to Ajaxify my site and dynamically fetch the html content and use only what is necessary from that document. So, I don't want to fud up the document's markup and totally screw myself over. So, how could I get jQuery to behave nicely and DO WHAT I WANT IT TOOOOOOOOOOO (without re-writing jQuery)?!

Re: jQuery parse an HTML document COMPLETELY

Posted: Wed Jun 02, 2010 4:16 pm
by kaszu
Let's say you have the following page (html) loaded with ajax:

Code:

<!DOCTYPE html>
<html>
<head>
    <link href="/css/style.css" media="screen" rel="Stylesheet" type="text/css"/>
    <title>My page</title>
</head>
<body>
    <div id="wrapper">
        <div class="text">
            <h1>Page title</h1>
            <p>Some text</p>
        </div>
    </div>
</body>
</html>
Here is how to get the title and content from the loaded html:

Code:

var html = '...';  //html from above

//Since $(html) will return children of head and body,
//add a wrapper around html because .find searches only
//descendants of the selected nodes
html = $('<div></div>').append(html);

var title = html.find('title').text();

//.html() returns the inner HTML of the first matched element,
//so this gives the contents of <div class="text">
var content = html.find('#wrapper div.text').html();

console.log(title);     // => My page
console.log(content);   // => <h1>...</h1><p>...</p>
Is this what you need?

Re: jQuery parse an HTML document COMPLETELY

Posted: Mon Jun 07, 2010 6:33 pm
by JellyFish
Sorry for the absence.

Okay, so creating a div and then appending the html document into it will create the entire document? I tried out the code on google.com in my web inspector console and it didn't work. I got the same results; there is no head element. Also, I'm not quite sure what you're talking about with the content part.

[edit] In fact, I didn't notice at first, but I'm also getting an error in the console:

Uncaught TypeError: Cannot call method 'appendChild' of undefined

Re: jQuery parse an HTML document COMPLETELY

Posted: Tue Jun 08, 2010 11:42 am
by kaszu
Please explain what you are trying to achieve. Why do you need the HEAD of the received document? (I already posted how to get the title of the received document.)
JellyFish wrote: Okay, so creating a div then appending the html document into it will create the entire document?
The DIV will have all children of HEAD and BODY, but not these two elements themselves (there will be TITLE, META, LINK, SCRIPT, DIV, A, ..., but not BODY and HEAD).

By "content" I mean "a portion of html from BODY".

Re: jQuery parse an HTML document COMPLETELY

Posted: Tue Jun 08, 2010 6:29 pm
by JellyFish
kaszu wrote: Please explain what you are trying to achieve. Why do you need the HEAD of the received document? (I already posted how to get the title of the received document.)
What I'm trying to achieve: create a jQuery object that contains all elements in an entire HTML document. Why? This way I can XHR for HTML documents and have my script dynamically react to the various elements (title, meta, body's content, anything).

What you posted on how to get the title doesn't work. The following are the steps I took to test your example.
  1. I went to http://google.com.
  2. Since I'm using Chrome on Windows, I pressed Ctrl+Shift+J to open the web inspector and console.
  3. I clicked a bookmarklet I made which imports jQuery into any page.
  4. I typed the following into the console (comments are console logs):

    Code:

    var doc;
    // undefined
    $.get("/", function(html) { doc = html });
    // XMLHttpRequest
    $("<div></div>").append(doc);
    // x Uncaught TypeError: Cannot call method 'appendChild' of undefined
    
As you can see from my test, the console is spitting out an error when I try to append the document to a newly created div.
kaszu wrote: The DIV will have all children of HEAD and BODY, but not these two elements themselves (there will be TITLE, META, LINK, SCRIPT, DIV, A, ..., but not BODY and HEAD).
Maybe, if I could get it to work.
kaszu wrote: By "content" I mean "a portion of html from BODY".
I assumed as much.

Re: jQuery parse an HTML document COMPLETELY

Posted: Wed Jun 09, 2010 11:59 am
by kaszu
After several WTF minutes, I found that the problem is the <script>s in the HTML: they get executed (they were already executed on the normal page load) and they throw that error. Try this (works for me):

Code:

jQuery.get("/", function(html) { 
    html = html.replace(/<\/?script[^>]*>/g, '');  //Removing <script> tags, because we don't want to execute them
    jQuery("<div></div>").append(html);
});
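
One thing to keep in mind with the pattern above: it removes only the opening and closing tags, so the source of any inline script stays in the string as plain text. A rough sketch of a variant that drops the whole element, contents included, is shown below; stripScripts is a hypothetical helper name, and the sample markup is made up for illustration:

```javascript
// Hypothetical helper: remove whole <script> elements, contents included.
function stripScripts(html) {
    // [\s\S]*? matches across newlines, lazily, up to the nearest closing tag;
    // the "i" flag also catches upper-case <SCRIPT>.
    return html.replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, '');
}

var page = '<p>before</p><script type="text/javascript">\nalert("hi");\n<\/script><p>after</p>';

// Removing only the tags leaves the script source behind as text
var tagsOnly = page.replace(/<\/?script[^>]*>/g, '');

console.log(tagsOnly.indexOf('alert("hi");') !== -1);   // => true
console.log(stripScripts(page));                         // => <p>before</p><p>after</p>
```

This is still regex-based, so it assumes reasonably well-formed markup (for example, no "</script>" inside a string literal within the script itself).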

Re: jQuery parse an HTML document COMPLETELY

Posted: Thu Jun 10, 2010 8:01 pm
by JellyFish
I tried your code and it now works; there's no more error. However, .find("title") still doesn't find the title element.

Re: jQuery parse an HTML document COMPLETELY

Posted: Fri Jun 11, 2010 11:21 am
by kaszu
Sorry about that, I didn't check in Chrome or IE; that method seems to work only in FF.

Here is a version that fixes inline <script> handling and separates head and body (in FF and IE the "head" and "body" variables are the same and hold all HEAD and BODY children, except that in IE <title> is missing). It also extracts the title in a cross-browser way:

Code:

jQuery.get("/", function(html) {
    html = html.replace(/<script[^>]*>((\r|\n|.)*?)<\/script[^>]*>/mg, '');  //Remove whole <script> elements (tags and contents), because we don't want to execute them

    //Extract <head>
    var html_head = html.match(/<head[^>]*>((\r|\n|.)*)<\/head/m);
    html_head = html_head ? html_head[1] : '';

    var head = jQuery("<head></head>").append(html_head);
    var body = jQuery("<div></div>").append(html);
    var title = '';

    if (!head.children().length) head = body;    //For Firefox

    //IE - for some reason doesn't have <title> element
    //using regular expression to extract it:
    title = html_head.match(/<title[^>]*>((\r|\n|.)*)<\/title/m);
    title = title ? title[1] : '';

    console.log(title);  // => jQuery: The Write Less, Do More, JavaScript Library
    console.log(head.find('link').length);  // => 4

    //Do other stuff with "head" and "body"
});
Never thought this would be that difficult.
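
The string-only part of the approach above (pulling out the head and the title before any elements are created) can be sketched as two small helpers that need neither jQuery nor a DOM. The names extractHead and extractTitle are hypothetical, and the sketch assumes at most one <head> and one <title> and no ">" inside their attributes:

```javascript
// Hypothetical helpers: pull pieces out of an HTML string with regular expressions.
// Assumes at most one <head> and one <title>, and no ">" inside their attributes.
function extractHead(html) {
    // Lazy, case-insensitive match of everything between <head ...> and </head>
    var m = html.match(/<head\b[^>]*>([\s\S]*?)<\/head\s*>/i);
    return m ? m[1] : '';
}

function extractTitle(html) {
    var m = extractHead(html).match(/<title\b[^>]*>([\s\S]*?)<\/title\s*>/i);
    return m ? m[1].trim() : '';   // String.prototype.trim needs an ES5 environment
}

var sample = '<!DOCTYPE html>\n<HTML>\n<HEAD>\n' +
    '    <link href="/css/style.css" rel="stylesheet"/>\n' +
    '    <TITLE>My page</TITLE>\n' +
    '</HEAD>\n<BODY><p>Some text</p></BODY>\n</HTML>';

console.log(extractTitle(sample));   // => My page
```

The returned title is the raw markup text, so character entities such as &amp; are not decoded.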

Re: jQuery parse an HTML document COMPLETELY

Posted: Sat Jun 12, 2010 4:08 pm
by JellyFish
Ah thanks! I wish there were a less verbose way to do this, or that jQuery had something like this built in. It'll do though.
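
For what it's worth, in browsers that support DOMParser with the "text/html" type (an assumption about the target environment; nothing in this thread relies on it), a less verbose route is to parse the string into a separate document. The sketch below uses hypothetical helper names (parseHtmlDocument, getTitle) and falls back to a regular expression where DOMParser is not available:

```javascript
// Hypothetical helper: parse a complete HTML string into its own Document.
// Returns null where DOMParser is unavailable (for example outside a browser).
function parseHtmlDocument(html) {
    if (typeof DOMParser === 'undefined') {
        return null;
    }
    return new DOMParser().parseFromString(html, 'text/html');
}

// Title lookup that prefers the parsed document and falls back to a regex
function getTitle(html) {
    var doc = parseHtmlDocument(html);
    if (doc) {
        return doc.title;
    }
    var m = html.match(/<title\b[^>]*>([\s\S]*?)<\/title\s*>/i);
    return m ? m[1].trim() : '';
}

var pageHtml = '<!DOCTYPE html><html><head><title>My page</title></head>' +
    '<body class="qqq"><div id="wrapper">Some text</div></body></html>';

console.log(getTitle(pageHtml));   // => My page
```

In a browser, the parsed document keeps its HEAD and BODY elements, so it should be possible to hand it to jQuery, for example $(parseHtmlDocument(html)).find('title').text(), or to read doc.head and doc.body directly. As far as I know, scripts inside a document parsed this way are not executed.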