How do I parse these HTML out using javascript?

Posted: Wed Apr 25, 2012 7:47 pm
by wvoyance
What follows is part of a web page showing query results from a bookstore or library.
It contains some book information, i.e. title, author, and publisher.
I would like to extract the parts marked in red.

<h1>數位信號處理實務:通訊應用型DSP 不收退</h1><img src="http://static.findbook.tw/image/book/97 ... 9409/large" class="cover" alt="數位信號處理實務:通訊應用型DSP 不收退" /><p>作者:<a href="/search?keyword_type=author&q=%E5%90%B3%E8%B3%A2%E8%B2%A1">吳賢財</a>, 出版社:滄海書局, 出版日期:2001-12-01</p><p>商品條碼:9789572079409, ISBN:9572079409<br/>分類標籤:<a href="/explore/tag/10032">中文書</a>

Is there an elegant way to parse it (like we do when parsing JSON, etc.)?

Re: How do I parse these HTML out using javascript?

Posted: Thu Apr 26, 2012 2:02 pm
by tr0gd0rr
If you would like to read the HTML as if it were in the DOM, you can create an element and put the HTML inside:

var div = document.createElement('div');
div.innerHTML = yourHtml;
Then you can access the child nodes using a method like `div.getElementsByTagName()` or the `childNodes` property.
The last part you have in red would need a regular expression to grab text between the colon and comma.
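
One way to put the two suggestions together is sketched below. `parseLabeledFields` and `extractBookInfo` are made-up names, not part of any library, and the regular expression assumes each value follows a colon (ASCII or full-width) and ends at the next comma or at the end of the text. It is only meant for the first `<p>` of the sample; the second paragraph has no comma before its last label, so it would need extra handling.

```javascript
// Hypothetical helper: collect "label:value" pairs from plain text.
// Assumes values never contain a comma.
function parseLabeledFields(text) {
  var fields = {};
  // label = run of non-space, non-comma, non-colon characters;
  // value = everything up to the next comma (or the end of the text)
  var re = /([^\s,:：]+)\s*[:：]\s*([^,]*)/g;
  var match;
  while ((match = re.exec(text)) !== null) {
    fields[match[1]] = match[2].trim();
  }
  return fields;
}

// Hypothetical helper: browser only, because it relies on `document`.
function extractBookInfo(html) {
  var div = document.createElement('div');
  div.innerHTML = html;
  var h1 = div.getElementsByTagName('h1')[0];
  var p = div.getElementsByTagName('p')[0]; // first paragraph only
  return {
    title: h1 ? h1.textContent : null,
    fields: p ? parseLabeledFields(p.textContent) : {}
  };
}
```

Called in a browser as `extractBookInfo(yourHtml)`, this should return the `<h1>` text as `title`, with the author, publisher, and publication date stored in `fields` under their original labels.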