How do I parse these HTML out using javascript?

wvoyance
Forum Contributor
Posts: 135
Joined: Tue Apr 17, 2012 8:24 pm

Post by wvoyance »

What follows is part of a webpage: a query result from a bookstore or library site.
It contains some book information, i.e. the title, author, and publisher.
I would like to extract the parts marked in red (the title, author, and publisher).

<h1>數位信號處理實務:通訊應用型DSP 不收退</h1><img src="http://static.findbook.tw/image/book/97 ... 9409/large" class="cover" alt="數位信號處理實務:通訊應用型DSP 不收退" /><p>作者:<a href="/search?keyword_type=author&q=%E5%90%B3%E8%B3%A2%E8%B2%A1">吳賢財</a>, 出版社:滄海書局, 出版日期:2001-12-01</p><p>商品條碼:9789572079409, ISBN:9572079409<br/>分類標籤:<a href="/explore/tag/10032">中文書</a>

Is there an elegant way to parse it (like we do when parsing JSON, etc.)?
tr0gd0rr
Forum Contributor
Posts: 305
Joined: Thu May 11, 2006 8:58 pm
Location: Utah, USA

Re: How do I parse these HTML out using javascript?

Post by tr0gd0rr »

If you would like to read the HTML as if it were in the DOM, you can create an element and put the HTML inside:

var div = document.createElement('div');
div.innerHTML = yourHtml;

Then you can access the child nodes using methods like `div.getElementsByTagName()` or the `childNodes` property.
The last part you have in red would need a regular expression to grab the text between the colon and the comma.
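
A rough sketch of that regular-expression step, in case it helps. The helper name `extractField` and the abridged `sampleHtml` string are made up for illustration and are not from the original page. The pattern assumes each field looks like "label:value" and ends at the next comma, and it accepts both ASCII and full-width punctuation since the page might use either. The DOM part only runs where `document` exists (i.e. in a browser); elsewhere the hard-coded paragraph text is used.

```javascript
// Hypothetical helper: returns the text between "<label>:" and the next comma,
// or null when the label is not found.
function extractField(text, label) {
  // Escape regex metacharacters in the label so it is matched literally.
  var escaped = label.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // [:\uFF1A] matches an ASCII or full-width colon; [^,\uFF0C]+ stops at either comma.
  var pattern = new RegExp(escaped + '\\s*[:\\uFF1A]\\s*([^,\\uFF0C]+)');
  var match = pattern.exec(text);
  return match ? match[1].trim() : null;
}

// Abridged copy of the markup from the first post (link target shortened).
var sampleHtml = '<h1>數位信號處理實務:通訊應用型DSP 不收退</h1>' +
  '<p>作者:<a href="#">吳賢財</a>, 出版社:滄海書局, 出版日期:2001-12-01</p>';

// Fallback: the text of the first <p>, as textContent would return it.
var paragraphText = '作者:吳賢財, 出版社:滄海書局, 出版日期:2001-12-01';

// In a browser, read the text through the DOM as suggested above.
if (typeof document !== 'undefined') {
  var div = document.createElement('div');
  div.innerHTML = sampleHtml;
  paragraphText = div.getElementsByTagName('p')[0].textContent;
}

var publisher = extractField(paragraphText, '出版社');
var published = extractField(paragraphText, '出版日期');
```

The title and author do not need a regular expression: `div.getElementsByTagName('h1')[0].textContent` and the first `<a>` inside the paragraph should give them directly. Only the publisher is bare text between other fields, which is why it gets the pattern.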