Ideas for Extracting Info from HTML Chat Logs
Posted: Sun Mar 12, 2006 1:10 am
I've got a bunch of Adium log files that are formatted like the sample at the end of this post.
I want to extract all the timestamp fields that reside in a "receive" div and put them in an array, and do the same for those that reside in a "send" div. The thing is, I'd like to avoid regular expressions because I don't want to spend the time learning the syntax right now (unless the syntax needed here wouldn't be that hard to learn?). I can use C, Ruby, Perl, or PHP, although I'm hoping to avoid C, since the easiest way I can think of doing it there is still not as easy as I'd like. Basically, in the end I just want the timestamps sorted into separate arrays so that I can do things like find the difference between a sent message and the closest received message, etc.
Anyone want to offer some input on the matter?
Code:
<div class="send">
<span class="timestamp">6:32:09 PM</span>
<span class="sender">dotkrisennay@hotmail.com: </span>
<pre class="message">hey</pre></div>
<div class="receive">
<span class="timestamp">6:32:19 PM</span>
<span class="sender">rolyoly@hotmail.com: </span>
<pre class="message"><B>hola</B></pre></div>
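
For what it's worth, here is a rough sketch of one regex-free way this might be done in Ruby, using only plain substring searches (String#index and slicing). It assumes every log follows the sample's layout exactly: each message div is written as <div class="send"> or <div class="receive">, and the first timestamp span after it belongs to that message. The helper names timestamps_for and seconds_since_midnight are made up for illustration, and because the sample shows no dates, the gap calculation assumes a conversation does not cross midnight.

```ruby
# Returns every timestamp string found inside <div class="KIND"> blocks,
# where kind is "send" or "receive". Uses substring searches only.
def timestamps_for(html, kind)
  div_open   = %(<div class="#{kind}">)
  span_open  = '<span class="timestamp">'
  span_close = '</span>'
  found = []
  pos = 0
  while (div_at = html.index(div_open, pos))
    # Assumption: the first timestamp span after the div belongs to it.
    span_at = html.index(span_open, div_at)
    break if span_at.nil?
    text_at  = span_at + span_open.length
    text_end = html.index(span_close, text_at)
    break if text_end.nil?
    found << html[text_at...text_end].strip
    pos = text_end
  end
  found
end

# Converts a stamp such as "6:32:09 PM" to seconds since midnight.
def seconds_since_midnight(stamp)
  clock, meridiem = stamp.split(' ')
  hours, minutes, seconds = clock.split(':').map(&:to_i)
  hours %= 12                              # treat 12 as 0 before the PM shift
  hours += 12 if meridiem.upcase == 'PM'
  hours * 3600 + minutes * 60 + seconds
end

# The sample from this post, embedded so the sketch runs on its own.
log = <<~'HTML'
  <div class="send">
  <span class="timestamp">6:32:09 PM</span>
  <span class="sender">dotkrisennay@hotmail.com: </span>
  <pre class="message">hey</pre></div>
  <div class="receive">
  <span class="timestamp">6:32:19 PM</span>
  <span class="sender">rolyoly@hotmail.com: </span>
  <pre class="message"><B>hola</B></pre></div>
HTML

sent     = timestamps_for(log, 'send')
received = timestamps_for(log, 'receive')

# For each sent message, the gap in seconds to the closest received message.
gaps = sent.map do |s|
  s_sec = seconds_since_midnight(s)
  received.map { |r| (seconds_since_midnight(r) - s_sec).abs }.min
end

puts "sent:     #{sent.inspect}"
puts "received: #{received.inspect}"
puts "gaps:     #{gaps.inspect}"
```

For a real log, File.read on the log's path could replace the embedded sample. A proper HTML parser would cope better with layout changes (extra attributes on the div, different quoting), so the string-search version is only reasonable as long as the files really are as uniform as the sample.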