help with web scraper
Moderator: General Moderators
-
playwright
- Forum Newbie
- Posts: 20
- Joined: Wed Jun 02, 2010 6:11 pm
help with web scraper
Hello, I'm new to PHP, so I need some real help here.
I'm trying to create a web scraper that grabs a forum's content and shows only the posts. The source code is here:
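<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
</head>
<body>
<?php
// Fetch the raw HTML of one thread page
$html = file_get_contents('http://www.......');
$dom = new DOMDocument();
// Collect parser warnings about malformed HTML instead of silencing them with @
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);
// Select every element whose class attribute is exactly "postTextContainer"
$posts = $xpath->query('//*[@class="postTextContainer"]');
foreach ($posts as $post) {
    // Escape the text so characters like < and & display literally
    echo htmlspecialchars($post->nodeValue), "<br/>\n";
}
?>
</body>
</html>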
Can anyone tell me how I could grab all the posts that are in the same thread? Right now I can only grab the posts on the page at the above URL. I think it's called multiple-page scraping?
- phdatabase
- Forum Commoner
- Posts: 83
- Joined: Fri May 28, 2010 10:02 am
- Location: Fort Myers, FL
Re: help with web scraper
Scraping usually works from the (X)HTML markup, because you'll never see the server-side source code; you are essentially reverse engineering the process that created the content. The snippet of code appears to load some HTML content, which is where you start. As for how to get threads, every forum is different and you'll just need to puzzle it out. (That's the fun of it.)
-
playwright
Re: help with web scraper
In the case I'm working on right now, the URL of the first page of the thread looks like http://www.(bla bla bla).com/forum/showthread.php?t=360717
and the other pages of the thread are http://www.(bla bla bla).com/forum/showthread.php?t=360717&page=2 ... &page=3 and so on. Should I use a regex and a for loop or something like that?
- phdatabase
Re: help with web scraper
You need to load the page at 'http://blah/blah/blah...' into a string, parse it for the content, and then repeat that for as many pages as there are. So yes, you will need a loop, but this is more a structure thing than a regex thing, and I find while loops are generally handier for this type of work.
It appears that following a thread will be easy based on the query strings you are showing.
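For what it's worth, the while-loop idea above could be sketched roughly like this. It is only a sketch under assumptions: the function names extractPosts and scrapeThread, the $fetchPage callable, the $maxPages cap, and the stop condition (no posts found, or the same posts as the previous page) are all made up for illustration, and the "postTextContainer" class and "&page=N" pattern are taken from the posts above. Your forum may signal the last page differently.

```php
<?php
// Sketch only: collect post text from every page of a thread.

// Pull the text of each post out of one page of HTML.
function extractPosts(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate malformed forum HTML
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $posts = [];
    foreach ($xpath->query('//*[@class="postTextContainer"]') as $node) {
        $posts[] = trim($node->nodeValue);
    }
    return $posts;
}

// $fetchPage is a hypothetical callable: URL in, HTML string (or false) out.
// Passing it in lets the loop be tried without hitting a real forum.
function scrapeThread(string $baseUrl, callable $fetchPage, int $maxPages = 50): array
{
    $all = [];
    $previous = null;
    $page = 1;
    while ($page <= $maxPages) {
        // Page 1 has no page parameter; later pages append &page=N.
        $url = $page === 1 ? $baseUrl : $baseUrl . '&page=' . $page;
        $html = $fetchPage($url);
        if ($html === false || $html === '') {
            break;                      // request failed or nothing came back
        }
        $posts = extractPosts($html);
        // Assumed stop condition: an empty page, or a repeat of the last page.
        if ($posts === [] || $posts === $previous) {
            break;
        }
        $all = array_merge($all, $posts);
        $previous = $posts;
        $page++;
    }
    return $all;
}
```

With a real thread you could pass 'file_get_contents' as the second argument, since it takes a URL and returns a string or false. The $maxPages cap is there so a forum that never returns an empty or repeated page cannot keep the loop running forever.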
-
playwright
Re: help with web scraper
I hope I'll find a way to do it. I also want to ask how I can delete the content between two tags that appears in the content I have grabbed with the above code. More specifically, the tag is <div class="........">bla bla</div>
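One possible way to approach this with the same DOM tools, as a sketch only: find the unwanted div elements inside each post and remove them from the tree before reading nodeValue. The function name postsWithoutDivs and the $unwantedClass parameter are hypothetical, and since the real class name is elided above, the caller has to supply it.

```php
<?php
// Sketch only: return the text of each post with unwanted <div> blocks removed.
// Assumes $unwantedClass contains no double quotes, since it is placed
// directly into the XPath expression.
function postsWithoutDivs(string $html, string $unwantedClass): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate malformed forum HTML
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    $texts = [];
    foreach ($xpath->query('//*[@class="postTextContainer"]') as $post) {
        // Search only inside this post (note the leading dot and the
        // context node), and copy the matches before changing the tree.
        $unwanted = iterator_to_array(
            $xpath->query('.//div[@class="' . $unwantedClass . '"]', $post)
        );
        foreach ($unwanted as $div) {
            if ($div->parentNode !== null) {
                $div->parentNode->removeChild($div);
            }
        }
        // nodeValue now reflects the post without the removed blocks.
        $texts[] = trim($post->nodeValue);
    }
    return $texts;
}
```

Removing nodes from the parsed tree tends to be more robust than running a regex over the HTML string, because nested tags and attribute order stop mattering.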
- phdatabase
Re: help with web scraper
Build the HTTP GET request yourself in PHP (headers and all) and load the reply. Or use cURL, which is easier yet. An excellent primer for building agents is "Webbots, Spiders, and Screen Scrapers: A Guide to Developing Internet Agents with PHP/cURL" by Michael Schrenk.
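A minimal sketch of the cURL route, assuming the cURL extension is installed. The function name fetchPage, the timeout default, and the user-agent string are made up for illustration; only the curl_* calls and CURLOPT_* constants come from PHP itself.

```php
<?php
// Sketch only: fetch one page with cURL.
// Returns the response body as a string, or false on failure.
function fetchPage(string $url, int $timeoutSeconds = 15)
{
    $ch = curl_init($url);
    if ($ch === false) {
        return false;
    }
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,             // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,             // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
        CURLOPT_USERAGENT      => 'example-scraper/0.1', // hypothetical identifier
    ]);
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    // Treat transport errors and HTTP error statuses as failure.
    if ($body === false || $status >= 400) {
        return false;
    }
    return $body;
}
```

The returned string can be handed to DOMDocument::loadHTML in place of the file_get_contents result from the first post.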
-
playwright
Re: help with web scraper
Thanks for the advice. Actually, I have searched all over the web to write this code; I have looked into cURL, DOM, and regexes, but as I said before, I'm new to PHP, so it would be really helpful if you could write some code for these. Thanks anyway!
-
playwright
Re: help with web scraper
Any help?
- phdatabase
Re: help with web scraper
I make my living writing agents. I am happy to share my knowledge and experience (what there is of it) but I am not going to write a scraper for you or anyone else; unless you want to pay me, of course. If you get the resource I named and apply yourself, you should be able to write a scraper in a week.