Url spider

wizzard
Forum Commoner
Posts: 93
Joined: Thu May 16, 2002 5:36 am
Location: Belgium

Post by wizzard »

Hello,

I have made a small link exchange script for my website, and I use a spider to check the websites of the exchanging members. If my link is not on their links page, the spider disables their link.

Now I have a problem: when someone puts my link on their site but it ends up on page 2, 3, ... of their links, my spider does not find it.

Is there a solution for this?
foobar
Forum Regular
Posts: 613
Joined: Wed Sep 28, 2005 10:08 am

Re: Url spider

Post by foobar »

wizzard wrote: Is there a solution for this?
Follow URLs that point to the same site until you find the link. If you never find it, don't link back.
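A minimal PHP sketch of that idea, not a definitive implementation: a breadth-first crawl that stays on the partner's host and stops as soon as a link to your own host turns up. The function names (`extractLinks`, `resolveUrl`, `findBacklink`), the `$fetch` callable and the cap of 25 pages are all assumptions made for illustration. The resolver is deliberately simple: it does not normalise `../` segments, and it treats `www.example.com` and `example.com` as different hosts.

```php
<?php
// Hypothetical helper: collect the href values of all <a> tags in an HTML string.
function extractLinks(string $html): array
{
    if ($html === '') {
        return []; // DOMDocument::loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // @ hides warnings about sloppy real-world markup
    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        if ($href !== '') {
            $links[] = $href;
        }
    }
    return $links;
}

// Hypothetical helper: turn an href into an absolute URL, or null if it is not crawlable.
function resolveUrl(string $base, string $href): ?string
{
    if ($href === '' || $href[0] === '#') {
        return null;
    }
    if (preg_match('#^https?://#i', $href)) {
        return $href;
    }
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href)) {
        return null; // mailto:, javascript:, ftp: and so on
    }
    $p = parse_url($base);
    if (!isset($p['scheme'], $p['host'])) {
        return null;
    }
    if (strpos($href, '//') === 0) {
        return $p['scheme'] . ':' . $href; // protocol-relative link
    }
    $root = $p['scheme'] . '://' . $p['host'] . (isset($p['port']) ? ':' . $p['port'] : '');
    $path = $p['path'] ?? '/';
    if ($href[0] === '/') {
        return $root . $href;
    }
    if ($href[0] === '?') {
        return $root . $path . $href;
    }
    // Relative to the directory of the current page.
    return $root . substr($path, 0, strrpos($path, '/') + 1) . $href;
}

// Crawl same-host pages breadth-first until a link to $myHost is found
// or $maxPages pages have been fetched. $fetch must return HTML or false.
function findBacklink(string $startUrl, string $myHost, callable $fetch, int $maxPages = 25): bool
{
    $myHost   = strtolower($myHost);
    $siteHost = strtolower((string) parse_url($startUrl, PHP_URL_HOST));
    $queue    = [$startUrl];
    $seen     = [$startUrl => true];
    $fetched  = 0;

    while ($queue && $fetched < $maxPages) {
        $url  = array_shift($queue);
        $html = $fetch($url);
        $fetched++;
        if (!is_string($html)) {
            continue; // fetch failed, try the next queued page
        }
        foreach (extractLinks($html) as $href) {
            $abs = resolveUrl($url, $href);
            if ($abs === null) {
                continue;
            }
            $abs  = explode('#', $abs, 2)[0]; // ignore fragments
            $host = strtolower((string) parse_url($abs, PHP_URL_HOST));
            if ($host === $myHost) {
                return true; // the partner links back to us
            }
            if ($host === $siteHost && !isset($seen[$abs])) {
                $seen[$abs] = true;
                $queue[]    = $abs;
            }
        }
    }
    return false;
}
```

With a real fetcher this could be called as `findBacklink($partnerLinksUrl, 'www.mysite.example', function ($u) { return @file_get_contents($u); })`, where both the variable and the host name are placeholders. Passing the fetcher in as a callable keeps the crawl logic testable without network access, and the page cap keeps the spider from wandering through a large site forever.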
wizzard
Forum Commoner
Posts: 93
Joined: Thu May 16, 2002 5:36 am
Location: Belgium

Post by wizzard »

Well, the problem is that some partner sites limit their links to 20 per page, so my link ends up on the second (or a later) page.

Is it possible to have my spider automatically browse through the pages?
Burrito
Spockulator
Posts: 4715
Joined: Wed Feb 04, 2004 8:15 pm
Location: Eden, Utah

Post by Burrito »

If it's a "true" spider, it should crawl all of the pages.
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Re: Url spider

Post by Weirdan »

wizzard wrote: I have made a small link exchange script for my website, and I use a spider to check the websites of the exchanging members. If my link is not on their links page, the spider disables their link.
I would approach this task a bit differently: record the visitors arriving from the sites exchanging links with you. When a site has not sent a single visitor for a week (or so), disable the link to that site. Once a visitor arrives from the site, turn the link back on.

This approach eliminates the need for the spider altogether and keeps you 'safe' from abuses like putting the link to your site in an invisible div.
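A rough PHP sketch of that bookkeeping, under stated assumptions: partners are kept in a plain array keyed by host (a database table would be the usual choice in practice), the function names `recordVisit` and `disableIdlePartners` are hypothetical, and the one-week window is taken from the "week (or so)" above. Keep in mind that the `Referer` header is optional and can be spoofed, so this is a heuristic rather than proof of a backlink.

```php
<?php
// "A week (or so)" expressed in seconds; tune to taste.
const MAX_IDLE_SECONDS = 7 * 24 * 60 * 60;

// $partners maps a partner host to ['last_visit' => int timestamp, 'enabled' => bool].
// Call on every page request with the incoming Referer header (may be null).
function recordVisit(array $partners, ?string $referer, int $now): array
{
    $host = strtolower((string) parse_url((string) $referer, PHP_URL_HOST));
    if ($host !== '' && isset($partners[$host])) {
        // A visitor arrived from this partner: remember when, and turn the link back on.
        $partners[$host]['last_visit'] = $now;
        $partners[$host]['enabled']    = true;
    }
    return $partners;
}

// Call periodically (for example from a cron job) to switch off silent partners.
function disableIdlePartners(array $partners, int $now): array
{
    foreach ($partners as $host => $info) {
        if ($now - $info['last_visit'] > MAX_IDLE_SECONDS) {
            $partners[$host]['enabled'] = false;
        }
    }
    return $partners;
}
```

On a live page the first call might look like `$partners = recordVisit($partners, $_SERVER['HTTP_REFERER'] ?? null, time());`. Passing the timestamp in as a parameter, instead of calling `time()` inside the functions, makes the one-week rule easy to test.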