PHP and Internet Search Crawlers/engines


Moderator: General Moderators

jadformosa
Forum Newbie
Posts: 1
Joined: Tue Apr 20, 2004 8:53 am

PHP and Internet Search Crawlers/engines

Post by jadformosa »

OK
I am looking to build a specialized Internet search site and I want to use PHP. I need scripts/code/packages that will crawl (spider) a list of predetermined sites, pull keywords, meta tags, titles and such from these sites, and write the parsed data to MySQL (with indexing). Users will then search my site, which covers only the sites I have indexed.
I have been searching for a package but I am not finding what I am looking for. I have found a package called Harvest that is close.
Does anyone have any ideas on the best solution? I do not want a meta search engine, nor do I need a small site search engine.
Any info would be appreciated!

Thanks,
kettle_drum
DevNet Resident
Posts: 1150
Joined: Sun Jul 20, 2003 9:25 pm
Location: West Yorkshire, England

Post by kettle_drum »

You could easily make one for yourself. Just set up a database table to hold the URLs to crawl, then have your bot connect to each site in turn - you can do it with fopen(). Then you can parse the page to get what you want - meta tags, text from the page etc. Then store these details in the search engine's database.
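
A minimal sketch of the fetch-and-parse step described above, assuming allow_url_fopen is enabled and the DOM extension is available. The function names fetchPage() and extractPageInfo() are made up for illustration and do not come from any existing package; treat this as a starting point rather than a finished crawler.

```php
<?php
// Hypothetical helper names, for illustration only.

/** Fetch a page body with fopen() on a URL (needs allow_url_fopen = On). */
function fetchPage(string $url): ?string
{
    $handle = @fopen($url, 'r');
    if ($handle === false) {
        return null; // unreachable host, HTTP error, etc.
    }
    $html = stream_get_contents($handle);
    fclose($handle);
    return $html === false ? null : $html;
}

/** Pull the title plus the keywords/description meta tags out of raw HTML. */
function extractPageInfo(string $html): array
{
    $info = ['title' => '', 'keywords' => '', 'description' => ''];
    if (trim($html) === '') {
        return $info; // nothing to parse
    }

    // Real-world HTML is rarely valid, so collect parser errors quietly.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titles = $doc->getElementsByTagName('title');
    if ($titles->length > 0) {
        $info['title'] = trim($titles->item(0)->textContent);
    }

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $name = strtolower($meta->getAttribute('name'));
        if ($name === 'keywords' || $name === 'description') {
            $info[$name] = trim($meta->getAttribute('content'));
        }
    }
    return $info;
}
```

You would call fetchPage() for each URL in your list and pass the result to extractPageInfo(); the returned array is what you would then write to your database.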

You can of course then make things as hi-tech as you like - get the bot to collect all links from a page so it will traverse the web looking for more links, have it record how many other pages link to a given page, etc.
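
The link-collecting and link-counting ideas could look roughly like this. Again, resolveLink(), extractLinks() and countInboundLinks() are hypothetical names, and the URL resolution is deliberately simplified (it does not normalize "../" segments or honour a base tag), so take it as a sketch under those assumptions.

```php
<?php
// Hypothetical helpers, for illustration only.

/** Resolve a link against the page it was found on (simplified). */
function resolveLink(string $base, string $href): ?string
{
    $href = trim($href);
    if ($href === '' || $href[0] === '#') {
        return null; // empty link or in-page anchor
    }
    if (preg_match('#^https?://#i', $href)) {
        return $href; // already absolute
    }
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href)) {
        return null; // mailto:, javascript:, ftp: and so on
    }
    $parts = parse_url($base);
    if (empty($parts['scheme']) || empty($parts['host'])) {
        return null;
    }
    $root = $parts['scheme'] . '://' . $parts['host']
          . (isset($parts['port']) ? ':' . $parts['port'] : '');
    if (substr($href, 0, 2) === '//') {
        return $parts['scheme'] . ':' . $href; // protocol-relative
    }
    if ($href[0] === '/') {
        return $root . $href; // site-relative
    }
    // Path-relative: append to the directory of the base page.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $root . $dir . $href;
}

/** Collect the unique, absolute http(s) links found in a page. */
function extractLinks(string $html, string $baseUrl): array
{
    if (trim($html) === '') {
        return [];
    }
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $url = resolveLink($baseUrl, $a->getAttribute('href'));
        if ($url !== null) {
            $links[$url] = true; // array keys de-duplicate
        }
    }
    return array_keys($links);
}

/**
 * Count how many distinct pages link to each URL.
 * $pages maps a page URL to the list of URLs it links to.
 */
function countInboundLinks(array $pages): array
{
    $counts = [];
    foreach ($pages as $from => $targets) {
        foreach (array_unique($targets) as $to) {
            if ($to !== $from) { // ignore self-links
                $counts[$to] = ($counts[$to] ?? 0) + 1;
            }
        }
    }
    return $counts;
}
```

Each URL returned by extractLinks() can go back into the table of URLs to crawl, and the counts from countInboundLinks() give you a crude popularity figure to sort search results by.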
Buddha443556
Forum Regular
Posts: 873
Joined: Fri Mar 19, 2004 1:51 pm

Post by Buddha443556 »

Depending on what you're doing, you might want to consider using a language other than PHP. Perl, Java or C maybe?
timvw
DevNet Master
Posts: 4897
Joined: Mon Jan 19, 2004 11:11 pm
Location: Leuven, Belgium

Post by timvw »

Having a look at tools like htdig, mnogosearch, or lint might be useful.