
PHP and Internet Search Crawlers/engines

Posted: Tue Apr 20, 2004 8:53 am
by jadformosa
OK
I am looking to build a specialized Internet search site, and I want to use PHP. I need scripts/code/packages that will crawl (spider) a predetermined list of sites, pull keywords, meta tags, titles and such from those sites, and write the parsed data to MySQL (with indexing). Users will then search my site, which covers only the sites I have chosen.
I have been searching for a package but have not found what I am looking for. The closest I have found is a package called Harvest.
Does anyone have any ideas on the best solution? I do not want a meta search engine, nor do I need a small site search engine.
Any info would be appreciated!

Thanks,

Posted: Wed Apr 21, 2004 3:31 pm
by kettle_drum
You could easily make one yourself. Just use a database table to hold the URLs to crawl, then have your bot connect to each site; you can do that with fopen(). Then parse the page to get what you want (meta tags, text from the page, etc.) and store those details in the search engine's database.
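A minimal sketch of the fetch-and-parse step, assuming PHP 8 with the DOM extension enabled. The function names fetchPage() and parsePage() are hypothetical, not from any existing package. fetchPage() uses file_get_contents(), which goes through the same URL wrappers as fopen() (so allow_url_fopen must be on); writing the result to MySQL is left out.

```php
<?php
// Hypothetical helpers for the "fetch a page, pull out meta tags and text" step.
// Assumes PHP 8 with the DOM extension; storage in MySQL is not shown.

// Fetch a page over HTTP. file_get_contents() uses the same stream wrappers as fopen().
function fetchPage(string $url): ?string
{
    $html = @file_get_contents($url);
    return $html === false ? null : $html;
}

// Extract the title, meta keywords/description and visible body text from an HTML string.
function parsePage(string $html): array
{
    $result = ['title' => '', 'keywords' => [], 'description' => '', 'text' => ''];
    if (trim($html) === '') {
        return $result; // DOMDocument::loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $prev = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $titles = $doc->getElementsByTagName('title');
    if ($titles->length > 0) {
        $result['title'] = trim($titles->item(0)->textContent);
    }

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $name = strtolower($meta->getAttribute('name'));
        $content = trim($meta->getAttribute('content'));
        if ($name === 'keywords') {
            // Keywords are conventionally comma-separated; normalise case and drop blanks.
            $parts = array_map('trim', explode(',', strtolower($content)));
            $result['keywords'] = array_values(array_filter($parts, 'strlen'));
        } elseif ($name === 'description') {
            $result['description'] = $content;
        }
    }

    // Remove <script> and <style> elements so their contents do not end up in the text.
    foreach (['script', 'style'] as $tag) {
        $nodes = $doc->getElementsByTagName($tag);
        for ($i = $nodes->length - 1; $i >= 0; $i--) { // walk backwards: the list is live
            $node = $nodes->item($i);
            $node->parentNode->removeChild($node);
        }
    }

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        // Collapse runs of whitespace into single spaces.
        $result['text'] = trim(preg_replace('/\s+/', ' ', $body->textContent));
    }
    return $result;
}
```

In a real crawler you would loop over the URL table, call parsePage(fetchPage($url) ?? '') for each row, and insert the returned fields into your indexed MySQL tables.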

You can, of course, make things as hi-tech as you like: get the bot to collect all the links from each page so it traverses the web looking for more links, have it record how many other pages link to a given page, etc.
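The link-following idea could look roughly like this: a breadth-first crawl over a queue of URLs that also counts how many distinct crawled pages link to each URL. This is a sketch under stated assumptions, not a production crawler: crawl() and extractLinks() are hypothetical names, PHP 8 with DOM is assumed, only absolute and root-relative links are followed, and there is no robots.txt handling or delay between requests. The fetcher is passed in as a callable so it can wrap file_get_contents() or be stubbed out.

```php
<?php
// Hypothetical breadth-first crawler that follows links and counts inbound links.
// Assumes PHP 8 (str_starts_with) and the DOM extension.

// Collect absolute and root-relative links from a page; other relative forms are skipped.
function extractLinks(string $html, string $baseUrl): array
{
    if (trim($html) === '') {
        return [];
    }
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $base = parse_url($baseUrl);
    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        // Drop any #fragment so the same page is not queued twice.
        $href = explode('#', trim($a->getAttribute('href')), 2)[0];
        if (preg_match('#^https?://#i', $href)) {
            $links[] = $href;
        } elseif (str_starts_with($href, '/') && !str_starts_with($href, '//')
                  && isset($base['scheme'], $base['host'])) {
            // Root-relative link: reuse the current page's scheme, host and port.
            $port = isset($base['port']) ? ':' . $base['port'] : '';
            $links[] = $base['scheme'] . '://' . $base['host'] . $port . $href;
        }
        // "page.html", "../x", "mailto:" etc. are ignored in this sketch.
    }
    return array_values(array_unique($links));
}

// Crawl from the seed URLs; returns [url => number of distinct crawled pages linking to it].
function crawl(array $seeds, callable $fetch, int $maxPages = 100): array
{
    $queue = $seeds;
    $seen = array_fill_keys($seeds, true);
    $inbound = [];
    $crawled = 0;
    while ($queue && $crawled < $maxPages) {
        $url = array_shift($queue);
        $html = $fetch($url);
        if ($html === null) {
            continue; // fetch failed; skip this URL
        }
        $crawled++;
        foreach (extractLinks($html, $url) as $link) {
            if ($link === $url) {
                continue; // a page linking to itself does not count
            }
            $inbound[$link] = ($inbound[$link] ?? 0) + 1;
            if (!isset($seen[$link])) {
                $seen[$link] = true;
                $queue[] = $link;
            }
        }
    }
    return $inbound;
}
```

Injecting the fetcher keeps the crawl logic testable without network access; for a real run it could be `fn($u) => @file_get_contents($u) ?: null`. The inbound counts are a crude popularity signal you could store alongside each page and use when ranking results.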

Posted: Wed Apr 21, 2004 7:48 pm
by Buddha443556
Depending on what you're doing, you might want to consider a language other than PHP. Perl, Java or C, maybe?

Posted: Thu Apr 22, 2004 3:16 am
by timvw
Having a look at tools like htdig, mnogosearch and lint might be useful.