Search Engine Design with PHP
-
boris007ng
- Forum Newbie
- Posts: 4
- Joined: Fri Jun 23, 2006 8:42 pm
Hello folks,
I am working on a project which requires me to design a search engine in PHP. The search engine should be able to search several given sites, which would be specified manually.
I have no idea how to go about this; I do not know where to start.
Please, I need solid advice on how I can achieve this task.
-
alex.barylski
- DevNet Evangelist
- Posts: 6267
- Joined: Tue Dec 21, 2004 5:00 pm
- Location: Winnipeg
Why did you take the job then???
Search engines like Google use some pretty advanced techniques which far surpass a trivial search engine...
Here is a tip: for something basic...you can simply scan a web site's index.php and extract its content and links...
Traverse the web site by following links...
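A minimal sketch of the "scan index.php, then follow links" step, written in Python for brevity (the same steps carry over to PHP). Assumptions: `crawl`, `LinkCollector` and the `fetch` callback are hypothetical names; `fetch(url)` is any function that returns a page's HTML or `None` (in real use it might wrap `urllib.request.urlopen`). The crawl only follows links on the starting host, which fits searching a fixed list of manually specified sites.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl that stays on start_url's host.

    fetch(url) must return the page's HTML as a string, or None on failure.
    Returns a dict mapping each fetched URL to its raw HTML.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            # Resolve relative links and drop #fragments so /a and /a#top count as one page.
            absolute, _ = urldefrag(urljoin(url, href))
            # Only queue pages on the same site that we have not queued before.
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages


# Usage with a fake two-page site instead of real HTTP requests.
site = {
    "http://example.com/index.php": '<a href="about.php">About</a> <a href="http://other.org/">Elsewhere</a>',
    "http://example.com/about.php": '<a href="index.php#top">Home</a>',
}
pages = crawl("http://example.com/index.php", site.get)
print(sorted(pages))  # ['http://example.com/about.php', 'http://example.com/index.php']
```

The `max_pages` cap and the `seen` set keep the spider from looping forever on sites whose pages link back to each other.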
Store the extracted content in a MySQL database. There's no need for HTML tags at this level, so strip everything but the content...
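A rough sketch of "strip everything but content", again in Python using only the standard library. `TextExtractor` and `html_to_text` are hypothetical names; the sketch also drops the bodies of `<script>` and `<style>` tags, since those are not page text either.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Keeps visible text, dropping all tags plus <script>/<style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join text chunks with spaces (so "</title><p>" does not glue words together),
    # then collapse the runs of whitespace left behind by removed markup.
    return " ".join(" ".join(parser.parts).split())


page = ("<html><head><title>Hi</title><style>p{color:red}</style></head>"
        "<body><p>Search &amp; find</p><script>var x=1;</script></body></html>")
print(html_to_text(page))  # Hi Search & find
```

Joining chunks with a space is a trade-off: a word split by inline markup such as `<b>Sea</b>rch` comes out as two tokens, which is usually acceptable for a basic index.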
Once you've built a cache of each web site and stored its content in a DB, use the database's built-in FULLTEXT search capabilities...
When someone queries your database now, you can return which page matches which keyword based on your FULLTEXT results...
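To show the idea behind "which page matches which keyword", here is a toy inverted index in Python. It is a plain stand-in for illustration, not how MySQL implements FULLTEXT or computes its relevance scores; `build_index` and `search` are hypothetical names, and pages are ranked simply by how often the query words occur.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """pages: {url: plain_text}. Returns {word: Counter({url: occurrences})}."""
    index = defaultdict(Counter)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word][url] += 1
    return index


def search(index, query):
    """Rank pages by total occurrences of any of the query words."""
    scores = Counter()
    for word in tokenize(query):
        scores.update(index.get(word, {}))
    return [url for url, _ in scores.most_common()]


index = build_index({
    "/index.php": "PHP search engine tutorial",
    "/about.php": "About this PHP site",
})
print(search(index, "search php"))  # ['/index.php', '/about.php']
```

The index is built once when pages are spidered, and each query only does dictionary lookups, which is the same reason a FULLTEXT index beats re-scanning pages on every search.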
Cheers
-
boris007ng
- Forum Newbie
- Posts: 4
- Joined: Fri Jun 23, 2006 8:42 pm
Thanks very much for your prompt response.
It wasn't initially part of the project. It was suggested along the way and I agreed to give it a try.
Your suggestion made a lot of sense to me. You have broken the entire problem down into something I can understand. But I would still like you to explain some areas in more detail.
Thank you very much for your time and advice.
Hockey wrote:
Here is a tip: for something basic...you can simply scan a web site's index.php and extract its content and links...
Traverse the web site by following links...
(Do you mean the whole text content of the site? Please explain.) Store the extracted content in a MySQL database. There's no need for HTML tags at this level, so strip everything but the content...
(Please explain more, especially the FULLTEXT search capabilities.) Once you've built a cache of each web site and stored its content in a DB, use the database's built-in FULLTEXT search capabilities...
When someone queries your database now, you can return which page matches which keyword based on your FULLTEXT results...
Cheers
-
alex.barylski
- DevNet Evangelist
- Posts: 6267
- Joined: Tue Dec 21, 2004 5:00 pm
- Location: Winnipeg
By cache, I mean...your scripts need to spider/scan registered web sites and extract content and links (links so you can find other pages)...
Extracting content is a rather intensive process...you will need to strip the HTML and possibly common words like "the", "and", "or", "but", etc. (so-called stop words)...
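A small Python sketch of dropping stop words from extracted text. The `STOP_WORDS` set and `keywords` function are hypothetical and deliberately tiny; real stop-word lists are much longer. If you rely on MySQL's FULLTEXT search, note that it already ignores a built-in stop-word list (and, for MyISAM tables, words shorter than `ft_min_word_len`, 4 characters by default), so this step matters mostly for an index you build yourself.

```python
import re

# Illustrative stop-word list only; production lists contain hundreds of words.
STOP_WORDS = {"the", "and", "or", "but", "a", "an", "of", "to", "in", "is"}


def keywords(text):
    """Lower-case the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


print(keywords("The cat and the hat, but not a bat"))  # ['cat', 'hat', 'not', 'bat']
```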
When you've done this, you will store the web page content in a database for faster lookup...
Re-scanning/spidering a web page each time someone searches would be overkill...MySQL is better at searching FULLTEXT fields than your PHP is at spidering/scanning, etc...
Look into http://www.phpdig.net/
Cheers
-
boris007ng
- Forum Newbie
- Posts: 4
- Joined: Fri Jun 23, 2006 8:42 pm
Is FULLTEXT a kind of data type in MySQL? I use Navicat with MySQL; it is a GUI tool for MySQL. I tried looking for a FULLTEXT data type and couldn't find it, but I found LONGTEXT, which I think is similar to FULLTEXT.
Your response is really helping me a lot. Thanks once again.
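On the FULLTEXT question: in MySQL, FULLTEXT is not a column type but an index type, so LONGTEXT is not an equivalent of it. You keep the text in ordinary CHAR, VARCHAR or TEXT (including LONGTEXT) columns, add a FULLTEXT index over them, and query that index with `MATCH(...) AGAINST(...)`. The sketch below holds the SQL as Python strings and runs the query through a DB-API cursor that uses `%s` placeholders (as MySQLdb and PyMySQL do). The table and column names (`pages`, `url`, `title`, `content`) and the `search_pages` helper are hypothetical. It uses `ENGINE=MyISAM` because before MySQL 5.6 only MyISAM tables supported FULLTEXT indexes.

```python
# Schema: FULLTEXT is declared as an index over ordinary text columns.
CREATE_PAGES = """
CREATE TABLE pages (
    id INT AUTO_INCREMENT PRIMARY KEY,
    url VARCHAR(255) NOT NULL,
    title VARCHAR(255),
    content LONGTEXT,
    FULLTEXT KEY ft_pages (title, content)
) ENGINE=MyISAM
"""

# Natural language search; MATCH() must name the same columns as the FULLTEXT index.
SEARCH_PAGES = """
SELECT url, MATCH(title, content) AGAINST (%s) AS score
FROM pages
WHERE MATCH(title, content) AGAINST (%s)
ORDER BY score DESC
LIMIT 20
"""


def search_pages(cursor, query):
    """Run the FULLTEXT search on a DB-API cursor; returns (url, score) rows."""
    # The query string is passed as a parameter, never pasted into the SQL,
    # so user input cannot inject SQL.
    cursor.execute(SEARCH_PAGES, (query, query))
    return cursor.fetchall()
```

One gotcha when testing with only a handful of rows: in natural language mode, MyISAM ignores words that appear in more than half of the rows, so a tiny test table can return no matches even though the words are there.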
-
alex.barylski
- DevNet Evangelist
- Posts: 6267
- Joined: Tue Dec 21, 2004 5:00 pm
- Location: Winnipeg
-
boris007ng
- Forum Newbie
- Posts: 4
- Joined: Fri Jun 23, 2006 8:42 pm
Re: Search Engine Design with PHP
boris007ng wrote:
Hello folks,
I am working on a project which requires me to design a search engine in PHP. The search engine should be able to search several given sites, which would be specified manually.
I have no idea how to go about this; I do not know where to start.
Please, I need solid advice on how I can achieve this task.
Here are some articles that will help you; I'm sure you'll be able to adapt them:
http://www.phpbuilder.com/columns/dhar2 ... p3?aid=661
http://www.phpbuilder.com/columns/dhar2 ... p3?aid=665
http://www.phpbuilder.com/columns/clay1 ... hp3?aid=51