Search Engine Design with PHP
Posted: Fri Jun 23, 2006 9:08 pm
by boris007ng
Hello folks,
I am working on a project that requires me to design a search engine in PHP. The search engine should be able to search several given sites, which would be specified manually.
I have no idea how to go about this or where to start.
Please, I need solid advice on how to achieve this task.
Posted: Fri Jun 23, 2006 9:12 pm
by alex.barylski
Why did you take the job then???
Search engines like Google use some pretty advanced search techniques which far surpass trivial search engines...
Here is a tip: For something basic...you can simply scan a web site's index.php and extract content and links...
Traverse the web site by following links...
Store extracted content in a MySQL database, no need for tags at this level, so strip everything but content...
Once you've built a cache of each web site and stored its content in a DB, use the database's built-in FULLTEXT search capabilities...
When someone queries your database now, you can return which page matches which keyword based on your FULLTEXT results...
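A minimal sketch of the "extract content and links" step, assuming PHP's bundled DOM extension is available. The helper name extractPage() and the example URLs are made up for illustration, and the relative-link handling is deliberately naive:

```php
<?php
// Sketch only: extractPage() is a hypothetical helper, not part of any library.
// It returns the visible body text and the http(s) links found in one page.

function extractPage(string $html, string $baseUrl): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        if ($href === '' || $href[0] === '#') {
            continue; // empty or same-page anchor
        }
        if (preg_match('#^([a-z][a-z0-9+.-]*):#i', $href, $m)) {
            if (!in_array(strtolower($m[1]), ['http', 'https'], true)) {
                continue; // mailto:, javascript:, etc.
            }
        } else {
            // Naive join: assumes $baseUrl is the site root and ignores "../" and "//host" forms.
            $href = rtrim($baseUrl, '/') . '/' . ltrim($href, '/');
        }
        $links[$href] = true; // array keys de-duplicate
    }

    // Remove <script> and <style> so their contents do not leak into the text.
    foreach (['script', 'style'] as $tag) {
        foreach (iterator_to_array($doc->getElementsByTagName($tag)) as $node) {
            $node->parentNode->removeChild($node);
        }
    }

    // textContent is the text with every tag stripped; collapse runs of whitespace.
    $body = $doc->getElementsByTagName('body')->item(0);
    $text = $body ? trim(preg_replace('/\s+/', ' ', $body->textContent)) : '';

    return ['text' => $text, 'links' => array_keys($links)];
}
```

The returned 'text' is what would go into the database, and 'links' is what the spider would visit next. Fetching the HTML itself (for example with file_get_contents() or cURL) is left out here.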
Cheers

Posted: Fri Jun 23, 2006 9:39 pm
by boris007ng
Thanks very much for your prompt response.
It wasn't initially part of the project. It was suggested along the way and I agreed to give it a try.
Your suggestion made a lot of sense to me. You have broken the entire problem down into something I can understand. But I would still need you to please explain some areas in more detail.
Hockey wrote:
Here is a tip: For something basic...you can simply scan a web site's index.php and extract content and links...
Traverse the web site by following links...
(Do you mean the whole text content of the site? Please explain.) Store extracted content in a MySQL database, no need for tags at this level, so strip everything but content...
(Please explain more, especially the FULLTEXT search capabilities.)
Once you've built a cache of each web site and stored its content in a DB, use the database's built-in FULLTEXT search capabilities...
When someone queries your database now, you can return which page matches which keyword based on your FULLTEXT results...
Cheers

Thank you very much for your time and advice.
Posted: Fri Jun 23, 2006 10:31 pm
by alex.barylski
By cache, I mean...your scripts need to spider/scan registered web sites and extract content and links (links so you can find other pages)...
Extracting content is a rather intensive process...you will need to strip HTML and possibly common words (the, and, or, but, etc.)...
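Stripping those common words might look something like the sketch below. The function name stripStopWords() and the short word list are only illustrative assumptions, and the input is expected to be text that already has its HTML removed. Note that MySQL's FULLTEXT search also applies its own built-in stopword list, so this step may be optional:

```php
<?php
// Sketch only: lower-case the text, split it into words, and drop common stop words.
// The stop-word list below is a tiny illustrative sample, not a complete list.

function stripStopWords(string $text, array $stopWords): string
{
    // Flip the list so each lookup is a cheap isset() on an array key.
    $stop = array_flip(array_map('strtolower', $stopWords));

    // Split on anything that is not an ASCII letter, a digit or an apostrophe.
    $words = preg_split("/[^a-z0-9']+/", strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    $kept = [];
    foreach ($words as $word) {
        if (!isset($stop[$word])) {
            $kept[] = $word;
        }
    }
    return implode(' ', $kept);
}

$stopWords = ['the', 'and', 'or', 'but', 'a', 'of'];
echo stripStopWords('The history of PHP and MySQL, but shorter', $stopWords), "\n";
// prints: history php mysql shorter
```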
When you've done this, you will store the web page content in a database for faster lookup...
Re-scanning/spidering a web page each time someone searches would be overkill...MySQL is better at searching FULLTEXT fields than your PHP is at spidering/scanning, etc...
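The spidering pass that builds the cache ahead of time can be sketched as a simple queue. This is only an outline under assumptions: crawl() is a made-up name, the $fetch callable stands in for "download the page and extract its text and links", and the cache is a plain array where a real setup would write to the database:

```php
<?php
// Sketch only: breadth-first crawl limited to the registered hosts.
// $fetch is a hypothetical callable: given a URL it returns
// ['text' => string, 'links' => string[]], or null if the page cannot be read.

function crawl(array $startUrls, array $allowedHosts, callable $fetch, int $maxPages = 100): array
{
    $queue = $startUrls;
    $seen = array_fill_keys($startUrls, true); // URLs already queued
    $cache = [];                               // url => text

    while ($queue !== [] && count($cache) < $maxPages) {
        $url = array_shift($queue);
        $page = $fetch($url);
        if ($page === null) {
            continue;
        }
        $cache[$url] = $page['text'];

        foreach ($page['links'] as $link) {
            $host = parse_url($link, PHP_URL_HOST);
            // Stay on the manually specified sites and skip URLs that are already queued.
            if (!is_string($host)
                || !in_array(strtolower($host), $allowedHosts, true)
                || isset($seen[$link])) {
                continue;
            }
            $seen[$link] = true;
            $queue[] = $link;
        }
    }
    return $cache;
}

// Stand-in for real HTTP fetching: a fixed map of two pages.
$fakeSite = [
    'http://example.com/'  => ['text' => 'home',   'links' => ['http://example.com/a', 'http://other.test/']],
    'http://example.com/a' => ['text' => 'page a', 'links' => ['http://example.com/']],
];
$fetch = function (string $url) use ($fakeSite) {
    return $fakeSite[$url] ?? null;
};

print_r(crawl(['http://example.com/'], ['example.com'], $fetch));
```

In this example both example.com pages end up in the cache, and the link to other.test is never fetched because its host is not registered. The $maxPages cap is there so a site with endless generated links cannot keep the script running forever.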
Look into http://www.phpdig.net/
Cheers

Posted: Sat Jun 24, 2006 7:58 am
by boris007ng

Hockey, you are truly a guru. Thanks for shedding light into the dark for me.
Is FULLTEXT a kind of data type in MySQL? I use Navicat, a GUI tool for MySQL. I looked for a FULLTEXT data type and couldn't find one, but I found LONGTEXT, which I think is similar to FULLTEXT.
Your response is really helping me a lot. Thanks once again.
Posted: Sat Jun 24, 2006 10:31 am
by alex.barylski
Posted: Sat Jun 24, 2006 7:41 pm
by boris007ng
Thanks for the response. I will take a look at the site.
How can I create a database for fulltext search, and what would be the data type of the field?
Posted: Sat Jun 24, 2006 7:51 pm
by feyd
Read on and you shall learn of what you ask.
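For what it's worth, FULLTEXT in MySQL is an index type rather than a data type: the column itself is CHAR, VARCHAR or TEXT, and the FULLTEXT index is declared on top of it. A rough sketch using PDO follows. The table and column names (pages, url, content) and the helper function names are assumptions for illustration only:

```php
<?php
// Sketch only: table, column and function names are made up for illustration.
// MyISAM was the engine that supported FULLTEXT when this thread was written;
// InnoDB gained FULLTEXT support in MySQL 5.6.

const CREATE_PAGES_SQL = <<<'SQL'
CREATE TABLE pages (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    url VARCHAR(255) NOT NULL,
    content TEXT NOT NULL,
    FULLTEXT (content)
) ENGINE=MyISAM
SQL;

// MATCH ... AGAINST appears twice: once to compute a relevance score,
// once to filter the rows. Each ? is bound to the same search terms.
const SEARCH_PAGES_SQL = <<<'SQL'
SELECT url, MATCH (content) AGAINST (?) AS score
FROM pages
WHERE MATCH (content) AGAINST (?)
ORDER BY score DESC
LIMIT 20
SQL;

function createPagesTable(PDO $pdo): void
{
    $pdo->exec(CREATE_PAGES_SQL);
}

function storePage(PDO $pdo, string $url, string $content): void
{
    $stmt = $pdo->prepare('INSERT INTO pages (url, content) VALUES (?, ?)');
    $stmt->execute([$url, $content]);
}

function searchPages(PDO $pdo, string $terms): array
{
    $stmt = $pdo->prepare(SEARCH_PAGES_SQL);
    $stmt->execute([$terms, $terms]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// Usage, assuming a database named "search" and valid credentials:
// $pdo = new PDO('mysql:host=localhost;dbname=search', $user, $password);
// createPagesTable($pdo);
// storePage($pdo, 'http://example.com/', 'text extracted from the page');
// $rows = searchPages($pdo, 'extracted');
```

Two things to watch for when testing on MyISAM with default settings: words shorter than ft_min_word_len (4 by default) are not indexed, and in natural-language mode a word that appears in more than half of the rows is ignored, so a table with only one or two rows can return nothing at all.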
Re: Search Engine Design with PHP
Posted: Wed Jul 05, 2006 4:28 am
by nezza
boris007ng wrote:
Hello folks,
I am working on a project that requires me to design a search engine in PHP. The search engine should be able to search several given sites, which would be specified manually.
I have no idea how to go about this or where to start.
Please, I need solid advice on how to achieve this task.
Here is something that will help you; I'm sure you'll be able to adapt these:
http://www.phpbuilder.com/columns/dhar2 ... p3?aid=661
http://www.phpbuilder.com/columns/dhar2 ... p3?aid=665
http://www.phpbuilder.com/columns/clay1 ... hp3?aid=51