Search Engine Design with PHP

boris007ng
Forum Newbie
Posts: 4
Joined: Fri Jun 23, 2006 8:42 pm

Post by boris007ng »

Hello folks,

I am working on a project which requires me to design a search engine in PHP. The search engine should be able to search several given sites, which would be manually specified.

I do not have any idea how I can go about this; I do not know where to start.

Please, I need solid advice on how I can achieve the above task.
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Post by alex.barylski »

Why did you take the job then??? :lol:

Search engines like Google use some pretty advanced search techniques which far surpass trivial search engines...

Here is a tip: For something basic...you can simply scan a web site's index.php and extract content and links...

Traverse the web site by following links...
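
A minimal sketch of that traversal, under some loud assumptions: the function names `extractLinks` and `crawl` are hypothetical, links are found with a simple regex (DOMDocument would be more robust on messy HTML), only plain relative paths are resolved, and the crawl stays on the starting host. The page fetcher is passed in as a callable so the logic can be tried without touching the network.

```php
<?php
// Hypothetical helper: collect same-host links from a page's HTML.
function extractLinks(string $html, string $baseUrl): array
{
    // Capture href values up to a closing quote or a #fragment.
    preg_match_all('/<a\s[^>]*href\s*=\s*["\']([^"\'#]+)/i', $html, $matches);
    $base = parse_url($baseUrl);
    $root = $base['scheme'] . '://' . $base['host'];
    $links = [];
    foreach ($matches[1] as $href) {
        if (preg_match('#^https?://#i', $href)) {
            $url = $href;                          // already absolute
        } elseif (preg_match('#^([a-z][a-z0-9+.-]*:|//)#i', $href)) {
            continue;                              // mailto:, javascript:, //host, etc.
        } elseif ($href[0] === '/') {
            $url = $root . $href;                  // root-relative path
        } else {
            $path = $base['path'] ?? '/';
            $dir = substr($path, 0, strrpos($path, '/') + 1);
            $url = $root . $dir . $href;           // relative to the current directory
        }
        if (parse_url($url, PHP_URL_HOST) === $base['host']) {
            $links[$url] = true;                   // array keys de-duplicate
        }
    }
    return array_keys($links);
}

// Hypothetical helper: breadth-first crawl starting at $startUrl.
// $fetch takes a URL and returns the HTML as a string, or a non-string on failure.
function crawl(string $startUrl, callable $fetch, int $maxPages = 50): array
{
    $queue = [$startUrl];
    $seen = [$startUrl => true];
    $pages = [];
    while ($queue && count($pages) < $maxPages) {
        $url = array_shift($queue);
        $html = $fetch($url);
        if (!is_string($html)) {
            continue;                              // skip pages that failed to load
        }
        $pages[$url] = $html;
        foreach (extractLinks($html, $url) as $link) {
            if (!isset($seen[$link])) {
                $seen[$link] = true;
                $queue[] = $link;
            }
        }
    }
    return $pages;                                 // url => raw HTML
}
```

For a real site the fetcher could be as simple as a closure around `file_get_contents($url)`, which needs `allow_url_fopen` enabled; the `$maxPages` cap is there so a badly linked site cannot keep the crawl running forever.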

Store extracted content in a MySQL database, no need for tags at this level, so strip everything but content...

Once you've built a cache of each web site and stored its content in a DB, use the database's built-in FULLTEXT search capabilities...

When someone queries your database now, you can return which page matches which keyword based on your FULLTEXT results...

Cheers :)
boris007ng
Forum Newbie
Posts: 4
Joined: Fri Jun 23, 2006 8:42 pm

Post by boris007ng »

Thanks very much for your prompt response.

It wasn't initially part of the project. It was suggested along the way and I agreed to give it a try.

Your suggestion made a lot of sense to me. You have broken the entire problem down into something I can understand. But I would still like you to explain some areas further.
Hockey wrote:
Here is a tip: For something basic...you can simply scan a web site's index.php and extract content and links...

Traverse the web site by following links...

(Do you mean the whole text content of the site? Please explain.) Store extracted content in a MySQL database, no need for tags at this level, so strip everything but content...

(Please explain more, especially the FULLTEXT search capabilities.) Once you've built a cache of each web site and stored its content in a DB, use the database's built-in FULLTEXT search capabilities...

When someone queries your database now, you can return which page matches which keyword based on your FULLTEXT results...

Cheers :)
Thank you very much for your time and advice.
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Post by alex.barylski »

By cache, I mean...your scripts need to spider/scan registered web sites and extract content and links (links so you can find other pages)...

Extracting content is a rather intensive process...you will need to strip HTML and possibly common stop words (the, and, or, but, etc.)...
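
A rough sketch of that clean-up step, assuming the page HTML is already in a string. The function names `extractText` and `removeStopWords` and the stop word list are hypothetical, and the word splitting only handles plain ASCII text. MySQL's FULLTEXT search also ignores its own built-in list of stop words, so the second function is optional.

```php
<?php
// Hypothetical helper: reduce an HTML page to its visible text.
function extractText(string $html): string
{
    // Drop script and style blocks entirely; their contents are not page text.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html);
    // Put a space before each tag so "<p>a</p><p>b</p>" does not become "ab".
    $html = str_replace('<', ' <', $html);
    // Remove the tags, then turn entities such as &amp; back into characters.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    // Collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Hypothetical helper: lowercase the text, split it on anything that is not
// an ASCII letter or digit, and drop the words found in $stopWords.
function removeStopWords(string $text, array $stopWords): string
{
    $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return implode(' ', array_diff($words, $stopWords));
}
```

Note that `removeStopWords` also throws away punctuation and case, which is fine for a search index but not for text you want to show back to the user, so it may be worth storing the output of `extractText` as well.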

When you've done this, you will store the web page content in a database for faster lookup...

Re-scanning/spidering a web page each time someone searches would be overkill...MySQL is better at searching FULLTEXT fields than your PHP is at spidering/scanning, etc...
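
To make the store-then-search idea concrete, here is a hedged sketch using PDO. It assumes a hypothetical `pages` table with `url` (unique), `title` and `body` columns and a FULLTEXT index over `(title, body)`; the function names `storePage` and `searchPages` are made up for illustration.

```php
<?php
// Hypothetical helper: save one crawled page, overwriting an earlier copy of the same URL.
function storePage(PDO $pdo, string $url, string $title, string $body): void
{
    $sql = 'INSERT INTO pages (url, title, body)
            VALUES (:url, :title, :body)
            ON DUPLICATE KEY UPDATE title = VALUES(title), body = VALUES(body)';
    $pdo->prepare($sql)->execute([
        ':url'   => $url,
        ':title' => $title,
        ':body'  => $body,
    ]);
}

// Hypothetical helper: return the best-matching pages for a search phrase.
function searchPages(PDO $pdo, string $query, int $limit = 10): array
{
    // MATCH ... AGAINST runs a natural-language FULLTEXT search and
    // yields a relevance score. Two placeholders are used because a
    // named placeholder cannot be reused with native prepared statements.
    $sql = 'SELECT url, title, MATCH(title, body) AGAINST (:q1) AS score
            FROM pages
            WHERE MATCH(title, body) AGAINST (:q2)
            ORDER BY score DESC
            LIMIT ' . (int) $limit;
    $stmt = $pdo->prepare($sql);
    $stmt->execute([':q1' => $query, ':q2' => $query]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

The columns named in `MATCH(...)` have to be exactly the columns covered by the FULLTEXT index, otherwise MySQL rejects the query.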

Look into http://www.phpdig.net/

Cheers :)
boris007ng
Forum Newbie
Posts: 4
Joined: Fri Jun 23, 2006 8:42 pm

Post by boris007ng »

:D Hockey, you are truly a guru. Thanks for shedding light on this for me.

Is FULLTEXT a kind of data type in MySQL? I use Navicat, a GUI tool for MySQL. I tried looking for a FULLTEXT data type and couldn't find one, but I found LONGTEXT, which I think is similar to FULLTEXT.

Your response is really helping me a lot. Thanks once again.
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Post by alex.barylski »

boris007ng
Forum Newbie
Posts: 4
Joined: Fri Jun 23, 2006 8:42 pm

Post by boris007ng »

Thanks for the response. I will take a look at the site.

How can I create a database for fulltext search, and what should the data type of the field be?
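
On the data type question: FULLTEXT is an index type rather than a column type, and in MySQL it is defined over CHAR, VARCHAR or TEXT columns (LONGTEXT included). A minimal sketch of such a table follows. The table name `pages`, its columns, the index name and the `createPagesTable` helper are all hypothetical, and it assumes MySQL 5.7 or later with InnoDB; on older servers FULLTEXT indexes are only available on MyISAM tables.

```php
<?php
// Hypothetical schema for crawled pages. The FULLTEXT KEY line is what
// enables MATCH(title, body) AGAINST (...) searches on this table.
const CREATE_PAGES_TABLE = <<<'SQL'
CREATE TABLE IF NOT EXISTS pages (
    id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    url   VARCHAR(255) NOT NULL UNIQUE,
    title VARCHAR(255) NOT NULL,
    body  TEXT NOT NULL,
    FULLTEXT KEY ft_title_body (title, body)
)
SQL;

// Hypothetical helper: create the table on an existing PDO connection.
function createPagesTable(PDO $pdo): void
{
    $pdo->exec(CREATE_PAGES_TABLE);
}
```

In a GUI tool the same thing is usually done by creating ordinary VARCHAR or TEXT columns first and then adding an index of type FULLTEXT on them.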
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

read on and you shall learn of what you ask.
nezza
Forum Newbie
Posts: 3
Joined: Wed Jul 05, 2006 3:46 am

Re: Search Engine Design with PHP

Post by nezza »

boris007ng wrote:Hello folks,

I am working on a project which requires me to design a search engine in PHP. The search engine should be able to search several given sites, which would be manually specified.

I do not have any idea how I can go about this; I do not know where to start.

Please, I need solid advice on how I can achieve the above task.

Here is something that will help you; I'm sure you'll be able to adapt these:

http://www.phpbuilder.com/columns/dhar2 ... p3?aid=661
http://www.phpbuilder.com/columns/dhar2 ... p3?aid=665
http://www.phpbuilder.com/columns/clay1 ... hp3?aid=51