Page 1 of 1

Fulltext Search

Posted: Tue Oct 17, 2006 10:29 am
by dibyendrah
Dear all,
I'm currenly working on fulltext search in which I couldn't do a exact match. For example, if I would like to search for exact match for keyword 'mark', it will match 'market', 'marked', 'marks'.......and 'mark' too.

My sql query looks like below :

Code: Select all

if($search_type=='exact_keywords'){
		$search_string = "' \"".$search_string."\" '";
	}elseif($search_type=='all_keywords'){
		$tmp_array = explode(" ", $search_string);
		$search_string = "";
		foreach ($tmp_array as $key=>$val){
			$search_string .= " +(".$val.")";
		}
		$search_string = "' ".$search_string." '";
	}
	
	$sql =  "SELECT page_id, url_scheme, domain, pdf_full_path, pdf_full_text,";
	$sql .= " MATCH(pdf_full_text) AGAINST ($search_string IN BOOLEAN MODE) AS relevance ";
	$sql .=" FROM tbl_pdf_fulltext ";
	$sql .= " WHERE page_id = ".$page_id;
	$sql .= " ORDER BY relevance DESC limit 0, 10\n";
Any help will be appreciated. Thank you.


With Best Regards,
Dibyendra

Posted: Fri Nov 03, 2006 1:49 am
by dibyendrah
Dear all,
I have a table in MyISAM engine with datatype longtext which I have given a fulltext index. I tried to use the MATCH(pdf_full_text) AGAINST ('query' IN BOOLEAN MODE) but it's too slow. I have read a manual and it says, fulltext index can be applied to char, varchar and text datatype but the data to be stored in that field is too large that it cannot fit in text itself. Is there any way to retrieve data faster rather than using boolean mode ?

code snippets that I have made is as follows :

Code: Select all

if($search_type==1){ //search type 1 is exact search
	$search_string = "'+\"".$search_string."\"'";
}elseif($search_type==0){  //search type 1 is general search to match all of the keywords if possible
	$tmp_array = explode(" ", $search_string);
	$search_string = "";
        foreach ($tmp_array as $key=>$val){
		$search_string .= "+(".$val.")";
	}
	$search_string = "'".$search_string." '";
}
	
$sql =  "SELECT pdf_id, page_id, url_scheme, domain, pdf_full_path, pdf_full_text,";
$sql .= " MATCH(pdf_full_text) AGAINST ($search_string IN BOOLEAN MODE) AS relevance ";
$sql .= " AS relevance ";
$sql .=" FROM tbl_pdf_fulltext ";
$sql .= " WHERE ";
$sql .= " MATCH(pdf_full_text) AGAINST ('".$search_string."')";
	
$sql .= " ORDER BY relevance DESC ";

if(mysql_num_rows($recs_matched)>0){
	return($recs_matched);
}else{
	return null;
}
Please suggest. Thank you.

With Best Regards,
Dibyendra

Posted: Fri Nov 03, 2006 4:25 am
by dibyendrah
Dear all,
It is just my idea and I don't know if it can be implemented or not. Suppose I search the keyword 'jack hammer', it would first display the links having jack hammer and later display the links having either having 'jack' or 'hammer'. I know it can be done using fulltext search but I'm having trouble to load faster using that. So, I thought to use the standara select.....like'%jack% OR ...like '%hammer%' . It will display in random order. So, it would be so kind of you all if you could suggest the way that I need to choose. I have already posted the code that I've made in this thread before.

Thank you all.

With best regards,
Dibyendra

Posted: Fri Nov 03, 2006 10:10 am
by pickle
Your query in your second post is a little weird. You've got the string 'AS relevance' in there twice.

Generally, fulltext searching will be faster than doing '%jack%' type queries - fulltext searching uses indices which greatly speeds it up. How many rows are in your `tbl_pdf_fulltext`table & how long are the queries taking?

Also, have you ever heard of "heredoc" notation? It's very convenient if you want to store/echo a string that is multiple lines. I'd use it rather than using multiple lines to build your query:

Code: Select all

$sql = <<<SQL
SELECT
   pdf_id,
   page_id,
   url_scheme,
   domain,
   pdf_full_path,
   pdf_full_text,
   MATCH(pdf_full_text) AGAINST ('$search_string' IN BOOLEAN MODE) AS relevance
FROM
   tbl_pdf_fulltext
WHERE
   MATCH(pdf_full_text) AGAINST('$search_string')
ORDER BY
   relevance DESC
SQL;
That looks much neater in my opinion. :)


I'm not entirely sure what you're asking in your 3rd post. Do you want to be able to search for 'jack hammer' and have all entries with 'jack hammer', just 'jack', and just 'hammer' returned - making sure entries with 'jack hammer' show up at the top? I'd really try to stay with fullltext searching as it will almost certainly be the best choice.

Posted: Mon Nov 06, 2006 12:45 am
by dibyendrah
The enhanced function that I made for different searching modes is as follows :

Hope someone can look at this and modify this for better performance and post :

Code: Select all

function getRecords($search_string, $search_type, $db_connect, $page_id= null, $display_limit = NULL, $offset= NULL, $filter=NULL, $mode='boolean_search'){

	//if($mode=='standard_fulltext_boolean_search'){
	switch ($mode){
		
	case 'boolean_fulltext_search':
		
		if($search_type==1){
			$search_string = "'+\"".$search_string."\"'";
		}elseif($search_type==0){
			$tmp_array = explode(" ", $search_string);
			$search_string = "";
			foreach ($tmp_array as $key=>$val){
				$search_string .= "+(".$val.")";
			}
			$search_string = "'".$search_string." '";
		}

		$sql =  "SELECT pdf_id, page_id, url_scheme, domain, pdf_full_path, pdf_full_text,";
		$sql .= " MATCH(pdf_full_text) AGAINST ($search_string IN BOOLEAN MODE) AS relevance ";
		$sql .=" FROM tbl_pdf_fulltext ";
		$sql .= " WHERE ";
		$sql .= " MATCH(pdf_full_text) AGAINST ($search_string IN BOOLEAN MODE)";
		$sql .= " AND ";
		$sql .= " page_id = 4";
		
		break;

	//}elseif ($mode=='standard_fulltext_index_search'){
	
	case 'standard_fulltext_search':
		
		$search_string = "'".$search_string."'";

		$sql =  "SELECT pdf_id, page_id, url_scheme, domain, pdf_full_path, pdf_full_text,";
		$sql .= " MATCH(pdf_full_text) AGAINST ($search_string) AS relevance ";
		$sql .=" FROM tbl_pdf_fulltext ";
		$sql .= " WHERE ";
		$sql .= " MATCH(pdf_full_text) AGAINST ($search_string) ";
		$sql .= " AND ";
		$sql .= " page_id = 4";

	//}elseif ($mode=='standard_quick_search'){
		break;
		
	case 'standard_search':
		
		$match = "";

		if($search_type==1){
			$match = "pdf_full_text like '%".$search_string."%'";
		}elseif($search_type==0){
			$tmp_array = explode(" ", $search_string);

			$match = " pdf_full_text like '%".$search_string."%' AND";

			for ($x=0; $x< (count($tmp_array)-1); $x++){
				$match .= " pdf_full_text like '%".$tmp_array[$x]."%' OR ";
			}

			$match .= " pdf_full_text like '%".$tmp_array[$x]."%' ";
		}

		$sql =  "SELECT pdf_id, page_id, url_scheme, domain, pdf_full_path, pdf_full_text";
		$sql .=  " FROM tbl_pdf_fulltext ";
		$sql .= " WHERE ";
		$sql .= $match;
		break ;
		
	
	default:
			//print "Error in choosing search mode ! Please restart your browser and try again. This happens when the session expires!";
			//header("location:search_result.php?reset=1");
			exit("Unknown Search Mode!");
	}//end switch

	if(!is_null($filter)){
		$sql .= " AND pdf_id IN (";
		$sql .= $filter;
		$sql .=")";
	}
	if($mode!='standard_search'){
		$sql .= " ORDER BY relevance DESC ";
	}
	
	if(!is_null($offset) && !is_null($display_limit)){
		$sql .= " limit ".$offset.", ".$display_limit;
	}

	//print "<BR><EM>".$sql."</EM><BR><BR>";

	$recs_matched = mysql_query($sql, $db_connect) or die(mysql_error());
	$GLOBALS['nTotalRecs'] = mysql_num_rows($recs_matched);
	
	if(mysql_num_rows($recs_matched)>0){
		return($recs_matched);
	}else{
		return null;
	}

}
Thank you .

With Best Regards,
Dibyendra

Posted: Mon Nov 06, 2006 4:33 am
by dibyendrah
Dear all,
As I have used the longtext data type as fulltext index, performance is not so good as I wanted. Has anybody used the longtext as fulltext and optimized the performance ? I was also wondering how to use the SQL caching as well. I have read the manual but I'm still not clear how it operates and what do we have to do it after we configure ?
Please share the knowledge.

With Best Regards,
Dibyendra

Posted: Tue Nov 07, 2006 4:37 am
by dibyendrah
Has anybody implemented a boolean search from query like 'Sam AND Town NOT Village' so that we can search like

Code: Select all

MATCH(tbl_pdf_fulltext) AGAINST (+'Sam' +'Town' -'Village' IN BOOLEAN MODE)
Or other proper syntax in MySQL.

If we can parse the search query and make the SQL like above, it would be much more easier to search the full text. Please share your knowledge if anybody has already implemented that. I'm not getting enough time to implement this due to the time limitation.

Thank you .

With Best Regards,
Dibyendra

Posted: Tue Nov 07, 2006 5:42 am
by dibyendrah
I've uploaded the demo on this URL :
http://202.79.37.177/pdfsearch/public/

Plese comment on this project. If someone likes my project, I can provide this project to the interested one.

Thank you,
Dibyendra