
similar_text too slow!

Posted: Sat Mar 04, 2006 6:46 pm
by pedroz
I am experimenting with some code and would like to know if you could help me with the following:


$result = $db->sql_query("SELECT postBody FROM posts WHERE status='1'");

// Normalize the submitted text once instead of on every iteration.
$needle = preg_replace("/\r\n|\n|\r/", "", strtolower($postform));
while ($row = mysql_fetch_array($result)) {
    $stored = preg_replace("/\r\n|\n|\r/", "", strtolower($row[0]));
    similar_text($needle, $stored, $similarity_pst);
    // Stop at the first stored post that is more than 90% similar.
    if (round($similarity_pst) > 90) { $flag = "message already sent"; break; }
}
Goal:
Check whether a submitted message, or any similar message, is already in the database (I picked a 90% threshold).

The code above works, but it slows down my machine (especially if the post is huge: more than 10000 chars), and it will take a lot of time if many posts are already in the database...

Do you have any ideas for making this faster?

Posted: Sat Mar 04, 2006 6:49 pm
by s.dot
What's your reason for doing this? (just curious)

If you're checking whether a message has already been entered, you should check for an id or another unique field.

Posted: Sat Mar 04, 2006 7:16 pm
by pedroz
Hi scrotaye

I see your point: checking by id would reduce the query. But my objective is to check all messages to avoid double posts...

In other words, I don't want to allow similar text to be inserted into the post field in the database.

Posted: Sat Mar 04, 2006 7:18 pm
by a94060
If you do not mind, could you use the php tags? (I'm sorry, I'm not even a mod, but I've just been going through posts and saying this :oops: )

Posted: Sat Mar 04, 2006 7:21 pm
by pedroz
I tried to post it with php tags but I received an error... Now changed and OK! :)

Posted: Sat Mar 04, 2006 7:24 pm
by a94060
OK, thanks. I'm sorry, mods, if I'm trying to take your job 8O

Posted: Sat Mar 04, 2006 8:32 pm
by Chris Corbyn
a94060 wrote:OK, thanks. I'm sorry, mods, if I'm trying to take your job 8O
:lol: It's cool :)

Posted: Sat Mar 04, 2006 8:42 pm
by a94060
:lol: Thanks, I wish I was a mod...

Posted: Sat Mar 04, 2006 8:48 pm
by s.dot
d11wtq isn't really a mod........ he just acts like it :lol:

Posted: Sat Mar 04, 2006 8:48 pm
by John Cartwright
I don't think your idea of preventing double posts is a good idea, considering there are several valid reasons for multiple posts to be identical.

Example:
Noob: Is this correct? echo 'foobar'; ??
Jcart: Yes
Noob: What about this? ..
Jcart: Yes
Obviously this is a simplified example, but this seems more like an annoyance for your users than a help in preventing double posts. Instead, you could try not allowing them to post within 15 (?) seconds of their previous post, eliminating the possibility of them clicking the submit button twice.
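
A minimal sketch of that time-based check, assuming the timestamp of the user's previous post is available (for example from the session or the posts table). The function name `can_post_again` and the 15-second default are illustrative, not part of any library.

```php
<?php
// Hypothetical helper: the name and the 15-second default are assumptions.
// $lastPostTime is a Unix timestamp, or null if the user has not posted yet.
function can_post_again($lastPostTime, $now, $minInterval = 15)
{
    // A user with no previous post is always allowed to post.
    if ($lastPostTime === null) {
        return true;
    }
    // Allow the post only once the minimum interval has elapsed.
    return ($now - $lastPostTime) >= $minInterval;
}
```

It could be called as `can_post_again($lastPostTime, time())` just before the insert, rejecting the submission when it returns false.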

Posted: Sat Mar 04, 2006 8:50 pm
by a94060
scrotaye wrote:d11wtq isn't really a mod........ he just acts like it :lol:
Doesn't d11wtq run the site?

Jcart wrote:I don't think your idea of preventing double posts is a good idea, considering there are several valid reasons for multiple posts to be identical.

Example:
Noob: Is this correct? echo 'foobar'; ??
Jcart: Yes
Noob: What about this? ..
Jcart: Yes
Obviously this is a simplified example, but this seems more like an annoyance for your users than a help in preventing double posts. Instead, you could try not allowing them to post within 15 (?) seconds of their previous post, eliminating the possibility of them clicking the submit button twice.
Can you also store the IP of the poster in the same database and then check whether the IP of the previous poster matches the IP of the current poster?

Posted: Sat Mar 04, 2006 8:52 pm
by s.dot
he's actually on topic (answering the question) like we should be :-P

Posted: Sat Mar 04, 2006 8:54 pm
by a94060
Yep, sorry for trying to hijack the thread or whatever, pedroz. My apologies :(

Posted: Sat Mar 04, 2006 8:55 pm
by John Cartwright
a94060 wrote:Can you also store the IP of the poster in the same database and then check whether the IP of the previous poster matches the IP of the current poster?
IPs can be disguised easily (proxy), therefore a user's IP can never be trusted. There are also legitimate reasons for a user's IP to change several times throughout a visit (AOL users, etc.).

Posted: Sat Mar 04, 2006 8:56 pm
by josh
a94060 wrote:Doesn't d11wtq run the site?
No single person runs the site, every member here runs the site. The mods vote on important issues though.


What I would recommend is storing an MD5 hash of the message in the database, and then checking whether the MD5 hash of the current message matches that of any other message. This will knock out exact dupes, but you still have that 10% difference and the issue that Jcart brought up. Could you tell us how this is going to be used? There are different algorithms, for example soundex, that could be used here.
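
A minimal sketch of the MD5 approach, assuming a hash of each normalized message is stored alongside the post. The helper names `message_hash` and `is_exact_duplicate` are hypothetical, and the array of hashes stands in for a lookup against an indexed hash column in the database.

```php
<?php
// Hypothetical helper: normalizes the message the same way as the code in
// the first post (lowercase, line breaks removed) and returns its MD5 hash.
function message_hash($message)
{
    $normalized = preg_replace("/\r\n|\n|\r/", "", strtolower($message));
    return md5($normalized);
}

// Hypothetical helper: $storedHashes stands in for the hashes already saved
// in the database; in practice this would be a query on an indexed column.
function is_exact_duplicate($message, array $storedHashes)
{
    return in_array(message_hash($message), $storedHashes, true);
}
```

This only catches messages that are identical after normalization; near-duplicates would still need a similarity measure such as `similar_text`.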