similar_text too slow!

PHP programming forum. Ask questions or help people concerning PHP code. Don't understand a function? Need help implementing a class? Don't understand a class? Here is where to ask. Remember to do your homework!

Moderator: General Moderators

pedroz
Forum Commoner
Posts: 99
Joined: Thu Nov 03, 2005 6:21 am


Post by pedroz »

I am experimenting with some code and would like to know if you could help me with the following:

Code:

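// Normalise the submitted post once, not once per row
$needle = str_replace(array("\r\n", "\n", "\r"), "", strtolower($postform));

$result = $db->sql_query("SELECT postBody FROM posts WHERE status='1'");

while ($row = mysql_fetch_array($result)) {
    $haystack = str_replace(array("\r\n", "\n", "\r"), "", strtolower($row[0]));
    similar_text($needle, $haystack, $similarity_pst);
    // $similarity_pst is already a float percentage; number_format() is not needed
    if ($similarity_pst > 90) { $flag = "message already sent"; break; }
}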
Target idea:
Check whether a submitted post is already in the database, or whether a similar post is (I picked a 90% similarity threshold).

The code above works, but it slows down my machine (especially if the post is huge, more than 10,000 characters), and it will take a long time once many posts are already in the database...

Do you have any ideas on how to make this faster?
Last edited by pedroz on Sat Mar 04, 2006 7:20 pm, edited 1 time in total.
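A possible speed-up for the loop above, offered as a sketch rather than a tested fix. `similar_text()` reports its percentage as 2 × (matching characters) / (length1 + length2) × 100, and the matching count can never exceed the length of the shorter string. So a post can only pass the 90% threshold if the shorter string is more than 9/11 (about 82%) of the longer one's length; anything else can be rejected without calling `similar_text()`, whose cost the PHP manual gives as O(N^3) in the length of the longer string. The helper names (`could_reach_threshold`, `normalise_post`, `find_similar_post`) are hypothetical, and an array of post bodies stands in for the database rows.

```php
<?php
// Hypothetical helpers: a cheap length bound in front of similar_text().
// similar_text() percent = 2 * common / (len1 + len2) * 100, and
// common <= min(len1, len2), so percent <= 2 * min / (min + max) * 100.

function could_reach_threshold(string $a, string $b, float $threshold): bool
{
    $la = strlen($a);
    $lb = strlen($b);
    if ($la + $lb === 0) {
        return true; // two empty strings: let similar_text() decide
    }
    $best = 2 * min($la, $lb) / ($la + $lb) * 100; // upper bound on the percentage
    return $best > $threshold;
}

function normalise_post(string $text): string
{
    // Same normalisation as the original loop: lowercase, strip line breaks
    return str_replace(array("\r\n", "\n", "\r"), "", strtolower($text));
}

// Returns the index of the first stored body above the threshold, or null.
function find_similar_post(string $postform, array $bodies, float $threshold = 90.0): ?int
{
    $needle = normalise_post($postform); // computed once, not per row
    foreach ($bodies as $i => $body) {
        $haystack = normalise_post($body);
        if (!could_reach_threshold($needle, $haystack, $threshold)) {
            continue; // lengths alone rule out a match: skip the expensive call
        }
        similar_text($needle, $haystack, $percent);
        if ($percent > $threshold) {
            return $i;
        }
    }
    return null;
}
```

For example, `find_similar_post("Hello World", ["Goodbye", "hello world!"])` skips "Goodbye" on length alone and only runs `similar_text()` on the second body. The bound never rejects a true match; it just avoids work on posts that cannot qualify.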
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Post by s.dot »

what's your reason for doing this? (just curious)

if you're checking to see if a message has already been entered, you should check an id or another unique field
Set Search Time - A google chrome extension. When you search only results from the past year (or set time period) are displayed. Helps tremendously when using new technologies to avoid outdated results.
pedroz
Forum Commoner
Posts: 99
Joined: Thu Nov 03, 2005 6:21 am

Post by pedroz »

Hi scrotaye

I see your point: checking by id would reduce the query, but my objective is to check all messages to avoid double posts...

In other words, I don't want to allow similar text to be inserted into the post field in the database.
a94060
Forum Regular
Posts: 543
Joined: Fri Feb 10, 2006 4:53 pm

Post by a94060 »

If you don't mind, could you use the php tags? (I'm sorry, I'm not even a mod, but I've just been going through posts and saying this :oops: )
pedroz
Forum Commoner
Posts: 99
Joined: Thu Nov 03, 2005 6:21 am

Post by pedroz »

I tried to post it with php tags but I received an error... Now changed and OK! :)
a94060
Forum Regular
Posts: 543
Joined: Fri Feb 10, 2006 4:53 pm

Post by a94060 »

OK, thanks. I'm sorry, mods, if I'm trying to take your job 8O
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

a94060 wrote: OK, thanks. I'm sorry, mods, if I'm trying to take your job 8O
:lol: It's cool :)
a94060
Forum Regular
Posts: 543
Joined: Fri Feb 10, 2006 4:53 pm

Post by a94060 »

:lol: Thanks, I wish I were a mod...
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Post by s.dot »

d11wtq isn't really a mod... he just acts like it :lol:
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto

Post by John Cartwright »

I don't think your idea of preventing double posts is a good idea, considering there are several valid reasons for multiple posts to be identical.

Example:
Noob: Is this correct? echo 'foobar'; ??
Jcart: Yes
Noob: What about this? ..
Jcart: Yes
Obviously a simplified example, but this seems more likely to annoy your users than to help prevent double posts. Instead, what you can try is to not allow them to post within 15 (?) seconds of their previous post, eliminating the possibility of them clicking the submit button twice.
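A minimal sketch of the 15-second rule, assuming the time of the user's previous post is kept in something array-like such as `$_SESSION`. The key `last_post_time` and both function names are hypothetical:

```php
<?php
// Hypothetical helpers for "no second post within N seconds".

// True if the new post comes less than $minInterval seconds after the last one.
function posted_too_soon(?int $lastPostTime, int $now, int $minInterval = 15): bool
{
    if ($lastPostTime === null) {
        return false; // no previous post recorded: nothing to compare against
    }
    return ($now - $lastPostTime) < $minInterval;
}

// Accepts the post (and records its time) unless it is too soon.
// $store is passed by reference so $_SESSION can be used directly.
function try_record_post(array &$store, int $now, int $minInterval = 15): bool
{
    $last = $store['last_post_time'] ?? null; // hypothetical key
    if (posted_too_soon($last, $now, $minInterval)) {
        return false; // reject: most likely a double click on submit
    }
    $store['last_post_time'] = $now;
    return true;
}
```

In a form handler this could be called as `try_record_post($_SESSION, time())` after `session_start()`. A session only covers one browser session, so keeping the timestamp in the users table would be the sturdier option.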
a94060
Forum Regular
Posts: 543
Joined: Fri Feb 10, 2006 4:53 pm

Post by a94060 »

scrotaye wrote: d11wtq isn't really a mod... he just acts like it :lol:
Doesn't d11wtq run the site?

Jcart wrote: ...Instead, what you can try is to not allow them to post within 15 (?) seconds of their previous post, eliminating the possibility of them clicking the submit button twice.
Can you also store the IP of the poster in the same database and then check whether the IP of the previous poster matches the IP of the current poster?
Last edited by a94060 on Sat Mar 04, 2006 8:53 pm, edited 3 times in total.
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Post by s.dot »

he's actually on topic (answering the question) like we should be :-P
a94060
Forum Regular
Posts: 543
Joined: Fri Feb 10, 2006 4:53 pm

Post by a94060 »

Yep, sorry for trying to hijack the thread or whatever, pedroz. My apologies :(
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto

Post by John Cartwright »

a94060 wrote: Can you also store the IP of the poster in the same database and then check whether the IP of the previous poster matches the IP of the current poster?
IPs can easily be disguised (via a proxy), therefore a user's IP can never be trusted. There are also legitimate reasons for a user's IP to change several times during a visit (AOL users, etc.).
josh
DevNet Master
Posts: 4872
Joined: Wed Feb 11, 2004 3:23 pm
Location: Palm beach, Florida

Post by josh »

a94060 wrote:doesnt d11wtq run the site?
No single person runs the site, every member here runs the site. The mods vote on important issues though.


What I would recommend is storing an MD5 hash of the message in the database, and then checking whether the MD5 hash of the current message matches that of any other message. This will knock out exact dupes, but you still have that 10% difference and the issue that Jcart brought up. Could you tell us how this is going to be used? There are different algorithms that could be used here, for example soundex.
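A rough sketch of the MD5 idea, reusing the same lowercase/strip-line-breaks normalisation as the original loop. The function names are hypothetical, and an array keyed by hash stands in for what would be an indexed hash column in the posts table:

```php
<?php
// Hypothetical helpers for exact-duplicate detection via MD5.

// Hash of the normalised post body (lowercased, line breaks removed).
function post_hash(string $text): string
{
    $normalised = str_replace(array("\r\n", "\n", "\r"), "", strtolower($text));
    return md5($normalised); // 32-character hex string
}

// $storedHashes maps hash => true, standing in for an indexed DB column.
function is_exact_duplicate(string $text, array $storedHashes): bool
{
    return isset($storedHashes[post_hash($text)]);
}
```

With the hash stored in an indexed column, the check becomes a single lookup instead of running `similar_text()` against every post. As noted above, it only catches exact duplicates (after normalisation), not posts that are 90% similar.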