How should I determine # of posts

PHP programming forum. Ask questions or help people concerning PHP code. Don't understand a function? Need help implementing a class? Don't understand a class? Here is where to ask. Remember to do your homework!

Moderator: General Moderators

foobar
Forum Regular
Posts: 613
Joined: Wed Sep 28, 2005 10:08 am

Post by foobar »

seodevhead wrote:Well with this being said, is it still efficient to use COUNT() for my purposes? Ideally I would love to use count() just because it is so easy and will always return the right answer. Thanks for the good input guys.
I'd say no. Just think about it. Unless you'll cache the results somewhere (which isn't a good idea in a forum system), you'll be running a query for every post count that ever appears on any page. Every time someone flips the page in a thread, bang, another set of queries for each user. That's an extra query for every user. Instead of just querying for flat data from the user table, you'll be running a "dynamic" query on the entire posts table for each user. Ouch. That'll quickly become too slow to be manageable. You'd clearly notice the difference once the forum gets to the medium-sized range. Maybe even in the lower-medium ranges depending on your hardware.
User avatar
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

Post by John Cartwright »

These querys were done in iterations of 50,000

Use of SELECT COUNT(*) as `postCount` FROM `posts`

Test1
Total Rows: 5000
Total Average: 0.241804

Test2
Total Rows: 10000
Total Average: 0.2348

Test3
Total Rows: 15000
Total Average: 0.231798

Test4
Total Rows: 100000
Total Average: 0.249216s

From these benchmarks, we can see that even a user with 100,000 post is processed just as fast as a user with a mere 5000 posts.
hmmmmm.... 8O

Use of SELECT `postcount` FROM `users` LIMIT 1

Test1
Total Rows: 1
Total Average: 0.332324





According to these benchmarks, the use of count(*) is actually faster than selecting the row value.
Last edited by John Cartwright on Sat Dec 03, 2005 8:01 pm, edited 2 times in total.
User avatar
neophyte
DevNet Resident
Posts: 1537
Joined: Tue Jan 20, 2004 4:58 pm
Location: Minnesota

Post by neophyte »

That's incredible. 8O

Well I'm eating "my own words" sandwich right about now.

Thanks for the research JKart.

Although what if there were 1000+ users online and count() has to be run 10 times per page with an average page request of 500 per minute. Do yo think that would affect it or not?
User avatar
AKA Panama Jack
Forum Regular
Posts: 878
Joined: Mon Nov 14, 2005 4:21 pm

Post by AKA Panama Jack »

That test is flawed because it isn't testing a comparison value. IE WHERE user_id='$userid' in the query.

Most databases will have the total record count stored for a table and readily retrievable without having to scan the database table. The query you used will pull that value and not scan the entire table looking for matches. If there is a WHERE statement in the SQL statement then it is forced to scan every record in the table counting those that match the WHERE statement. The more complex the WHERE statement the more time it takes to get the count of records.

Say you have a table with 100,000 records and 10,000 of those records are for user_id number 2.

SELECT COUNT(*) as `postCount` FROM `posts`

will be faster than

SELECT COUNT(*) as `postCount` FROM `posts` where user_id='2'

and getting the post count from the users record will be alot faster than

SELECT COUNT(*) as `postCount` FROM `posts` where user_id='2'

And in many cases all of the information for a user is already retrieved at the start so there isn't an extra query call needed to get their post count and that saves a ton of time.

As a side note the use of the LIMIT flag is good for records at the beginning of a large table but not as much for records at the end of the table. Say you have a 1 million record table and want two records. One record is at position 1 in the table and the second record is at position 1 million in the table and you use LIMIT 1 on both. The record at position 1 million in the table will take as long to retrieve as if you didn't use LIMIT 1. Even with that it is best to use LIMIT flags where you can in your queries as it does reduce the overall load on the database server and speeds things up.
Post Reply