Pagination
Posted: Wed May 21, 2003 8:23 pm
by McGruff
From the Eclipse database abstraction layer notes on the PagedQuery class for pagination (
http://www.students.cs.uu.nl/people/voo ... index.html):
This class might seem inefficient, exactly because it doesn't use mentioned database-specific SQL commands. However, for one very good reason, that isn't really true: it is almost always necessary to know the total number of pages in a query result - for example to be able to show a page navigator - and as this value is computed from the total number of rows in the query, the only way to get this number is by executing the full query. This completely eliminates the possible gain in efficiency when using LIMIT- or SELECT TOP-clauses, because in that case an additional (counting) query must be executed. Also, consider that most queries specify an ORDER BY clause, and remember that a DBMS can only compute this ordering by examining all rows in the result, even when just the first 10 are selected.
The single query Eclipse pagination model is claimed to be more efficient compared to a two query method, but I'm not so sure.
Say you have a mature discussion forum with hundreds or even thousands of topics and many columns to fetch in the query; that's a huge result resource to create.
With the two-query LIMIT method, the first (total rows) query creates a much smaller result resource (a single column, probably an integer ID) which can be destroyed once you've got the total number of rows. The second (LIMIT) query, with just the rows for a single page, also creates a much smaller resource.
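As a sketch of the two-query method just described (table and column names are made up for illustration; note a COUNT(*) makes the first result even smaller than selecting an ID column):

```php
<?php
// Two-query pagination: a cheap counting query first, then a LIMIT
// query for just one page of rows. Table/column names are hypothetical.
function paging_queries($page, $per_page)
{
    $offset = ($page - 1) * $per_page;
    return array(
        'count' => "SELECT COUNT(*) FROM topics",
        'page'  => "SELECT topic_id, title, last_post FROM topics "
                 . "ORDER BY last_post DESC LIMIT $offset, $per_page",
    );
}
```

Run the count query once to size the page navigator, then the page query for the rows actually shown.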
I don't know if there is any big advantage either way in terms of speed, but in terms of memory use, the two-query method would seem to have the upper hand. The advantage would depend on how many columns you need to fetch, and how many rows are in the table.
Eclipse notes state that avoiding use of LIMIT allows the class to support more databases so I guess the two query method isn't as portable.
Posted: Fri May 23, 2003 8:04 am
by BDKR
I agree with you McGruff for the most part. However, if you are concerned with portability, then Vincent's approach is a viable one.
Before Vincent did this, I created a class to deal with this issue. I don't know exactly how it's handled in Vincent's example, but it seems to me that the overhead of either approach should be limited to the first page. I just pass the total row count and the query on to the next page, then adjust the LIMIT clause at the end based on my position (page) in the result set.
Now the two query approach should be portable as well by extending the class. Just create sub classes for each db that has a specific way of dealing with limit clauses or whatever. Perhaps...
Code:
class oracleLimit extends PrevNext
{
    // Return a database-specific row-limiting clause for this query.
    // Oracle has no LIMIT keyword, so a ROWNUM condition stands in here.
    function new_limit($offset, $count)
    {
        return 'WHERE ROWNUM <= ' . ($offset + $count);
    }
}
The above is for Oracle, as an example, but one could have subclasses for any db they're working with. The pagination class itself shouldn't need to know anything at that level beyond asking for the limit clause.
Now Vincent is correct that it is 'always necessary to know the total number of pages in a query result', but you only need to get the total count from the query on the first page. After that, you are far better off taking the page number, the total number of rows, and how many results to show per page, and calculating the offset for each page rather than running two queries per page.
To put it more clearly, you should only have to issue one query on each successive page after the initial page.
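The arithmetic involved can be sketched like this (function names are made up; MySQL-style LIMIT syntax assumed):

```php
<?php
// Once the total is known from the first page, successive pages only
// need this calculation, not a second counting query.
function new_limit($page, $per_page)
{
    $offset = ($page - 1) * $per_page;
    return "LIMIT $offset, $per_page";
}

// Total pages for the navigator, from the total carried forward.
function page_count($total_rows, $per_page)
{
    return (int) ceil($total_rows / $per_page);
}
```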
Cheers,
BDKR
Posted: Fri May 23, 2003 9:19 pm
by McGruff
Good idea - never thought of that

Posted: Sun May 25, 2003 9:24 am
by jason
And better yet, in some databases, like MySQL, you have ways to do this that are actually even faster.
SQL_CALC_FOUND_ROWS for MySQL is a good example of this, and I have found it to be incredibly useful. Just run your normal query, even with LIMIT, and you still can get a result back telling you how many total rows you would have found.
Note: This will be going into the Eclipse CE lib.
Posted: Sun May 25, 2003 7:02 pm
by McGruff
Thanks - a very useful tip
Took me a while to hunt this down in the manual (the downloadable version isn't hot on searches) so, for anyone else looking for info, here it is:
FOUND_ROWS()
Returns the number of rows that the last SELECT SQL_CALC_FOUND_ROWS ... command would have returned if it hadn't been restricted with LIMIT.
mysql> SELECT SQL_CALC_FOUND_ROWS * FROM tbl_name
WHERE id > 100 LIMIT 10;
mysql> SELECT FOUND_ROWS();
The second SELECT returns a number indicating how many rows the first SELECT would have returned had it been written without the LIMIT clause. Note that when you use SELECT SQL_CALC_FOUND_ROWS ..., MySQL has to calculate all rows in the result set. However, this is still faster than running the query without LIMIT, because the full result set need not be sent to the client. SQL_CALC_FOUND_ROWS is available starting with MySQL version 4.0.0.
Posted: Tue May 27, 2003 11:54 pm
by fractalvibes
Good questions for my DBA! I'm guessing a lot depends on the relative efficiency of the database engine and how it actually implements these requests. The two-step approach would seem to have advantages network-wise, if your DB is on its own server.
Hmmmm
Phil J.
Posted: Wed May 28, 2003 10:30 am
by BDKR
jason wrote:And better yet, in some databases, like MySQL, you have ways to do this that are actually even faster.
SQL_CALC_FOUND_ROWS for MySQL is a good example of this, and I have found it to be incredibly useful. Just run your normal query, even with LIMIT, and you still can get a result back telling you how many total rows you would have found.
Note: This will be going into the Eclipse CE lib.
When dealing with pagination of a result set, this may not be useful depending on the solution you create.
If you pass the query, the total number of rows in the result set, and how many results to show along to each page being viewed, then there is no need to acquire this information again. It's all done on the first page. The only thing you have to do is figure out, from the page number, the total, and how many to show per page, what the limit clause will be. That is a simple calculation done by PHP in memory, as opposed to an additional query and the additional overhead of network usage.
OK, if you don't want to pass the query itself from page to page, and I can understand that, then use sessions or something and store the query in the session. But in that case, store the rest of the information too: the total number of results and how many to show per page. Then, once again, there is no need for a second query on successive pages.
Really, the point here is that the overhead of getting this figure from the db only needs to be incurred once. If I know from the first page how many total results came back from the query, I don't need to ask again. Therefore SQL_CALC_FOUND_ROWS should only be needed on the first page, and only after the full query has been made.
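A minimal sketch of the session idea (key and function names are made up): cache the total under some key for the query, and only run the expensive count when the cache is empty.

```php
<?php
// Cache the expensive row total in the session so the counting query
// (or SQL_CALC_FOUND_ROWS) only ever runs on the first page.
// $count_rows is any callable that performs the full count.
function total_rows($query_key, $count_rows)
{
    if (!isset($_SESSION['totals'][$query_key])) {
        $_SESSION['totals'][$query_key] = $count_rows();
    }
    return $_SESSION['totals'][$query_key];
}
```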
fractalvibes wrote:
Good questions for my DBA! Guessing a lot depends upon the relative efficiency of the database engine and how it actually implements these requests. The two-step approach would seem to have advantages network-wise, if your DB is on it's own server.
For your DBA? Are you going to take a test or something soon? Did you know MySQL has jumped on the certification bandwagon too? Just curious for the most part. Where I work now, I have to wear four or five different hats: DBA, network admin, systems and cluster admin, developer (some web-based administration stuff as well as other automated tools), and now I also have to deal with routing inbound wifi traffic. The single biggest thing that has helped me deal with our databases is understanding (and running) replication and the binlog files. Not to mention using all the tools that come with MySQL. It's one thing to know how to write a query against a db, but something else to update a slave from a snapshot of the master.
Cheers,
BDKR
Posted: Wed May 28, 2003 8:34 pm
by fractalvibes
Some good ideas here. You could persist the total count and also integer values for the current, previous, and next rows. Couldn't you also persist the recordset you got back somehow, like the idea of an ADO disconnected recordset? I haven't actually tried this, but saving that `snapshot` of your recordset should work, though I'm not sure how; certainly not as a session object.
DBA - well, we are fortunate. Largely an OS/390 mainframe shop, we have 4 DBAs. One is pretty much dedicated to our web dev team and concentrates on DB2/UDB for Win2000. Kind of fun to explore things the DBAs don't even know about. Work is 100% DB2 (with a little Access from older apps) but I really like MySQL. Glad they are going the certification route. Overall knowledge will serve you well with most any database, but the actual implementation may differ from database to database, especially how best to optimize everything. Say, did I read that MySQL would soon support sub-selects, views, and some other things that the critics used to bash it for? I am not a DBA (nor do I play one on TV!), but I enjoy working with ours and exploring!
Phil J.