Page 1 of 1

Replacing nested queries for optimisation?

Posted: Tue Apr 23, 2013 12:13 pm
by anivad
Hi,

My webhost said that my site's MySQL coding had a lot of nested queries, which was placing a lot of stress on the server.

One instance of this is where the code grabs the 8 most recent posts made on the forums, retrieving data from two tables: f_posts (which has all the posts listed with associated threadid) and f_threads (which has all the threads listed with associated thread subjects, posting date, thread starter and so on):

Code: Select all

<?
$result = mysql_query("SELECT * FROM f_posts ORDER BY postid DESC LIMIT 0, 8"); 

while($info = mysql_fetch_object($result)) {

print "<p>in <a href='forums/view.php?threadid=" . $info->threadid . "'>";
$tresult = mysql_query("SELECT * FROM f_threads WHERE threadid='$info->threadid'");
$tinfo = mysql_fetch_object($tresult);
print stripslashes($tinfo->subject). "</a>";

print "<br>by " . stripslashes($info->author);
}
?>
I'm guessing the nested portion is at $tresult, where I call on the f_threads database to get the thread subject. I have similar scripts running to get the most recent comments posted on the site and so on.

My webhost suggested using foreach, but I'm not entirely sure how that would work. I'm still pretty new to this.

Any help is appreciated, thanks!

Re: Replacing nested queries for optimisation?

Posted: Tue Apr 23, 2013 12:59 pm
by requinix
Use JOINs to refer to multiple tables in a query.

Despite your SELECT *s it looks like the only thing you actually care about is the thread ID, subject, and author. So start with getting just that.

Code: Select all

SELECT threadid, subject, author FROM f_threads
JOIN in the f_posts table so you can access the posted date. Might need to use table aliases.

Code: Select all

SELECT t.threadid, t.subject, t.author FROM f_threads t JOIN f_posts p ON t.threadid = p.threadid
Now you can sort by that date. Use a date, not an ID number.

Code: Select all

SELECT t.threadid, t.subject, t.author FROM f_threads t JOIN f_posts p ON t.threadid = p.threadid ORDER BY p.postdate DESC LIMIT 8

Code: Select all

<?php // full open tags

$result = mysql_query("SELECT t.threadid, t.subject, t.author FROM f_threads t JOIN f_posts p ON t.threadid = p.threadid ORDER BY p.postdate DESC LIMIT 8");
while($info = mysql_fetch_object($result)) {
    print "<p>in <a href='forums/view.php?threadid=" . $info->threadid . "'>";
    print stripslashes($info->subject) . "</a>";
    print "<br>by " . stripslashes($info->author);
}

?>
Ah, but the problems don't end there:
  • Switch to PDO or mysqli. The mysql extension is old, slow, inefficient, less powerful, less secure, and officially deprecated.
  • You shouldn't have to stripslashes() on anything. If you do then it means what's in your database has problems. You need to fix those problems. And then fix whatever code is introducing those problems.
  • Use htmlentities() or htmlspecialchars() when outputting user-supplied data. Unless you specifically prevented it when you saved the data, the thread subject or author's name could contain malicious XSS.
By the way, disregard what they said about using foreach loops. They don't know what they're talking about.

Re: Replacing nested queries for optimisation?

Posted: Tue Apr 23, 2013 1:04 pm
by anivad
Thanks for the help!

They actually said: "Avoid joins and nested selects, for PHP it's faster if you split the two selects and use PHP to parse via foreach. Multiple small queries are more efficient that one big query because the dataset that gets parsed is smaller."

Is that true, or should the solution you offered work fine?

I'll look into PDO or mysqli when I have the time. Would it work with MySQL 5.1?

EDIT: I ran your revision in MySQL and it actually took a lot longer - 0.2156 seconds, in total. :/

Re: Replacing nested queries for optimisation?

Posted: Tue Apr 23, 2013 1:33 pm
by requinix
anivad wrote:They actually said: "Avoid joins and nested selects, for PHP it's faster if you split the two selects and use PHP to parse via foreach. Multiple small queries are more efficient that one big query because the dataset that gets parsed is smaller."

Is that true, or should the solution you offered work fine?
Very, very much false. It's true that nested SELECTs aren't very good, but doing the work in the database is still better than doing it in PHP.

Tack on an EXPLAIN to the beginning of that query

Code: Select all

EXPLAIN SELECT t.threadid, t.subject, t.author FROM f_threads t JOIN f_posts p ON t.threadid = p.threadid ORDER BY p.postdate DESC LIMIT 8
and run it yourself manually (like in phpMyAdmin or something). What does it output? I suspect you don't have indexes on columns that need them: a properly set up database should execute that query in a tiny fraction of a second.

Re: Replacing nested queries for optimisation?

Posted: Tue Apr 23, 2013 1:40 pm
by anivad
Results from EXPLAIN:

id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE p ALL threadid NULL NULL NULL 12050 Using filesort
1 SIMPLE t eq_ref PRIMARY PRIMARY 4 users.p.threadid 1

Re: Replacing nested queries for optimisation?

Posted: Tue Apr 23, 2013 1:46 pm
by anivad
Hi,
I indexed threadid and postdate, and it's down to 0.0006 seconds. Thanks for your help!

Re: Replacing nested queries for optimisation?

Posted: Tue Apr 23, 2013 1:53 pm
by anivad
Wait, sorry.

I'm trying to get the author from the f_posts table, not the f_threads table, because the former only provides the name of the OP. How do I get the author for the individual post?

Re: Replacing nested queries for optimisation?

Posted: Tue Apr 23, 2013 1:58 pm
by requinix
Change the t.author to p.author. "t" is an alias for the f_threads table and "p" is for the f_posts table. Saves a bit of writing, that's all. (You're free to use the full table name if you wish, but you still will have to include the name, like "f_posts.author", in cases where it's not obvious which table you want a column from.)

Re: Replacing nested queries for optimisation?

Posted: Tue Apr 23, 2013 2:01 pm
by anivad
Ah, ok. Thanks!

Re: Replacing nested queries for optimisation?

Posted: Tue Apr 23, 2013 2:17 pm
by anivad
I have a problem replicating this with the recent comments script, because the fields I want to join aren't exactly identical:

Code: Select all

SELECT c.commentid, c.subject, a.author, a.postdate FROM comments c JOIN articles a ON c.page = a.link ORDER BY c.commentid DESC LIMIT 6"
In this case, c.page is formatted 'articles/url'
and a.link is just 'url'.
Is there a way for me to strip the 'articles/' bit from p.page before or during the query? Previously I used a replace function before the query.

Re: Replacing nested queries for optimisation?

Posted: Tue Apr 23, 2013 3:12 pm
by requinix
What about using a.link instead? Doesn't that have the value you want?

Re: Replacing nested queries for optimisation?

Posted: Tue Apr 23, 2013 3:19 pm
by anivad
Yeah, but I thought to join tables they have to share a column with common values?

Otherwise I'll just do a mass replace on the table and strip the /articles/, because I could always put that in when reading data instead.

Re: Replacing nested queries for optimisation?

Posted: Tue Apr 23, 2013 5:02 pm
by requinix
Oh, I see. I thought you meant you wanted to SELECT that, not that you needed it in the join.

If you can reasonably remove the path information IMO that would be best. Like you said it's straightforward to add it back in with code.

But here's a question: what's the primary key on the page table? Is it that "page" or an ID? If it's the ID then that's what you would use in the comments table to relate the two, so then it doesn't matter what the "page" is. If not... consider adding one: it makes life easier.