Replacing nested queries for optimisation?

Questions about the MySQL, PostgreSQL, and most other databases, as well as using it with PHP can be asked here.

Moderator: General Moderators

Post Reply
anivad
Forum Commoner
Posts: 80
Joined: Thu Apr 09, 2009 11:16 pm

Replacing nested queries for optimisation?

Post by anivad »

Hi,

My webhost said that my site's MySQL coding had a lot of nested queries, which was placing a lot of stress on the server.

One instance of this is where the code grabs the 8 most recent posts made on the forums, retrieving data from two tables: f_posts (which has all the posts listed with associated threadid) and f_threads (which has all the threads listed with associated thread subjects, posting date, thread starter and so on):

Code: Select all

<?
$result = mysql_query("SELECT * FROM f_posts ORDER BY postid DESC LIMIT 0, 8"); 

while($info = mysql_fetch_object($result)) {

print "<p>in <a href='forums/view.php?threadid=" . $info->threadid . "'>";
$tresult = mysql_query("SELECT * FROM f_threads WHERE threadid='$info->threadid'");
$tinfo = mysql_fetch_object($tresult);
print stripslashes($tinfo->subject). "</a>";

print "<br>by " . stripslashes($info->author);
}
?>
I'm guessing the nested portion is at $tresult, where I call on the f_threads database to get the thread subject. I have similar scripts running to get the most recent comments posted on the site and so on.

My webhost suggested using foreach, but I'm not entirely sure how that would work. I'm still pretty new to this.

Any help is appreciated, thanks!
User avatar
requinix
Spammer :|
Posts: 6617
Joined: Wed Oct 15, 2008 2:35 am
Location: WA, USA

Re: Replacing nested queries for optimisation?

Post by requinix »

Use JOINs to refer to multiple tables in a query.

Despite your SELECT *s it looks like the only thing you actually care about is the thread ID, subject, and author. So start with getting just that.

Code: Select all

SELECT threadid, subject, author FROM f_threads
JOIN in the f_posts table so you can access the posted date. Might need to use table aliases.

Code: Select all

SELECT t.threadid, t.subject, t.author FROM f_threads t JOIN f_posts p ON t.threadid = p.threadid
Now you can sort by that date. Use a date, not an ID number.

Code: Select all

SELECT t.threadid, t.subject, t.author FROM f_threads t JOIN f_posts p ON t.threadid = p.threadid ORDER BY p.postdate DESC LIMIT 8

Code: Select all

<?php // full open tags

$result = mysql_query("SELECT t.threadid, t.subject, t.author FROM f_threads t JOIN f_posts p ON t.threadid = p.threadid ORDER BY p.postdate DESC LIMIT 8");
while($info = mysql_fetch_object($result)) {
    print "<p>in <a href='forums/view.php?threadid=" . $info->threadid . "'>";
    print stripslashes($info->subject) . "</a>";
    print "<br>by " . stripslashes($info->author);
}

?>
Ah, but the problems don't end there:
  • Switch to PDO or mysqli. The mysql extension is old, slow, inefficient, less powerful, less secure, and officially deprecated.
  • You shouldn't have to stripslashes() on anything. If you do then it means what's in your database has problems. You need to fix those problems. And then fix whatever code is introducing those problems.
  • Use htmlentities() or htmlspecialchars() when outputting user-supplied data. Unless you specifically prevented it when you saved the data, the thread subject or author's name could contain malicious XSS.
By the way, disregard what they said about using foreach loops. They don't know what they're talking about.
anivad
Forum Commoner
Posts: 80
Joined: Thu Apr 09, 2009 11:16 pm

Re: Replacing nested queries for optimisation?

Post by anivad »

Thanks for the help!

They actually said: "Avoid joins and nested selects, for PHP it's faster if you split the two selects and use PHP to parse via foreach. Multiple small queries are more efficient that one big query because the dataset that gets parsed is smaller."

Is that true, or should the solution you offered work fine?

I'll look into PDO or mysqli when I have the time. Would it work with MySQL 5.1?

EDIT: I ran your revision in MySQL and it actually took a lot longer - 0.2156 seconds, in total. :/
User avatar
requinix
Spammer :|
Posts: 6617
Joined: Wed Oct 15, 2008 2:35 am
Location: WA, USA

Re: Replacing nested queries for optimisation?

Post by requinix »

anivad wrote:They actually said: "Avoid joins and nested selects, for PHP it's faster if you split the two selects and use PHP to parse via foreach. Multiple small queries are more efficient that one big query because the dataset that gets parsed is smaller."

Is that true, or should the solution you offered work fine?
Very, very much false. It's true that nested SELECTs aren't very good, but doing the work in the database is still better than doing it in PHP.

Tack on an EXPLAIN to the beginning of that query

Code: Select all

EXPLAIN SELECT t.threadid, t.subject, t.author FROM f_threads t JOIN f_posts p ON t.threadid = p.threadid ORDER BY p.postdate DESC LIMIT 8
and run it yourself manually (like in phpMyAdmin or something). What does it output? I suspect you don't have indexes on columns that need them: a properly set up database should execute that query in a tiny fraction of a second.
anivad
Forum Commoner
Posts: 80
Joined: Thu Apr 09, 2009 11:16 pm

Re: Replacing nested queries for optimisation?

Post by anivad »

Results from EXPLAIN:

id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE p ALL threadid NULL NULL NULL 12050 Using filesort
1 SIMPLE t eq_ref PRIMARY PRIMARY 4 users.p.threadid 1
anivad
Forum Commoner
Posts: 80
Joined: Thu Apr 09, 2009 11:16 pm

Re: Replacing nested queries for optimisation?

Post by anivad »

Hi,
I indexed threadid and postdate, and it's down to 0.0006 seconds. Thanks for your help!
anivad
Forum Commoner
Posts: 80
Joined: Thu Apr 09, 2009 11:16 pm

Re: Replacing nested queries for optimisation?

Post by anivad »

Wait, sorry.

I'm trying to get the author from the f_posts table, not the f_threads table, because the former only provides the name of the OP. How do I get the author for the individual post?
User avatar
requinix
Spammer :|
Posts: 6617
Joined: Wed Oct 15, 2008 2:35 am
Location: WA, USA

Re: Replacing nested queries for optimisation?

Post by requinix »

Change the t.author to p.author. "t" is an alias for the f_threads table and "p" is for the f_posts table. Saves a bit of writing, that's all. (You're free to use the full table name if you wish, but you still will have to include the name, like "f_posts.author", in cases where it's not obvious which table you want a column from.)
anivad
Forum Commoner
Posts: 80
Joined: Thu Apr 09, 2009 11:16 pm

Re: Replacing nested queries for optimisation?

Post by anivad »

Ah, ok. Thanks!
anivad
Forum Commoner
Posts: 80
Joined: Thu Apr 09, 2009 11:16 pm

Re: Replacing nested queries for optimisation?

Post by anivad »

I have a problem replicating this with the recent comments script, because the fields I want to join aren't exactly identical:

Code: Select all

SELECT c.commentid, c.subject, a.author, a.postdate FROM comments c JOIN articles a ON c.page = a.link ORDER BY c.commentid DESC LIMIT 6"
In this case, c.page is formatted 'articles/url'
and a.link is just 'url'.
Is there a way for me to strip the 'articles/' bit from p.page before or during the query? Previously I used a replace function before the query.
User avatar
requinix
Spammer :|
Posts: 6617
Joined: Wed Oct 15, 2008 2:35 am
Location: WA, USA

Re: Replacing nested queries for optimisation?

Post by requinix »

What about using a.link instead? Doesn't that have the value you want?
anivad
Forum Commoner
Posts: 80
Joined: Thu Apr 09, 2009 11:16 pm

Re: Replacing nested queries for optimisation?

Post by anivad »

Yeah, but I thought to join tables they have to share a column with common values?

Otherwise I'll just do a mass replace on the table and strip the /articles/, because I could always put that in when reading data instead.
User avatar
requinix
Spammer :|
Posts: 6617
Joined: Wed Oct 15, 2008 2:35 am
Location: WA, USA

Re: Replacing nested queries for optimisation?

Post by requinix »

Oh, I see. I thought you meant you wanted to SELECT that, not that you needed it in the join.

If you can reasonably remove the path information IMO that would be best. Like you said it's straightforward to add it back in with code.

But here's a question: what's the primary key on the page table? Is it that "page" or an ID? If it's the ID then that's what you would use in the comments table to relate the two, so then it doesn't matter what the "page" is. If not... consider adding one: it makes life easier.
Post Reply