Page 1 of 2
ON DUPLICATE KEY, but without a PK...?
Posted: Sun Aug 16, 2009 6:01 pm
by jackpf
Hello!
Quick question - you know mysql's ON DUPLICATE KEY right?
Well, is there a similar sort of thing, but not for primary keys? Basically, like this:
Code: Select all
//check if the data exists
$check_query = mysql_query("SELECT ... the data, with multiple columns, none of which are primary keys");
//the data does not exist - create it
if(mysql_num_rows($check_query) == 0)
{
mysql_query("INSERT...into the table");
}
//the data exists - update it
else
{
mysql_query("UPDATE....the table");
}
Is there a way to put this into one query?
Sort of like:
Code: Select all
IF((SELECT NULL FROM `table` WHERE ...), UPDATE `table`...., INSERT INTO `table`...);
But that doesn't work.
Any ideas?
Cheers,
Jack.
Re: ON DUPLICATE KEY, but without a PK...?
Posted: Sun Aug 16, 2009 6:08 pm
by Eran
you can use ON DUPLICATE KEY with a unique index as well, not just PK.
Re: ON DUPLICATE KEY, but without a PK...?
Posted: Sun Aug 16, 2009 6:11 pm
by jackpf
How would I do that exactly? Sorry
EDIT
DW!!! Figured it out - the "create index" thing in phpmyadmin. Thanks once again pytrin!!
Cheers,
Jack.
Re: ON DUPLICATE KEY, but without a PK...?
Posted: Sun Aug 16, 2009 6:15 pm
by Eran
There are 3 types of common indexes in MySQL (forgetting for a minute FULLTEXT and HASH) - primary, unique and (plain) index. You can only have one primary key in a table, but as many unique indexes as you'd like. create a unique index that covers the conditions for checking if a row exists, and use that in your insert query.
Re: ON DUPLICATE KEY, but without a PK...?
Posted: Sun Aug 16, 2009 6:20 pm
by jackpf
Aha yeah, I updated my last post... I was a bit hasty reading the docs - I got it now!
With phpmyadmin there's a handy "create index" function...
Cheers pytrin...I owe you one. Well...more than one actually.
Thanks!!
Re: ON DUPLICATE KEY, but without a PK...?
Posted: Sun Aug 16, 2009 6:33 pm
by Eran
cheers

Re: ON DUPLICATE KEY, but without a PK...?
Posted: Sun Aug 16, 2009 6:59 pm
by jackpf
Sorry...I have another problem. To save starting a new thread, might as well just post it here...
Basically, I'm working on a thread tracking system for my forum, to track who's read what thread etc...
Anyway, in the query to fetch all boards for the forum index, I left join all posts to count how many posts there are for that forum. I also left join the forum tracking table, to see if the forum has been read, using SUM() to add up all the thread post counts (since the sum of the "post counts" of threads for that forum will be how many posts are read for that forum).
But the problem is, the left join that counts posts seems to be affecting the join for the tracking data. It seems that the join on the thread tracking table is running for every post returned in the first join, to count the posts the forum has. This means that it returns something huge, since it's running SUM() on the same data for every post.
Here's an example:
Code: Select all
SELECT
F.*,
COUNT(P.`ID`) AS `PostCount`, #the number of posts for the forum
SUM(D.`PostCount`) AS `Tracking_PostCount` #the number of posts read in this forum - runs for every post returned in the first join!!!!
FROM `Forums` F
LEFT OUTER JOIN `Posts` P ON P.`Forumid`=F.`ID` #get the posts for this forum
LEFT OUTER JOIN `Tracking_Table` T ON T.`Forumid`=F.`ID` #check if this forum's threads have been read
Do you know if there's a way to make the two joins independent of each other, so that the join will only run for the main query?
Sorry, this is kind of hard to explain...if you need any more info just ask.
Thanks,
Jack.
Re: ON DUPLICATE KEY, but without a PK...?
Posted: Sun Aug 16, 2009 7:03 pm
by Eran
I understand what you're saying basically, but without seeing the actual queries is hard to tell. Left joining when there are multiple rows matching the join condition will increase the number of rows returned. From what I remember from a previous post, you are using a flat table instead of normalizing it into "categories" "threads" "posts" etc which makes your queries somewhat convulted and harder to maintain.
Consider using DISTINCT inside the SUM and COUNT functions, but that will only mask the real problem - that you are fetching more rows that you need.
Re: ON DUPLICATE KEY, but without a PK...?
Posted: Sun Aug 16, 2009 7:06 pm
by jackpf
Sorry, I edited my previous post to include an example.
Yeah...I thought about using DISTINCT, but then I'm fetching way more rows than I need. A forum with a lot of posts would take ages, fetching data it doesn't even need.
I could do this in a separate query...but if I can do this all in the same query, all the better.
Cheers,
Jack.
Re: ON DUPLICATE KEY, but without a PK...?
Posted: Sun Aug 16, 2009 7:13 pm
by Eran
Code: Select all
SUM(D.`PostCount`) AS `Tracking_PostCount`
D should be T according to the rest of query. I assume this is a typo?
Anyway, both joins compose the result table returned. They cannot be separated - if they interfere with each other they need to run as separate queries.
Re: ON DUPLICATE KEY, but without a PK...?
Posted: Sun Aug 16, 2009 7:23 pm
by jackpf
Yeah, my bad - I copied that bit out of the original query...I couldn't be bother to write it again
But hey!! I figured it out:
Instead of using a join, I can do this instead:
Code: Select all
SELECT
F.*,
COUNT(P.`ID`) AS `PostCount`, #the number of posts for the forum
(SELECT SUM(T.`PostCount`) FROM `Tracking_table` T WHERE F.`ID`=T.`Forumid`) AS `Tracking_PostCount` #using a subquery instead!!
FROM `Forums` F
LEFT OUTER JOIN `Posts` P ON P.`Forumid`=F.`ID` #get the posts for this forum
And that works perfectly! Just out of curiosity, is there much difference in performance between a join and a subquery?
Re: ON DUPLICATE KEY, but without a PK...?
Posted: Sun Aug 16, 2009 7:39 pm
by Eran
You can't make generalizations, it is very dependent on the particular queries, schema, indexing etc. Run EXPLAIN on that query to learn more
Re: ON DUPLICATE KEY, but without a PK...?
Posted: Sun Aug 16, 2009 8:39 pm
by jackpf
Hmm, this is what I get:
Code: Select all
(
[id] => 1
[select_type] => PRIMARY
[table] => F
[type] => ALL
[possible_keys] =>
[key] =>
[key_len] =>
[ref] =>
[rows] => 2159
[Extra] => Using where; Using temporary; Using filesort
)
Tbh, it doesn't really make much sense to me...
But! I timed the execution of the script with the subquery and without. And tbh, there was absolutely no difference. I guess since it's only returning one row, it's pretty quick.
Anyway,
Thanks for all your help pytrin!! I really appreciate it.
All the best,
Jack.
Re: ON DUPLICATE KEY, but without a PK...?
Posted: Mon Aug 17, 2009 3:12 am
by Eran
Is that for the subquery?
The full explain results should be at-least 3 rows (1 per table). run it in phpmyadmin as it formats it nicely in an html table.
From that EXPLAIN the query is running a full table scan. consider adding an index on Forumid and postcount
Re: ON DUPLICATE KEY, but without a PK...?
Posted: Mon Aug 17, 2009 7:11 am
by jackpf
Here are the results for the query (I tried to comment it to make it a little less confusing

)
Code: Select all
SELECT F.*, #select all the columns for the forum
COUNT(P.`ID`) AS `PostCount`, #the count of the rows returned from the join - how many posts the forum has
(SELECT SUM(D.`PostCount`) FROM `Forum_Data` D WHERE D.`Alias`='username...' AND F.`ID`=D.`Forum`) AS `Forum_Data_PostCount`, #get tracking data
SUM(P.`Stats`) AS `Stats`, #the views for the forum = the sum of all its threads
MAX(P.`ID`) AS `LastPost` #the ID of the last post
FROM `Forum` F
LEFT OUTER JOIN `Forum` P ON F.`ID`=P.`Forum` AND P.`Type` IN('thread', 'post') #the join that works out how many posts the forum has
WHERE F.`Type`='forum' AND F.`ID`=F.`Forum` #select only forums - where their id = their forumid. This is so that only parent forums are selected (not child boards)
GROUP BY F.`ID`
ORDER BY SUBSTRING_INDEX(F.`Status`, ':', -1);
Obviously this has all PHP removed, but that's
basically it...

- tmp.JPG (45.17 KiB) Viewed 3137 times
I know...I know, I should store the PostCount in a field rather than working it out with a join...but atm that's a lot easier. I would have to edit it when deleting posts, posting, moving threads, etc... Besides, I don't have many posts in my forum yet, so it's very quick still.
But yeah...you reckon I should index `Forum`? Maybe `Thread` as well?