Page 1 of 1

MySQL join on lowest range integer from related table

Posted: Thu Feb 04, 2010 1:21 pm
by batfastad
Hi everyone

I've got a table of exchange rates and another table of invoices.
4 columns in exrates: exrate_id, exrate_eur, exrate_usd, exrate_stamp
4 columns in invoices (for this example): invoice_id, amount, currency, invoice_stamp

exrate_stamp and invoice_stamp are both integers which store a unix timestamp (not mysql timestamp).
The exrate_stamp is only the year/month... so converting exrate_stamp to a formatted time would return 1st Monthname 2010 00:00
invoice_stamp is the year/month/day... so this could translate as 17th Monthname 2010 00:00

Basically what I want to do is list out my invoices but also grabbing the exchange rate that was applicable for that invoice's year/month from my exrates table.

So if I have an invoice with an invoice_stamp of 24th November 2006, I'd like to do a join on exrates which would return the exchange rate for November 2006 along with the invoice details.

I can't do a plain left join between invoice_stamp and exrate_stamp because the 2 will rarely match... the invoice would have to have an invoice_stamp of 1st November 2006 00:00 to match the value of exrate_stamp

So I thought about using MySQL's ceiling() function in a join but that won't work as all my timestamps are integers.

I also thought about converting the invoice_stamp using FROM_UNIXTIME() but formatting it to only return 2006-11, then converting that 2006-11 back to a unix stamp using UNIX_TIMESTAMP() and doing the join on that value but I'm concerned what MySQL will do with timezones... according to the manual this conversion won't be lossless.
I manage all timezone and date formatting stuff in PHP not in MySQL.
Also that seems like a bit of a fudged way of doing things - a whole bunch of date formatting by the MySQL server just to perform a join.

And the 3rd way I thought of doing it was by using FROM_UNIXTIME() on exrate_stamp and invoice_stamp but having it formatted to only return 2006-11. But that means doing a MySQL join between 2 string values and I thought that was generally a bad thing.

Anyone got any suggestions or comments of a better "mathmatical" way of doing this?
Or is everyone totally confused about my database schema explanations above?

Cheers, B

Re: MySQL join on lowest range integer from related table

Posted: Thu Feb 04, 2010 1:59 pm
by Eran
You can use FROM_UNIXTIME() and then DATE() to extract just the date from both and compare on that, but you are right regarding the timezones. A preferred method would be to store the timestamps in native MySQL timestamps (which are stored as UTC internally) which would make this an easy comparison using DATE() only.
An alternative option through denormalization, is to save the date for each row in a separate column and compare on that. That way you could also use an index for the join.

Re: MySQL join on lowest range integer from related table

Posted: Fri Feb 05, 2010 11:06 am
by batfastad
Yeah my PHP timestamps are all stored in UTC
After a bit more thought I decided to give option #3 a bash from above.

I've just tried doing this...

Code: Select all

SELECT `invoices`.`invoice_id`, `invoices`.`amount`, FROM_UNIXTIME(`invoices`.`invoice_stamp`, '%Y%m') AS `invoice_stamp`, FROM_UNIXTIME(`exrates`.`exrate_stamp`, '%Y%m') AS `exrate_stamp`, `exrates`.`exrate_usd`, `exrates`.`exrate_eur` FROM `invoices`LEFT JOIN `exrates` ON FROM_UNIXTIME(`invoices`.`invoice_stamp`, '%Y%m')=FROM_UNIXTIME(`exrates`.`exrate_stamp`, '%Y%m')ORDER BY `invoices`.`invoice_stamp`
But that's pretty slow, takes about 0.05 for ~120 rows so there's got to be a better way of doing it.
I'm not surprised it's slow though considering the calculation involved in the join. I guess there isn't much I can do to index the *_stamp columns as the join is done on a calculation anyway.

Just wondered if there was a MySQL function in existance which would help me out, sort of like ceiling() but match the lowest nearest related record?
Or another funky mathmatical way?
I don't really want to "de-normalize" and store the month in a separate column as it's a bit of a fudge

Cheers, B

Re: MySQL join on lowest range integer from related table

Posted: Fri Feb 05, 2010 2:32 pm
by Eran
The real fudge is that you are not using native MySQL date/time types for storing dates. If you would have, at least a part of the comparison could be indexed (the date of the currency rate).

Re: MySQL join on lowest range integer from related table

Posted: Sat Feb 06, 2010 2:02 pm
by batfastad
Ok I thought of a more mathmatical way of doing this

This is what I tried above:

Code: Select all

LEFT JOIN `exrates` ON FROM_UNIXTIME(`invoices`.`invoice_stamp`, '%Y%m')=FROM_UNIXTIME(`exrates`.`exrate_stamp`, '%Y%m')
Returning ~3000 invoice records with matching exchange rates in 0.2864s

Then I thought I could store the end dates of the exchange rates as well as the start date, by adding 1 month to the exchange rate month, then subtracting 1 second.
So I have a proper timestamp range for each exchange rate, meaning my JOIN can be like this:

Code: Select all

LEFT JOIN `exrates` ON `invoices`.`invoice_stamp`>=`exrates`.`exrate_stamp` AND `invoices`.`invoice_stamp`<=`exrates`.`exrate_stamp_end`
Returns ~3000 invoice records with matching ex rates in 0.0755s

That was a big improvement (run with SQL_NO_CACHE and I took averages over 10 passes)
In reality this query will never be returning more than 100 invoices so I reckon that's probably good enough.

The other way I thought of doing this was to do a sub-select to return 1 exchange rate less than invoice_stamp sorted in descending exrate_stamp order. But I don't know much about sub-selects other than their use seems to be discouraged and I figured it was unlikely to be quicker than the above anyway.

The only way this could be faster is to de-normalize as you suggest and store the month-year with each invoice. But we store fewer exchange rates than invoices, ~100 exrates vs 3000 (and rising) invoices, so it makes sense to store the date range in that table.

Or possibly switch to mysql's timestamps instead but I prefer formatting dates and timezones in PHP rather than MySQL.

Cheers, B

Re: MySQL join on lowest range integer from related table

Posted: Sat Feb 06, 2010 2:08 pm
by Eran
Or possibly switch to mysql's timestamps instead but I prefer formatting dates and timezones in PHP rather than MySQL.
Those are not conflicting requirements. You can return a MySQL timestamp as a unix timestamp (using UNIX_TIMESTAMP() ) or convert it in PHP