Page 1 of 1

Postgres BETWEEN two timestamps

Posted: Tue Nov 30, 2010 6:07 pm
by Weiry
Im having problems currently with a postgres database where i am trying to select between datetime1 and datetime2.

The query itself is a tad more complicated than that (will provide full query in a sec), however its important to understand the context of the situation.

I have a database that currently is storing data in the following format every 15 minutes of every day, 24 hours a day:
node_id, poll_stamp, value

where:
node_id - the id of a sensor
poll_stamp - the current timestamp at the time of the sensor polling. format: Y-m-d H:i:s+00
value - you guessed it, a value.

This has led a very large number of records, aprox 850k so far, which also presents the problem of having an efficient query.

There are two requirements of the query which i am trying to meet.
1. Get a cumulative value of all values vetween #2
2. Select all values between 9am day 1 and 8:59am day 2, (this is a requirement, days MUST be measured between 9am day 1 and 9am day 2)

This query will be effective for a single day's worth of data.
This is the query i currently have:
[text]SELECT t.node_id, t.poll_stamp, (SELECT sum(x.value) FROM sensor_data x WHERE x.poll_stamp <= t.poll_stamp AND extract(DOY from poll_stamp) = '2010-12-01 09:00:00') AS value FROM sensor_data t WHERE t.poll_stamp >= '2010-12-01 09:00:00' AND t.poll_stamp <= '2010-12-02 08:59:00' ORDER BY t.poll_stamp[/text]
I have also tried
[text]SELECT t.node_id, t.poll_stamp, (SELECT sum(x.value) FROM sensor_data x WHERE x.poll_stamp <= t.poll_stamp AND extract(DOY from poll_stamp) = '2010-12-01 09:00:00') AS value FROM sensor_data t WHERE t.poll_stamp BETWEEN '2010-12-01 09:00:00' AND '2010-12-02 08:59:00' ORDER BY t.poll_stamp[/text]

The error im receiving with this (using PHP to execute) is:
[text]Warning: pg_query() [function.pg-query]: Query failed: ERROR: invalid input syntax for type double precision: "2010-12-01 09:00:00" in /web/sites/default/dashboard/class.database.php on line 55[/text]


There is also another issue with the above query, it seems to be somewhat inefficient when selecting a large amount of data, due to the cumulative values. This problem will be multiplied by a huge amount, because there is a need to show a cumulative value as described above, for every day of an entire year in a single graph.

So in other words, there is a need to have multiple graphs which can show data from 1 day, 1 week, 1 month, 1 year.
The yearly would need to be averaged out obviously, however with the graph, the cumulative value MUST be reset after each grouping.
So for a 1 week view, on the graph, each day at 9am, the cumulative value would be reset to 0.

Example: if the first cumulative value at the END of day 1 (8:59am) were to be 85.9, then at 9:00am day 2, would be reset to 0, so that it then counts from 9am-9am the next day.

This would mean that with the current query, i would need to run it 7 times in order to generate a weeks worth of data, and if that original query is already slow, then you can see where the efficiency problem is.

I hope i have explained this enough, if you need more info about anything, just let me know.

cheers
Weiry

Re: Postgres BETWEEN two timestamps

Posted: Tue Nov 30, 2010 11:21 pm
by Christopher
The problem is that your second date should be the next day (the 13th in your examples):

Code: Select all

WHERE t.poll_stamp BETWEEN '2010-12-01 09:00:00' AND '2010-13-02 08:59:00'
// OR
WHERE t.poll_stamp >= '2010-12-01 09:00:00' AND t.poll_stamp <= '2010-13-02 08:59:00'
And it probably should be:

Code: Select all

WHERE t.poll_stamp >= '2010-12-01 09:00:00' AND t.poll_stamp < '2010-13-02 09:00:00'

Re: Postgres BETWEEN two timestamps

Posted: Tue Nov 30, 2010 11:45 pm
by Weiry
Christopher wrote:The problem is that your second date should be the next day (the 13th in your examples):
[text]WHERE t.poll_stamp BETWEEN '2010-12-01 09:00:00' AND '2010-13-02 08:59:00'[/text]
Actually the date format you presented there is not a valid date format... Y-m-d H:i:s, not Y-d-m H:i:s, and i am using the Y-m-d format, i am selecting the 1st and the 2nd of the month.
Christopher wrote:And it probably should be:

Code: Select all

WHERE t.poll_stamp >= '2010-12-01 09:00:00' AND t.poll_stamp < '2010-13-02 09:00:00'
Your example wants me to select 1 month and 1 day? Also refer to my previous point.



EDIT:: I forgot to post this originally, this is the error i receive.
Warning: pg_query() [function.pg-query]: Query failed: ERROR: invalid input syntax for type double precision: "2010-12-01 09:00:00"

Re: Postgres BETWEEN two timestamps

Posted: Wed Dec 01, 2010 4:11 pm
by VladSun
Seems like a double post ...
viewtopic.php?f=1&t=124927

You need type casting:

Code: Select all

WHERE t.poll_stamp BETWEEN timestamp '2010-12-01 09:00:00' AND timestamp '2010-13-02 08:59:00'
Please, use [ syntax=sql ] BB tags.

Re: Postgres BETWEEN two timestamps

Posted: Wed Dec 01, 2010 4:24 pm
by VladSun
You may calculate cumulative values by using JOINs or subselects instead of user variables (though I'm not sure what will be the performance issues).

Probably the best will be a windowing function (I love PGSQL ;) )

Code: Select all

select 
	test.cdate,
	test.id,
	sum(test.id) over (rows unbounded preceding)
from 
	test 
group by 
	test.cdate,
	test.id
order by
	test.id
http://www.postgresql.org/docs/current/ ... indow.html

Re: Postgres BETWEEN two timestamps

Posted: Wed Dec 01, 2010 6:08 pm
by Weiry
VladSun wrote: You need type casting:

Code: Select all

WHERE t.poll_stamp BETWEEN timestamp '2010-12-01 09:00:00' AND timestamp '2010-13-02 08:59:00'
Updated the query i have to the following:

Code: Select all

SELECT t.node_id, t.poll_stamp, 
(SELECT sum(x.value) 
     FROM sensor_data x 
     WHERE x.poll_stamp <= t.poll_stamp AND extract(DOY from poll_stamp) = '2010-12-02 09:00:00' AND x.node_id = '65') AS value 
FROM sensor_data t 
WHERE t.node_id = '65' AND t.poll_stamp BETWEEN timestamp '2010-12-02 09:00:00' AND timestamp '2010-12-03 08:59:00'
ORDER BY t.poll_stamp
And i still receive the same error:
[text]Warning: pg_query() [function.pg-query]: Query failed: ERROR: invalid input syntax for type double precision: "2010-12-02 09:00:00" [/text]

I will have a look at that windowing function now, still pretty new to PGSQL

Re: Postgres BETWEEN two timestamps

Posted: Thu Dec 02, 2010 4:24 am
by VladSun
Weiry wrote:And i still receive the same error...
Again ...
VladSun wrote:You need type casting...
every where you use strings instead of timestamps...

Re: Postgres BETWEEN two timestamps

Posted: Thu Dec 02, 2010 4:26 am
by VladSun

Code: Select all

extract(DOY FROM poll_stamp) = '2010-12-02 09:00:00'
That's an interesting comparison :twisted: Rethink it ...

Re: Postgres BETWEEN two timestamps

Posted: Thu Dec 02, 2010 5:55 pm
by Weiry
VladSun wrote:

Code: Select all

extract(DOY FROM poll_stamp) = '2010-12-02 09:00:00'
That's an interesting comparison :twisted: Rethink it ...
And i think thats a classic example of staring at code for too long. 8O :roll:

I couldn't exactly remove that, however i did replace it.
Full query now:

Code: Select all

SELECT t.fk_sensor_node_id, t.poll_stamp, 
   (SELECT sum(x.value) 
    FROM sensor_data x 
    WHERE x.poll_stamp <= t.poll_stamp 
        AND x.poll_stamp BETWEEN timestamp '2010-11-13 09:00:00+00' AND timestamp '2010-11-14 09:00:00+00' 
        AND x.node_id = '65') AS value 
FROM sensor_data t 
WHERE t.node_id = '65' 
    AND t.poll_stamp BETWEEN timestamp '2010-11-13 09:00:00+00' AND timestamp '2010-11-14 09:00:00+00' 
ORDER BY t.poll_stamp DESC
This generates exactly what i need and works reasonably well.

Now i have to scale this out to be able to include weeks,months and year.
The only way i know of to do this is to use date_truc(), although i am slightly concerned with the large number of data and the processing that its going to be doing.

Since we have an application entering this data into the database, i think i will ask if there can be an additional table created which dumps an average value per day, this should reduce the amount of processing required when a whole year's worth of data is queried rather than try to get a query to do this all in one.

Re: Postgres BETWEEN two timestamps

Posted: Thu Dec 02, 2010 6:28 pm
by VladSun
Did you try

Code: Select all

sum(tableFieldHere) over (rows unbounded preceding) as aliasName
instead fo subselecting?

Re: Postgres BETWEEN two timestamps

Posted: Thu Dec 02, 2010 6:57 pm
by Weiry
I did, it immediately gave me the error:

[text]Warning: pg_query() [function.pg-query]: Query failed: ERROR: syntax error at or near "over" LINE 1: ...T t.node_id, t.poll_stamp, sum(t.value) over (rows... ^[/text]

Re: Postgres BETWEEN two timestamps

Posted: Fri Dec 03, 2010 5:22 am
by VladSun
PostgreSQL 8.4.5 here. Works fine.

Re: Postgres BETWEEN two timestamps

Posted: Sun Dec 05, 2010 4:05 pm
by Weiry
Running PostgreSQL 8.3.11 and causes the error.

What version did 'over' appear in pgsql?

Re: Postgres BETWEEN two timestamps

Posted: Sun Dec 05, 2010 4:18 pm
by VladSun
I don't know but I'm using a "stable" version ...