Postgres BETWEEN two timestamps
Moderator: General Moderators
Postgres BETWEEN two timestamps
Im having problems currently with a postgres database where i am trying to select between datetime1 and datetime2.
The query itself is a tad more complicated than that (will provide full query in a sec), however its important to understand the context of the situation.
I have a database that currently is storing data in the following format every 15 minutes of every day, 24 hours a day:
node_id, poll_stamp, value
where:
node_id - the id of a sensor
poll_stamp - the current timestamp at the time of the sensor polling. format: Y-m-d H:i:s+00
value - you guessed it, a value.
This has led a very large number of records, aprox 850k so far, which also presents the problem of having an efficient query.
There are two requirements of the query which i am trying to meet.
1. Get a cumulative value of all values vetween #2
2. Select all values between 9am day 1 and 8:59am day 2, (this is a requirement, days MUST be measured between 9am day 1 and 9am day 2)
This query will be effective for a single day's worth of data.
This is the query i currently have:
[text]SELECT t.node_id, t.poll_stamp, (SELECT sum(x.value) FROM sensor_data x WHERE x.poll_stamp <= t.poll_stamp AND extract(DOY from poll_stamp) = '2010-12-01 09:00:00') AS value FROM sensor_data t WHERE t.poll_stamp >= '2010-12-01 09:00:00' AND t.poll_stamp <= '2010-12-02 08:59:00' ORDER BY t.poll_stamp[/text]
I have also tried
[text]SELECT t.node_id, t.poll_stamp, (SELECT sum(x.value) FROM sensor_data x WHERE x.poll_stamp <= t.poll_stamp AND extract(DOY from poll_stamp) = '2010-12-01 09:00:00') AS value FROM sensor_data t WHERE t.poll_stamp BETWEEN '2010-12-01 09:00:00' AND '2010-12-02 08:59:00' ORDER BY t.poll_stamp[/text]
The error im receiving with this (using PHP to execute) is:
[text]Warning: pg_query() [function.pg-query]: Query failed: ERROR: invalid input syntax for type double precision: "2010-12-01 09:00:00" in /web/sites/default/dashboard/class.database.php on line 55[/text]
There is also another issue with the above query, it seems to be somewhat inefficient when selecting a large amount of data, due to the cumulative values. This problem will be multiplied by a huge amount, because there is a need to show a cumulative value as described above, for every day of an entire year in a single graph.
So in other words, there is a need to have multiple graphs which can show data from 1 day, 1 week, 1 month, 1 year.
The yearly would need to be averaged out obviously, however with the graph, the cumulative value MUST be reset after each grouping.
So for a 1 week view, on the graph, each day at 9am, the cumulative value would be reset to 0.
Example: if the first cumulative value at the END of day 1 (8:59am) were to be 85.9, then at 9:00am day 2, would be reset to 0, so that it then counts from 9am-9am the next day.
This would mean that with the current query, i would need to run it 7 times in order to generate a weeks worth of data, and if that original query is already slow, then you can see where the efficiency problem is.
I hope i have explained this enough, if you need more info about anything, just let me know.
cheers
Weiry
The query itself is a tad more complicated than that (will provide full query in a sec), however its important to understand the context of the situation.
I have a database that currently is storing data in the following format every 15 minutes of every day, 24 hours a day:
node_id, poll_stamp, value
where:
node_id - the id of a sensor
poll_stamp - the current timestamp at the time of the sensor polling. format: Y-m-d H:i:s+00
value - you guessed it, a value.
This has led a very large number of records, aprox 850k so far, which also presents the problem of having an efficient query.
There are two requirements of the query which i am trying to meet.
1. Get a cumulative value of all values vetween #2
2. Select all values between 9am day 1 and 8:59am day 2, (this is a requirement, days MUST be measured between 9am day 1 and 9am day 2)
This query will be effective for a single day's worth of data.
This is the query i currently have:
[text]SELECT t.node_id, t.poll_stamp, (SELECT sum(x.value) FROM sensor_data x WHERE x.poll_stamp <= t.poll_stamp AND extract(DOY from poll_stamp) = '2010-12-01 09:00:00') AS value FROM sensor_data t WHERE t.poll_stamp >= '2010-12-01 09:00:00' AND t.poll_stamp <= '2010-12-02 08:59:00' ORDER BY t.poll_stamp[/text]
I have also tried
[text]SELECT t.node_id, t.poll_stamp, (SELECT sum(x.value) FROM sensor_data x WHERE x.poll_stamp <= t.poll_stamp AND extract(DOY from poll_stamp) = '2010-12-01 09:00:00') AS value FROM sensor_data t WHERE t.poll_stamp BETWEEN '2010-12-01 09:00:00' AND '2010-12-02 08:59:00' ORDER BY t.poll_stamp[/text]
The error im receiving with this (using PHP to execute) is:
[text]Warning: pg_query() [function.pg-query]: Query failed: ERROR: invalid input syntax for type double precision: "2010-12-01 09:00:00" in /web/sites/default/dashboard/class.database.php on line 55[/text]
There is also another issue with the above query, it seems to be somewhat inefficient when selecting a large amount of data, due to the cumulative values. This problem will be multiplied by a huge amount, because there is a need to show a cumulative value as described above, for every day of an entire year in a single graph.
So in other words, there is a need to have multiple graphs which can show data from 1 day, 1 week, 1 month, 1 year.
The yearly would need to be averaged out obviously, however with the graph, the cumulative value MUST be reset after each grouping.
So for a 1 week view, on the graph, each day at 9am, the cumulative value would be reset to 0.
Example: if the first cumulative value at the END of day 1 (8:59am) were to be 85.9, then at 9:00am day 2, would be reset to 0, so that it then counts from 9am-9am the next day.
This would mean that with the current query, i would need to run it 7 times in order to generate a weeks worth of data, and if that original query is already slow, then you can see where the efficiency problem is.
I hope i have explained this enough, if you need more info about anything, just let me know.
cheers
Weiry
- Christopher
- Site Administrator
- Posts: 13596
- Joined: Wed Aug 25, 2004 7:54 pm
- Location: New York, NY, US
Re: Postgres BETWEEN two timestamps
The problem is that your second date should be the next day (the 13th in your examples):
And it probably should be:
Code: Select all
WHERE t.poll_stamp BETWEEN '2010-12-01 09:00:00' AND '2010-13-02 08:59:00'
// OR
WHERE t.poll_stamp >= '2010-12-01 09:00:00' AND t.poll_stamp <= '2010-13-02 08:59:00'Code: Select all
WHERE t.poll_stamp >= '2010-12-01 09:00:00' AND t.poll_stamp < '2010-13-02 09:00:00'(#10850)
Re: Postgres BETWEEN two timestamps
Actually the date format you presented there is not a valid date format... Y-m-d H:i:s, not Y-d-m H:i:s, and i am using the Y-m-d format, i am selecting the 1st and the 2nd of the month.Christopher wrote:The problem is that your second date should be the next day (the 13th in your examples):
[text]WHERE t.poll_stamp BETWEEN '2010-12-01 09:00:00' AND '2010-13-02 08:59:00'[/text]
Your example wants me to select 1 month and 1 day? Also refer to my previous point.Christopher wrote:And it probably should be:Code: Select all
WHERE t.poll_stamp >= '2010-12-01 09:00:00' AND t.poll_stamp < '2010-13-02 09:00:00'
EDIT:: I forgot to post this originally, this is the error i receive.
Warning: pg_query() [function.pg-query]: Query failed: ERROR: invalid input syntax for type double precision: "2010-12-01 09:00:00"
Last edited by Weiry on Wed Dec 01, 2010 6:27 pm, edited 1 time in total.
Re: Postgres BETWEEN two timestamps
Seems like a double post ...
viewtopic.php?f=1&t=124927
You need type casting:
Please, use [ syntax=sql ] BB tags.
viewtopic.php?f=1&t=124927
You need type casting:
Code: Select all
WHERE t.poll_stamp BETWEEN timestamp '2010-12-01 09:00:00' AND timestamp '2010-13-02 08:59:00'There are 10 types of people in this world, those who understand binary and those who don't
Re: Postgres BETWEEN two timestamps
You may calculate cumulative values by using JOINs or subselects instead of user variables (though I'm not sure what will be the performance issues).
Probably the best will be a windowing function (I love PGSQL
)
http://www.postgresql.org/docs/current/ ... indow.html
Probably the best will be a windowing function (I love PGSQL
Code: Select all
select
test.cdate,
test.id,
sum(test.id) over (rows unbounded preceding)
from
test
group by
test.cdate,
test.id
order by
test.idThere are 10 types of people in this world, those who understand binary and those who don't
Re: Postgres BETWEEN two timestamps
Updated the query i have to the following:VladSun wrote: You need type casting:Code: Select all
WHERE t.poll_stamp BETWEEN timestamp '2010-12-01 09:00:00' AND timestamp '2010-13-02 08:59:00'
Code: Select all
SELECT t.node_id, t.poll_stamp,
(SELECT sum(x.value)
FROM sensor_data x
WHERE x.poll_stamp <= t.poll_stamp AND extract(DOY from poll_stamp) = '2010-12-02 09:00:00' AND x.node_id = '65') AS value
FROM sensor_data t
WHERE t.node_id = '65' AND t.poll_stamp BETWEEN timestamp '2010-12-02 09:00:00' AND timestamp '2010-12-03 08:59:00'
ORDER BY t.poll_stamp[text]Warning: pg_query() [function.pg-query]: Query failed: ERROR: invalid input syntax for type double precision: "2010-12-02 09:00:00" [/text]
I will have a look at that windowing function now, still pretty new to PGSQL
Re: Postgres BETWEEN two timestamps
Again ...Weiry wrote:And i still receive the same error...
every where you use strings instead of timestamps...VladSun wrote:You need type casting...
There are 10 types of people in this world, those who understand binary and those who don't
Re: Postgres BETWEEN two timestamps
Code: Select all
extract(DOY FROM poll_stamp) = '2010-12-02 09:00:00'There are 10 types of people in this world, those who understand binary and those who don't
Re: Postgres BETWEEN two timestamps
And i think thats a classic example of staring at code for too long.VladSun wrote:That's an interesting comparisonCode: Select all
extract(DOY FROM poll_stamp) = '2010-12-02 09:00:00'Rethink it ...
I couldn't exactly remove that, however i did replace it.
Full query now:
Code: Select all
SELECT t.fk_sensor_node_id, t.poll_stamp,
(SELECT sum(x.value)
FROM sensor_data x
WHERE x.poll_stamp <= t.poll_stamp
AND x.poll_stamp BETWEEN timestamp '2010-11-13 09:00:00+00' AND timestamp '2010-11-14 09:00:00+00'
AND x.node_id = '65') AS value
FROM sensor_data t
WHERE t.node_id = '65'
AND t.poll_stamp BETWEEN timestamp '2010-11-13 09:00:00+00' AND timestamp '2010-11-14 09:00:00+00'
ORDER BY t.poll_stamp DESC
Now i have to scale this out to be able to include weeks,months and year.
The only way i know of to do this is to use date_truc(), although i am slightly concerned with the large number of data and the processing that its going to be doing.
Since we have an application entering this data into the database, i think i will ask if there can be an additional table created which dumps an average value per day, this should reduce the amount of processing required when a whole year's worth of data is queried rather than try to get a query to do this all in one.
Re: Postgres BETWEEN two timestamps
Did you try
instead fo subselecting?
Code: Select all
sum(tableFieldHere) over (rows unbounded preceding) as aliasNameThere are 10 types of people in this world, those who understand binary and those who don't
Re: Postgres BETWEEN two timestamps
I did, it immediately gave me the error:
[text]Warning: pg_query() [function.pg-query]: Query failed: ERROR: syntax error at or near "over" LINE 1: ...T t.node_id, t.poll_stamp, sum(t.value) over (rows... ^[/text]
[text]Warning: pg_query() [function.pg-query]: Query failed: ERROR: syntax error at or near "over" LINE 1: ...T t.node_id, t.poll_stamp, sum(t.value) over (rows... ^[/text]
Re: Postgres BETWEEN two timestamps
PostgreSQL 8.4.5 here. Works fine.
There are 10 types of people in this world, those who understand binary and those who don't
Re: Postgres BETWEEN two timestamps
Running PostgreSQL 8.3.11 and causes the error.
What version did 'over' appear in pgsql?
What version did 'over' appear in pgsql?
Re: Postgres BETWEEN two timestamps
I don't know but I'm using a "stable" version ...
There are 10 types of people in this world, those who understand binary and those who don't