I need to compute the sum of durations of multiple events based on consecutive values in another column.
ts | w ----------------------------+--- 2020-07-27 15:40:04.045+00 | t 2020-07-27 15:41:04.045+00 | t 2020-07-27 15:41:14.045+00 | f 2020-07-27 15:42:14.045+00 | t 2020-07-27 15:43:14.045+00 | t
The event is considered being active as long as the column w
has a consecutive value of true
.
The duration of the first event would be 60 seconds. '2020-07-27 15:41:04.045+00' - 2020-07-27 15:40:04.045+00
. The second event has the same duration. The sum of both would be 120 seconds.
What would be the best/most performant approach to computing these values? The longest time range we’ll be looking at will probably be half a year involving about 30 million rows.
I wrote a custom aggregate function that computes the duration but it takes about 16 seconds for only 1.5 million rows.
Aggregate (cost=444090.03..444090.04 rows=1 width=4) (actual time=16290.826..16290.828 rows=1 loops=1) -> Seq Scan on discriminator0 (cost=0.00..57289.03 rows=1547203 width=9) (actual time=0.016..1723.178 rows=1547229 loops=1) Planning Time: 0.196 ms JIT: Functions: 3 Options: Inlining false, Optimization false, Expressions true, Deforming true Timing: Generation 0.889 ms, Inlining 0.000 ms, Optimization 0.508 ms, Emission 6.472 ms, Total 7.870 ms Execution Time: 16291.836 ms
I’m new to SQL and basically just got this working through trial and error, so I’m sure there is room for improvement. Maybe even a whole different approach. Here’s a fiddle with the aggregate function.
I’m not sure if i should include the code because it’s quite long