PostgreSQL – calculating deltas from cumulative data

I collect data from some API sources through Python and add it to 2 tables in Postgres.

Then I use this data to generate reports, joining and grouping/filtering it. I add thousands of rows every day.

Cost, revenue and sales are always cumulative: each data point runs from the product's t1, and t2 is the time the data was pulled.

So the latest data pull includes all data back to t1. Both t1 and t2 are timestamps without time zone in Postgres. I am currently using Postgres 10.

Sample:

id, vendor_id, product_id, t1, t2, cost, revenue, sales
1, a, a, 2018-01-01, 2018-04-18, 50, 200, 34
2, a, b, 2018-05-01, 2018-04-18, 10, 100, 10
3, a, c, 2018-01-02, 2018-04-18, 12, 100, 9
4, a, d, 2018-01-03, 2018-04-18, 12, 100, 8
5, b, e, 2018-25-02, 2018-04-18, 12, 100, 7

6, a, a, 2018-01-01, 2018-04-17, 40, 200, 30
7, a, b, 2018-05-01, 2018-04-17, 0, 95, 8
8, a, c, 2018-01-02, 2018-04-17, 10, 12, 5
9, a, d, 2018-01-03, 2018-04-17, 8, 90, 4
10, b, e, 2018-25-02, 2018-04-17, 9, 0, 3

Cost and revenue come from two tables; I join them on vendor_id, product_id and t2.

Is there a way I can go through all the data, "shift" it and subtract, so that instead of accumulated totals I get time-series data?

Should this be done before storing, or is it better to do it at report time?

For reference: currently, if I want a report comparing two pulls, I use two subqueries, but that feels backwards compared to treating the data as a time series and aggregating only the interval required.


with report1 as (select ...),
report2 as (select ...)
select .. from report1 left outer join report2 on ...
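The "shift and subtract" idea from the question can be sketched in plain Python (the data is already pulled with Python). This is a minimal sketch with made-up sample values, not the actual tables: within each (vendor_id, product_id) group, rows are ordered by t2 and each cumulative value has the previous pull's value subtracted.

```python
from itertools import groupby

# Illustrative cumulative rows: (vendor_id, product_id, t2, cost, revenue, sales)
rows = [
    ("a", "a", "2018-04-18", 50, 200, 34),
    ("a", "a", "2018-04-17", 40, 200, 30),
    ("a", "b", "2018-04-18", 10, 100, 10),
    ("a", "b", "2018-04-17", 0, 95, 8),
]

def deltas(rows):
    """Turn cumulative values into per-pull deltas: sort by t2 within each
    vendor/product group and subtract the previous row's values."""
    group_key = lambda r: (r[0], r[1])
    out = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1], r[2]))
    for _, group in groupby(ordered, key=group_key):
        prev = None
        for vendor, product, t2, cost, revenue, sales in group:
            if prev is None:
                # First pull in the group: nothing to subtract.
                out.append((vendor, product, t2, cost, revenue, sales))
            else:
                out.append((vendor, product, t2,
                            cost - prev[0], revenue - prev[1], sales - prev[2]))
            prev = (cost, revenue, sales)
    return out
```

For the sample above, the 2018-04-18 row for vendor a / product a becomes (10, 0, 4): cost 50−40, revenue 200−200, sales 34−30.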

Thanks a lot in advance!

JR

You can use LAG():

From the PostgreSQL documentation on Window Functions:

…returns value evaluated at the row that is offset rows before the
current row within the partition; if there is no such row, instead
return default (which must be of the same type as value). Both offset
and default are evaluated with respect to the current row. If omitted,
offset defaults to 1 and default to null.

with sample_data as (
  select 1 as id, 'a'::text vendor_id, 'a'::text product_id, '2018-01-01'::date as t1, '2018-04-18'::date as t2, 50 as cost, 200 as revenue, 36 as sales
  union all
  select 2 as id, 'a'::text vendor_id, 'b'::text product_id, '2018-01-01'::date as t1, '2018-04-18'::date as t2, 55 as cost, 200 as revenue, 34 as sales
  union all
  select 3 as id, 'a'::text vendor_id, 'a'::text product_id, '2018-01-01'::date as t1, '2018-04-17'::date as t2, 35 as cost, 150 as revenue, 25 as sales
  union all
  select 4 as id, 'a'::text vendor_id, 'b'::text product_id, '2018-01-01'::date as t1, '2018-04-17'::date as t2, 25 as cost, 140 as revenue, 23 as sales
  union all
  select 5 as id, 'a'::text vendor_id, 'a'::text product_id, '2018-01-01'::date as t1, '2018-04-16'::date as t2, 16 as cost, 70 as revenue, 12 as sales
  union all
  select 6 as id, 'a'::text vendor_id, 'b'::text product_id, '2018-01-01'::date as t1, '2018-04-16'::date as t2, 13 as cost, 65 as revenue, 11 as sales
)
select sd.*
, coalesce(cost - lag(cost) over (partition by vendor_id, product_id order by t2), cost) cost_new
, coalesce(revenue - lag(revenue) over (partition by vendor_id, product_id order by t2), revenue) revenue_new
, coalesce(sales - lag(sales) over (partition by vendor_id, product_id order by t2), sales) sales_new
from sample_data sd
order by t2 desc
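The same lag()/coalesce pattern can be sanity-checked without a Postgres instance: SQLite has had window functions since 3.25, and Python's bundled sqlite3 module is typically recent enough. The table and values below are illustrative, not the question's real schema:

```python
import sqlite3

# In-memory SQLite database (requires SQLite >= 3.25 for window functions).
con = sqlite3.connect(":memory:")
con.execute("""
    create table sample_data (
        vendor_id text, product_id text, t2 text,
        cost int, revenue int, sales int)
""")
con.executemany(
    "insert into sample_data values (?,?,?,?,?,?)",
    [("a", "a", "2018-04-17", 35, 150, 25),
     ("a", "a", "2018-04-18", 50, 200, 36)])

# Same shape as the Postgres query: lag() gives the previous pull's value
# within the vendor/product partition; coalesce() keeps the first pull as-is.
rows = con.execute("""
    select t2,
           coalesce(cost - lag(cost) over
               (partition by vendor_id, product_id order by t2), cost),
           coalesce(revenue - lag(revenue) over
               (partition by vendor_id, product_id order by t2), revenue),
           coalesce(sales - lag(sales) over
               (partition by vendor_id, product_id order by t2), sales)
    from sample_data
    order by t2
""").fetchall()
```

For these two pulls, the 2018-04-18 row comes back as deltas (15, 50, 11), while the first pull passes through unchanged.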

