Compare the dates, compute the difference between them, postgres - database

I have a date column and a balance column for each user. Every time user makes a transaction, a new row gets added to this table. It could be that the user makes 15 transactions during the day, and no transaction at all during 5 days.
Like this one
date balance
2017-06-01 95.63
2017-06-01 97.13
2017-06-01 72.14
2017-06-06 45.04
2017-06-08 20.04
2017-06-09 10.63
2017-06-09 -29.37
2017-06-09 -51.35
2017-06-13 -107.55
2017-06-13 -101.35
2017-06-15 -157.55
2017-06-16 -159.55
2017-06-17 -161.55
The goal is to select the positive and negative transactions made during the same day, compute their average or min value and to consider it as one transaction.If the next day no transaction has been made, then the amount of the previous day should be used.
it means for each day in a month i should calculate an interest and it the balance has not been updated then the balance of the previous day should be used.
Hypothetically my table should look like
date balance
1/6/2017 72.14
6/2/2017 72.14
6/3/2017 72.14
6/4/2017 72.14
6/5/2017 72.14
6/6/2017 45.04
7/6/2017 45.04
8/6/2017 20.04
9/6/2017 -51.35
10/6/2017 -51.35
11/6/2017 -51.35
12/6/2017 -51.35
13/06/2017 -107.55
14/06/2017 -107.55
15/06/2017 -157.55
16/06/2017 -159.55
17/06/2017 -161.55
i have added those days that were missing and group the days that were duplicate.
Once I have this done, I can select the number of positive balance days, e.g. 8 days, compute the average positive balance, and multiply it by 0.4%.
8*58.8525*0.004=0.23
The same should be done with negative balance. but with a different interest rate number of negative balance days, e.g. 9 multiplied by average negative balance during those days and 8.49%.
9*-99.90555556*0.00849=-0.848
So my expected result is just to have these two columns
Neg Pos
-0.848 0.23
How can I do that it in postgres? The function OVERLAP does not really help since I need to specify the dates.
Besides i do not know how to
loop the days and to see if there is a duplicate.
See which days are missing and use the previous balance for each of these missing days.

please try this.. replace table with your table name
with cte as
(
Select "date" as date
,min(balance) as balance
,lead("date") over(order by "date") next_date
,Coalesce(ABS("date" - lead("date") over(order by "date")),1) date_diff
from table
group by "date"
),
cte2 as
(
Select date_diff*balance as tot_bal , date_diff
from cte
Where balance > 0
),
cte3 as
(
Select date_diff*balance as tot_bal , date_diff
from cte
Where balance < 0
)
Select (sum(cte2.tot_bal) / sum(cte2.date_diff) ) * 0.004 as pos
,(sum(cte3.tot_bal) / sum(cte3.date_diff) ) * 0.00849 as neg
from cte2
,cte3;

Related

SQL: How to count distinct for all hourly periods in a day

I have a table of hotel data like this:
Room_ID
Check_in_time
Check_out_time
123
2021-10-01 01:02:03
2021-10-01 02:03:04
I would like to do a count of how many rooms were were checked in during each hour throughout a day (even if the room was checked in for 1 minute during the hour it still counts), so an output that look like this:
Time period
Number of rooms
09:00-10:00
10
10:00-11:00
12
..
..
There are a couple of other 'where' conditions but this is the crux of the problem. I have so far managed to write a query that can count unique room ID by specifying the hourly window:
select count (distinct room_id)
from data
where check_out_time > 9am and check_in_time < 10am
But how do I do this for each of the 24 hourly windows without repeating the same query 24 times? Hopefully something that can be later adapted into half hour intervals, or even minutes. I'm using Sigma in case that matters. Thanks in advance!
In Snowflake, I'd leverage a DATE_TRUNC function. If your dataset is very large, this will likely perform much better than any of the BETWEEN type of filtering that the OP and other answers are using.
select date_trunc('hour',check_out_time) as check_out_hour
, count (distinct room_id) as cnt
from data
group by 1;
If you needed to parse it out by day and time, you could add that, as well:
select date_trunc('day',check_out_time) as check_out_day
, date_trunc('hour',check_out_time) as check_out_hour
, count (distinct room_id) as cnt
from data
group by 1,2;
For reference:
https://docs.snowflake.com/en/sql-reference/functions/date_trunc.html
You may try the following:
A recursive CTE is used to generate the possible hours 0-23 (we could have also select distinct hours from your existing dataset but i did not want to assume that every hour was possibly booked and this may be a less expensive operation for this case to get all possible hours). A left join was then used to determine hours rooms were booked before aggregating this and counting the number of bookings each hour.
WITH recursive hours(hr) as (
select 0 as hr
union all
select hr + 1 from hours where hr < 23
)
select
concat(h.hr,':00-',(h.hr+1),':00') as time_period,
COUNT(DISTINCT r.room_id) as no_rooms
from hours h
left join room_times r on (
CAST(r.check_in_time AS DATE) = CAST(r.check_out_time AS DATE) AND
h.hr BETWEEN DATE_PART(hour,r.Check_in_time) AND DATE_PART(hour,r.Check_out_time)
) OR
(
CAST(r.check_in_time AS DATE) < CAST(r.check_out_time AS DATE) AND
(
h.hr >= DATE_PART(hour, r.Check_in_time) OR
h.hr <= DATE_PART(hour,r.Check_out_time)
)
)
GROUP BY h.hr
order by h.hr
See working db fiddle (using sql server instead) with the same logic and additional data and outputs to assist verification here.
Sample Data:
INSERT INTO room_times
(Room_ID, Check_in_time, Check_out_time)
VALUES
('123', '2021-10-01 01:02:03', '2021-10-01 03:03:04'),
('124', '2021-10-01 15:02:03', '2021-10-02 01:03:04');
Outputs:
time_period
no_rooms
0:00-1:00
1
1:00-2:00
2
2:00-3:00
1
3:00-4:00
1
4:00-5:00
0
5:00-6:00
0
6:00-7:00
0
7:00-8:00
0
8:00-9:00
0
9:00-10:00
0
10:00-11:00
0
11:00-12:00
0
12:00-13:00
0
13:00-14:00
0
14:00-15:00
0
15:00-16:00
1
16:00-17:00
1
17:00-18:00
1
18:00-19:00
1
19:00-20:00
1
20:00-21:00
1
21:00-22:00
1
22:00-23:00
1
23:00-24:00
1
Let me know if this works for you.

Netezza SQL how to calculate daily compound interest with varying rates

I’m having trouble figuring out how to calculate the daily compound interest for an initial amount, over various rates periods, producing a new total that includes the interest amounts from each rate period. The challenge is that for each subsequent rate period you have to calculate the interest on the amount plus the previous interest!!! So it’s not a simple running total.
For example, using the following rates table.
rate from date rate to date rate
-------------- ------------ ----
2013-07-15 2013-09-30 3
2013-10-01 2013-12-31 4
2014-01-01 2014-03-31 3
Using an initial amount of $32,550.37, I have to traverse each rate period with an interest calculation, producing the final amount of $33,337.34.
rate from date rate to date rate daysx amount interest
-------------- ------------ ---- ----- ---------- --------
2013-07-15 2013-09-30 .03 78 32,550.37 209.34
2013-10-01 2013-12-31 .04 92 32,759.71 331.94
2014-01-01 2014-03-31 .03 90 33,091.65 245.69
Final Amount 33,337.34
For example, the initial amount of $32,550.37 has interest of $209.34 at 3%. For the second rate period, I add that interest to the amount, which is $32,759.71 and then calculate the interest on $32,759.71 at 4%. Etc.
I’m using Netezza which does not allow recursive SQL, so I have been trying to use windowed functions, but not with any success yet …
DROP TABLE TRATES;
CREATE TABLE TRATES (RATE_FROM_DATE DATE, RATE_TO_DATE DATE, RATE DECIMAL(10,2));
INSERT INTO TRATES VALUES ('2013-07-15','2013-09-30',.03);
INSERT INTO TRATES VALUES ('2013-10-01','2013-12-31',.04);
INSERT INTO TRATES VALUES ('2014-01-01','2014-03-31',.03);
SELECT TRATES.*
, DAYS_BETWEEN(RATE_FROM_DATE, RATE_TO_DATE)+1 AS DAYSX
, (AMOUNT * POW(1+(RATE)/365,(DAYS_BETWEEN(RATE_FROM_DATE, RATE_TO_DATE)+1)))) – AMOUNT
AS INTEREST
, FIRST_VALUE(AMOUNT) OVER(ORDER BY RATE_FROM_DATE)
*(POW(1+(RATE/100)/365,(DAYS_BETWEEN(RATE_FROM_DATE, RATE_TO_DATE)+1)))
AS NEW_AMOUNT
FROM TRATES
JOIN (SELECT 32550.37 AS AMOUNT) AS TPARMS ON 1=1
;
Any help would be greatly appreciated.
I’m using the fact that the interest rate is shifting is not dependent on which order the shifts occur, so 3 days at 3% followed but 4 days at 2% gives the same result as 4 days at 2% followed by 3 days at 3%.
Furthermore summing logarithms will allow you to multiply over rows:
https://blog.jooq.org/2018/09/21/how-to-write-a-multiplication-aggregate-function-in-sql/
In short: log(a1)+log(a2)+..+log(an) = log(a1*a2*..*an)
This is pretty close to a useful solution (and performs reasonably):
DROP TABLE TRATES if exists;
CREATE temp TABLE TRATES (RATE_FROM_DATE DATE, RATE_TO_DATE DATE, RATE DECIMAL(10,2));
INSERT INTO TRATES VALUES ('2013-07-15','2013-09-30',.03);
INSERT INTO TRATES VALUES ('2013-10-01','2013-12-31',.04);
INSERT INTO TRATES VALUES ('2014-01-01','2014-03-31',.03);
create temp table dates as
select '2010-01-01'::date -1+row_number() over (order by null) as Date
from ( select * from
_v_dual_dslice a cross join _v_dual_dslice b cross join _v_dual_dslice c cross join _v_dual_dslice d
limit 10000 ) x
;
SELECT AMOUNT*pow(10,sum(log(1+rate::double/365)))
FROM TRATES join dates
on date between RATE_FROM_DATE and RATE_TO_DATE
JOIN (SELECT 32550.37 AS AMOUNT) AS TPARMS ON 1=1
group by AMOUNT
I'm sure you can make it prettier with a bit of effort and even let it return the results on select days in the 260 day interval if needed

Populating a list of dates without a defined end date - SQL server

I have a list of accounts and their cost which changes every few days.
In this list I only have the start date every time the cost updates to a new one, but no column for the end date.
Meaning, I need to populate a list of dates when the end date for a specific account and cost, should be deduced as the start date of the same account with a new cost.
More or less like that:
Account start date cost
one 1/1/2016 100$
two 1/1/2016 150$
one 4/1/2016 200$
two 3/1/2016 200$
And the result I need would be:
Account date cost
one 1/1/2016 100$
one 2/1/2016 100$
one 3/1/2016 100$
one 4/1/2016 200$
two 1/1/2016 150$
two 2/1/2016 150$
two 3/1/2016 200$
For example, if the cost changed in the middle of the month, than the sample data will only hold two records (one per each unique combination of account-start date-cost), while the results will hold 30 records with the cost for each and every day of the month (15 for the first cost and 15 for the second one). The costs are a given, and no need to calculate them (inserted manually).
Note the result contains more records because the sample data shows only a start date and an updated cost for that account, as of that date. While the results show the cost for every day of the month.
Any ideas?
Solution is a bit long.
I added an extra date for test purposes:
DECLARE #t table(account varchar(10), startdate date, cost int)
INSERT #t
values
('one','1/1/2016',100),('two','1/1/2016',150),
('one','1/4/2016',200),('two','1/3/2016',200),
('two','1/6/2016',500) -- extra row
;WITH CTE as
( SELECT
row_number() over (partition by account order by startdate) rn,
*
FROM #t
),N(N)AS
(
SELECT 1 FROM(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1))M(N)
),
tally(N) AS -- tally is limited to 1000 days
(
SELECT ROW_NUMBER()OVER(ORDER BY N.N) - 1 FROM N,N a,N b
),GROUPED as
(
SELECT
cte.account, cte.startdate, cte.cost, cte2.cost cost2, cte2.startdate enddate
FROM CTE
JOIN CTE CTE2
ON CTE.account = CTE2.account
and CTE.rn = CTE2.rn - 1
)
-- used DISTINCT to avoid overlapping dates
SELECT DISTINCT
CASE WHEN datediff(d, startdate,enddate) = N THEN cost2 ELSE cost END cost,
dateadd(d, N, startdate) startdate,
account
FROM grouped
JOIN tally
ON datediff(d, startdate,enddate) >= N
Result:
cost startdate account
100 2016-01-01 one
100 2016-01-02 one
100 2016-01-03 one
150 2016-01-01 two
150 2016-01-02 two
200 2016-01-03 two
200 2016-01-04 one
200 2016-01-04 two
200 2016-01-05 two
500 2016-01-06 two
Thank you #t-clausen.dk!
It didn't solve the problem completely, but did direct me in the correct way.
Eventually I used the LEAD function to generate an end date for every cost per account, and then I was able to populate a list of dates based on that idea.
Here's how I generate the end dates:
DECLARE #t table(account varchar(10), startdate date, cost int)
INSERT #t
values
('one','1/1/2016',100),('two','1/1/2016',150),
('one','1/4/2016',200),('two','1/3/2016',200),
('two','1/6/2016',500)
select account
,[startdate]
,DATEADD(DAY, -1, LEAD([Startdate], 1,'2100-01-01') OVER (PARTITION BY account ORDER BY [Startdate] ASC)) AS enddate
,cost
from #t
It returned the expected result:
account startdate enddate cost
one 2016-01-01 2016-01-03 100
one 2016-01-04 2099-12-31 200
two 2016-01-01 2016-01-02 150
two 2016-01-03 2016-01-05 200
two 2016-01-06 2099-12-31 500
Please note that I set the end date of current costs to be some date in the far future which means (for me) that they are currently active.

SQL Server: How to get a rolling sum over 3 days for different customers within same table

This is the input table:
Customer_ID Date Amount
1 4/11/2014 20
1 4/13/2014 10
1 4/14/2014 30
1 4/18/2014 25
2 5/15/2014 15
2 6/21/2014 25
2 6/22/2014 35
2 6/23/2014 10
There is information pertaining to multiple customers and I want to get a rolling sum across a 3 day window for each customer.
The solution should be as below:
Customer_ID Date Amount Rolling_3_Day_Sum
1 4/11/2014 20 20
1 4/13/2014 10 30
1 4/14/2014 30 40
1 4/18/2014 25 25
2 5/15/2014 15 15
2 6/21/2014 25 25
2 6/22/2014 35 60
2 6/23/2014 10 70
The biggest issue is that I don't have transactions for each day because of which the partition by row number doesn't work.
The closest example I found on SO was:
SQL Query for 7 Day Rolling Average in SQL Server
but even in that case there were transactions made everyday which accomodated the rownumber() based solutions
The rownumber query is as follows:
select customer_id, Date, Amount,
Rolling_3_day_sum = CASE WHEN ROW_NUMBER() OVER (partition by customer_id ORDER BY Date) > 2
THEN SUM(Amount) OVER (partition by customer_id ORDER BY Date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
END
from #tmp_taml9
order by customer_id
I was wondering if there is way to replace "BETWEEN 2 PRECEDING AND CURRENT ROW" by "BETWEEN [DATE - 2] and [DATE]"
One option would be to use a calendar table (or something similar) to get the complete range of dates and left join your table with that and use the row_number based solution.
Another option that might work (not sure about performance) would be to use an apply query like this:
select customer_id, Date, Amount, coalesce(Rolling_3_day_sum, Amount) Rolling_3_day_sum
from #tmp_taml9 t1
cross apply (
select sum(amount) Rolling_3_day_sum
from #tmp_taml9
where Customer_ID = t1.Customer_ID
and datediff(day, date, t1.date) <= 3
and t1.Date >= date
) o
order by customer_id;
I suspect performance might not be great though.

Get record based on year in oracle

I am creating a query to give number of days between two days based on year. Actually I have below type of date range
From Date: TO_DATE('01-Jun-2011','dd-MM-yyyy')
To Date: TO_DATE('31-Dec-2013','dd-MM-yyyy')
My Result should be:
Year Number of day
------------------------------
2011 XXX
2012 XXX
2013 XXX
I've tried below query
WITH all_dates AS
(SELECT start_date + LEVEL - 1 AS a_date
FROM
(SELECT TO_DATE ('21/03/2011', 'DD/MM/YYYY') AS start_date ,
TO_DATE ('25/06/2013', 'DD/MM/YYYY') AS end_date
FROM dual
)
CONNECT BY LEVEL <= end_date + 1 - start_date
)
SELECT TO_CHAR ( TRUNC (a_date, 'YEAR') , 'YYYY' ) AS YEAR,
COUNT (*) AS num_days
FROM all_dates
WHERE a_date - TRUNC (a_date, 'IW') < 7
GROUP BY TRUNC (a_date, 'YEAR')
ORDER BY TRUNC (a_date, 'YEAR') ;
I got exact output
Year Number of day
------------------------------
2011 286
2012 366
2013 176
My question is if i use connect by then query execution takes long time as i have millions of records in table and hence i don't want to use connect by clause
connect by clause is creating virtual rows against the particular record.
Any help or suggestion would be greatly appreciated.
From your vague expected results I think you want the number of records between those dates, not the number of days; but it's rather unclear. Since you refer to a table in the question I assume you want something related to the table data, not simply days between two dates which wouldn't depend on a table at all. (I have no idea what the connect by clause reference means though). This should give you that, if it is what you want:
select extract(year from date_field), count(*)
from t42
where date_field >= to_date('01-Jun-2011', 'DD-MON-YYYY')
and date_field < to_date('31-Dec-2013') + interval '1' day
group by extract(year from date_field)
order by extract(year from date_field);
The where clause is as you'd expect between two dates; I've assumed there might be times in your date field (i.e. not all at midnight) and that you want to count all records on the last date in your range. Then it's grouping and counting based on the year for each record.
SQL Fiddle.
If you want the number of days that have records within the range, then you can just vary the count slightly:
select extract(year from date_field), count(distinct trunc(date_field))
...
SQL Fiddle.
you can use the below function to reduce the number of virtual rows by considering only the years in between.You can check the SQLFIDDLE to check the performance.
First consider only the number of days between start date and the year end of that year or
End date if it is in same year
Then consider the years in between from next year of start date to the year before the end date year
Finally consider the number of days from start of end date year to end date
Hence instead of iterating for all the days between start date and end date we need to iterate only the years
WITH all_dates AS
(SELECT (TO_CHAR(START_DATE,'yyyy') + LEVEL - 1) YEARS_BETWEEN,start_date,end_date
FROM
(SELECT TO_DATE ('21/03/2011', 'DD/MM/YYYY') AS start_date ,
TO_DATE ('25/06/2013', 'DD/MM/YYYY') AS end_date
FROM dual
)
CONNECT BY LEVEL <= (TO_CHAR(end_date,'yyyy')) - (TO_CHAR(start_date,'yyyy')-1)
)
SELECT DECODE(TO_CHAR(END_DATE,'yyyy'),YEARS_BETWEEN,END_DATE
,to_date('31-12-'||years_between,'dd-mm-yyyy'))
- DECODE(TO_CHAR(START_DATE,'yyyy'),YEARS_BETWEEN,START_DATE
,to_date('01-01-'||years_between,'dd-mm-yyyy'))+1,years_between
FROM ALL_DATES;
In Oracle you can perform Addition and Substraction to dates like this...
SELECT
TO_DATE('31-Dec-2013','dd-MM-yyyy') - TO_DATE('01-Jun-2011','dd-MM-yyyy')
DAYS FROM DUAL;
it will return day difference between two dates....
select to_date(2011, 'yyyy'), to_date(2012, 'yyyy'), to_date(2013, 'yyyy')
from dual;
TO_DATE(2011,'Y TO_DATE(2012,'Y TO_DATE(2013,'Y
--------------- --------------- ---------------
01-MAY-11 01-MAY-12 01-MAY-13
select to_char(date_field,'yyyy'), count(*)
from your_table
where date_field between to_date('01-Jun-2011', 'DD-MON-YYYY')
and to_date('31-Dec-2013 23:59:59', 'DD-MON-YYYY hh24:mi:ss')
group by to_char(date_field,'yyyy')
order by to_char(date_field,'yyyy');

Resources