Netezza SQL: how to calculate daily compound interest with varying rates

I’m having trouble figuring out how to calculate daily compound interest on an initial amount over several rate periods, producing a new total that includes the interest from each rate period. The challenge is that for each subsequent rate period you have to calculate the interest on the amount plus the previous interest, so it’s not a simple running total.
For example, using the following rates table.
rate from date rate to date rate
-------------- ------------ ----
2013-07-15 2013-09-30 3
2013-10-01 2013-12-31 4
2014-01-01 2014-03-31 3
Using an initial amount of $32,550.37, I have to traverse each rate period with an interest calculation, producing the final amount of $33,337.34.
rate from date rate to date rate daysx amount interest
-------------- ------------ ---- ----- ---------- --------
2013-07-15 2013-09-30 .03 78 32,550.37 209.34
2013-10-01 2013-12-31 .04 92 32,759.71 331.94
2014-01-01 2014-03-31 .03 90 33,091.65 245.69
Final Amount 33,337.34
For example, the initial amount of $32,550.37 has interest of $209.34 at 3%. For the second rate period, I add that interest to the amount, which is $32,759.71 and then calculate the interest on $32,759.71 at 4%. Etc.
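In other words, each period’s interest is amount * ((1 + rate/365)^days - 1), compounding daily at the nominal annual rate. For the first period: 32,550.37 * ((1 + 0.03/365)^78 - 1) ≈ 209.34, and the second period then starts from 32,550.37 + 209.34 = 32,759.71.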
I’m using Netezza, which does not allow recursive SQL, so I have been trying to use windowed functions, but without any success yet ...
DROP TABLE TRATES;
CREATE TABLE TRATES (RATE_FROM_DATE DATE, RATE_TO_DATE DATE, RATE DECIMAL(10,2));
INSERT INTO TRATES VALUES ('2013-07-15','2013-09-30',.03);
INSERT INTO TRATES VALUES ('2013-10-01','2013-12-31',.04);
INSERT INTO TRATES VALUES ('2014-01-01','2014-03-31',.03);
SELECT TRATES.*
, DAYS_BETWEEN(RATE_FROM_DATE, RATE_TO_DATE)+1 AS DAYSX
, (AMOUNT * POW(1 + RATE/365, DAYS_BETWEEN(RATE_FROM_DATE, RATE_TO_DATE) + 1)) - AMOUNT
AS INTEREST
, FIRST_VALUE(AMOUNT) OVER(ORDER BY RATE_FROM_DATE)
* POW(1 + RATE/365, DAYS_BETWEEN(RATE_FROM_DATE, RATE_TO_DATE) + 1)
AS NEW_AMOUNT
FROM TRATES
JOIN (SELECT 32550.37 AS AMOUNT) AS TPARMS ON 1=1
;
Any help would be greatly appreciated.

I’m using the fact that the compounded result does not depend on the order in which the rate shifts occur, so 3 days at 3% followed by 4 days at 2% gives the same result as 4 days at 2% followed by 3 days at 3%.
Furthermore, summing logarithms allows you to multiply over rows:
https://blog.jooq.org/2018/09/21/how-to-write-a-multiplication-aggregate-function-in-sql/
In short: log(a1)+log(a2)+..+log(an) = log(a1*a2*..*an)
This is pretty close to a useful solution (and performs reasonably):
DROP TABLE TRATES if exists;
CREATE temp TABLE TRATES (RATE_FROM_DATE DATE, RATE_TO_DATE DATE, RATE DECIMAL(10,2));
INSERT INTO TRATES VALUES ('2013-07-15','2013-09-30',.03);
INSERT INTO TRATES VALUES ('2013-10-01','2013-12-31',.04);
INSERT INTO TRATES VALUES ('2014-01-01','2014-03-31',.03);
-- build a 10,000-day calendar starting 2010-01-01
create temp table dates as
select '2010-01-01'::date -1+row_number() over (order by null) as Date
from ( select * from
_v_dual_dslice a cross join _v_dual_dslice b cross join _v_dual_dslice c cross join _v_dual_dslice d
limit 10000 ) x
;
-- one daily factor per calendar day, multiplied across rows via sum-of-logs:
-- pow(10, sum(log(x))) = product(x)
SELECT AMOUNT*pow(10,sum(log(1+rate::double/365)))
FROM TRATES join dates
on date between RATE_FROM_DATE and RATE_TO_DATE
JOIN (SELECT 32550.37 AS AMOUNT) AS TPARMS ON 1=1
group by AMOUNT
I'm sure you can make it prettier with a bit of effort, and even have it return results on selected days within the 260-day interval if needed.
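If you also need the running amount at the end of each rate period (as in the question's table), the same sum-of-logs trick works as a window aggregate, with no calendar table at all. A sketch, assuming Netezza's windowed SUM ... OVER (ORDER BY ...) with its default running frame:
SELECT T.*
, DAYS_BETWEEN(RATE_FROM_DATE, RATE_TO_DATE) + 1 AS DAYSX
-- cumulative product of (1 + rate/365)^days via a running sum of logs
, AMOUNT * POW(10, SUM((DAYS_BETWEEN(RATE_FROM_DATE, RATE_TO_DATE) + 1)
* LOG(1 + RATE::double / 365))
OVER (ORDER BY RATE_FROM_DATE)) AS NEW_AMOUNT
FROM TRATES T
JOIN (SELECT 32550.37 AS AMOUNT) AS TPARMS ON 1=1
;
For the sample rates this yields 32,759.71, 33,091.65 and 33,337.34 as the period-end amounts, matching the expected final total.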

Optimize insert of 1B+ rows

I work in an organization that has over 75,000 employees. In our payroll system, each employee has 32 unique banks which store things like Sick Time, Vacation Time, Banked Overtime, etc.
Here are the existing tables
Employee
(
Employee_key INT IDENTITY(1,1)
Lastname,
Firstname
)
Employee Key | Lastname | Firstname
-----------------------------------
100 | Smith | John
Bank
(
Bank_key INT IDENTITY(1,1),
Bank_name VARCHAR(50)
)
Bank_key | Bank_name
---------------------
100 | VACATION
Employee_balance
(
Employee_key INT, --FK to Employee
Bank_key INT, --FK to Bank
Bank_balance NUMERIC(10,5) -- Aggregate value of bank including future dated entries
)
Employee_key | Bank_Key | Bank_Balance
--------------------------------------
100 | 100 | 0
Employee_balance_trans
(
Employee_key INT, --FK to Employee
Bank_key INT, --FK to Bank
Trans_dt DATE -- transaction date that affects the bank
Bank_delta NUMERIC(10,5)
)
Employee_key | Bank_key | Trans_dt | Bank_delta
--------------------------------------------------
100 | 100 | 20230701 | -8.0
100 | 100 | 20230801 | -8.0
100 | 100 | 20230901 | -8.0
100 | 100 | 20231001 | -8.0
100 | 100 | 20231101 | -8.0
This employee has 5 vacation days booked into the future, for a total of 40 hours. As of January 1, the employee had 40 hours in their vacation bank, but because the employee_balance table is net of all future dated entries, I have to do some SQL processing to get the value for a current date.
SELECT eb.employee_key,
eb.bank_key,
'2023-01-01',
eb.bank_balance - ISNULL(SUM(ebt.bank_delta), 0)
FROM employee a
INNER JOIN employee_balance eb ON eb.employee_key = a.employee_key
LEFT OUTER JOIN wfms.employee_balance_trans ebt ON ebt.bank_key = eb.bank_key
AND ebt.employee_key = eb.employee_key
AND ebt.trans_dt > '2023-01-01'
GROUP BY eb.employee_key, eb.bank_key, eb.bank_balance
Running this query using 2023-01-01 returns a bank value of 40 hours. Running the query on 2023-07-01 returns a value of 32 hours and so on. This query is fine for calculating a single employee balance. The problem starts when a manager of a department with 1000 employees wants to see a report showing the employee banks at the beginning and end of each month.
I created a new table as follows:
Employee_bank_history
(
employee_key INT, --FK to employee
bank_key INT, --FK to bank
bank_date DATE,
bank_balance NUMERIC (10,5) -- Contains the bank balance as of the bank date
)
The table has a unique clustered index consisting of employee_key, bank_key and bank_date. The table is also populated every evening with a date range from December 31 2021 to Current Date. The start date gets reset every year, so there will be a maximum of 730 days worth of data. This means that at the maximum date range of 730 days, there will be almost 2 billion rows. (75,000 employees X 32 banks X 730 days.)
Currently, I am loading 950 million rows, and the following INSERT statement takes 30-45 minutes.
DECLARE @StartDate DATE = DATEFROMPARTS(DATEPART(YY, GETDATE()) - 2, 12, 31)
DECLARE @EndDate DATE = GETDATE()
;
WITH cte_bank_dates AS
(
SELECT [date]
FROM dim_date
WHERE [date] BETWEEN @StartDate AND @EndDate
)
INSERT INTO employee_balance_history
SELECT de.employee_key,
deb.bank_key,
cte.[date],
deb.bank_balance - ISNULL(SUM(feb.bank_delta), 0)
FROM employee de
INNER JOIN employee_balance deb ON deb.employee_key = de.employee_key
CROSS JOIN cte_bank_dates cte
LEFT OUTER JOIN employee_balance_trans feb ON feb.bank_key = deb.bank_key
AND feb.employee_key = deb.employee_key
AND feb.trans_dt > cte.[date]
GROUP BY de.employee_key, deb.bank_key, cte.[date], deb.bank_balance
OPTION (MAXRECURSION 0)
I use the CTE to get only the dates in the correct range. I need every date in the range so that I know which future-dated transactions to exclude from the aggregation. The resulting query to get bank balances as of a given date is blazing fast.
Today, I had my hands slapped and was told that the CROSS JOIN to the CTE was not needed and to optimize the query because it was slowing everything else down when it runs.
Leaving aside the fact that it will run overnight once in production, I'm left to wonder if there's a better way to populate this table for every employee, every bank and every date. The number of rows is unavoidable, as is the calculation to strip out future dated transactions from the employee bank balance.
Does anyone have any idea how I might make this faster, and less resource intensive on the server?
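One direction worth exploring (a sketch, not from the original post): the as-of balance only changes on transaction dates, so you can build one interval row per transaction with a window aggregate, then join each calendar date to the interval that covers it, instead of re-aggregating future rows for every (employee, bank, date) combination. Assuming SQL Server 2012+ window frames:
;WITH intervals AS
(
    -- running sum of everything AFTER each transaction; the balance net of
    -- future entries is constant from trans_dt up to the next trans_dt
    SELECT employee_key,
           bank_key,
           trans_dt,
           SUM(bank_delta) OVER (PARTITION BY employee_key, bank_key
                                 ORDER BY trans_dt
                                 ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) AS future_sum,
           LEAD(trans_dt) OVER (PARTITION BY employee_key, bank_key
                                ORDER BY trans_dt) AS next_dt
    FROM employee_balance_trans
)
INSERT INTO employee_balance_history
SELECT i.employee_key,
       i.bank_key,
       d.[date],
       eb.bank_balance - ISNULL(i.future_sum, 0)
FROM intervals i
INNER JOIN employee_balance eb ON eb.employee_key = i.employee_key
                              AND eb.bank_key = i.bank_key
INNER JOIN dim_date d ON d.[date] >= i.trans_dt
                     AND d.[date] < ISNULL(i.next_dt, '9999-12-31')
WHERE d.[date] BETWEEN @StartDate AND @EndDate;
-- dates before a bank's first transaction (and banks with no transactions at
-- all) still need one extra pass that subtracts the sum of ALL their deltas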

Need to generate rows with missing data in a large dataset - SQL

We are comparing values between months over multiple years. As time moves on the number of years and months in the dataset increases. We are only interested in months where there were values for every year, i.e. a full set.
Consider the following example for 1 month (1) over 3 years (1,2,3) and two activities (101, 102)
Dataset:
Activity Month year Count
------- ---- ------ ------
101 1 1 2
101 1 2 3
101 1 3 1
102 1 1 1
102 1 2 1
In the example above only activity 101 comes into consideration, as it satisfies the condition that there must be a count for the activity for month 1 in years 1, 2 and 3.
Activity 102 doesn't qualify for further analysis as it has no record for year 3.
I would like to generate the records with which I can then evaluate this, i.e. effectively generate the missing row (in this case 102, 1, 3, 0) to complete the dataset:
Activity Month year Count
------- ---- ------ ------
102 1 3 0
We find the problem difficult because the data keeps growing: the number of activities keeps expanding, and it is the combination of activity, year and month that needs to be evaluated.
An elegant solution will be appreciated.
As I mention in my comment, presumably you have both an Activity table and some kind of Calendar table with details of your activities and the years in your system. You can therefore do a CROSS JOIN between these two objects and then LEFT JOIN to your table to get the data set you want:
--Create sample objects/data
CREATE TABLE dbo.Activity (Activity int); --Obviously your table has more columns
INSERT INTO dbo.Activity (Activity)
VALUES (101),(102);
GO
CREATE TABLE dbo.Calendar (Year int,
Month int);--Likely your table has more columns
INSERT INTO dbo.Calendar (Year, Month)
VALUES(1,1),
(2,1),
(3,1);
GO
CREATE TABLE dbo.YourTable (Activity int,
Year int,
Month int,
[Count] int);
INSERT INTO dbo.YourTable (Activity,Month, Year, [Count])
VALUES(101,1,1,2),
(101,1,2,3),
(101,1,3,1),
(102,1,1,1),
(102,1,2,1);
GO
--Solution
SELECT A.Activity,
C.Month,
C.Year,
ISNULL(YT.[Count],0) AS [Count]
FROM dbo.Activity A
CROSS JOIN dbo.Calendar C
LEFT JOIN dbo.YourTable YT ON A.Activity = YT.Activity
AND C.[Year] = YT.[Year]
AND C.[Month] = YT.[Month]
WHERE C.Month = 1; --not sure if this is needed
If you don't have an Activity and Calendar table (I suggest, however, you should), then you can use subqueries with a DISTINCT, but note this will be far from performant with large data sets:
SELECT A.Activity,
C.Month,
C.Year,
ISNULL(YT.[Count],0) AS [Count]
FROM (SELECT DISTINCT Activity FROM dbo.YourTable) A
CROSS JOIN (SELECT DISTINCT Year, Month FROM dbo.YourTable) C
LEFT JOIN dbo.YourTable YT ON A.Activity = YT.Activity
AND C.[Year] = YT.[Year]
AND C.[Month] = YT.[Month]
WHERE C.Month = 1; --not sure if this is needed
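For reference, run against the sample data above, both versions should return the complete grid, including the previously missing row:
Activity Month Year Count
-------- ----- ---- -----
101      1     1    2
101      1     2    3
101      1     3    1
102      1     1    1
102      1     2    1
102      1     3    0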

Compare the dates, compute the difference between them, postgres

I have a date column and a balance column for each user. Every time user makes a transaction, a new row gets added to this table. It could be that the user makes 15 transactions during the day, and no transaction at all during 5 days.
Like this one
date balance
2017-06-01 95.63
2017-06-01 97.13
2017-06-01 72.14
2017-06-06 45.04
2017-06-08 20.04
2017-06-09 10.63
2017-06-09 -29.37
2017-06-09 -51.35
2017-06-13 -107.55
2017-06-13 -101.35
2017-06-15 -157.55
2017-06-16 -159.55
2017-06-17 -161.55
The goal is to select the positive and negative transactions made during the same day, compute their average or min value, and consider it as one transaction. If no transaction was made the next day, then the amount of the previous day should be used.
It means that for each day in a month I should calculate interest, and if the balance has not been updated then the balance of the previous day should be used.
Hypothetically my table should look like
date balance
2017-06-01 72.14
2017-06-02 72.14
2017-06-03 72.14
2017-06-04 72.14
2017-06-05 72.14
2017-06-06 45.04
2017-06-07 45.04
2017-06-08 20.04
2017-06-09 -51.35
2017-06-10 -51.35
2017-06-11 -51.35
2017-06-12 -51.35
2017-06-13 -107.55
2017-06-14 -107.55
2017-06-15 -157.55
2017-06-16 -159.55
2017-06-17 -161.55
I have added the days that were missing and grouped the days that were duplicated.
Once I have this done, I can select the number of positive balance days, e.g. 8 days, compute the average positive balance, and multiply it by 0.4%:
58.8525*0.004 ≈ 0.23
The same should be done with the negative balance, but with a different interest rate: the average negative balance over the 9 negative balance days, multiplied by 0.849%:
-99.90555556*0.00849 ≈ -0.848
So my expected result is just to have these two columns
Neg Pos
-0.848 0.23
How can I do that in Postgres? The OVERLAPS function does not really help since I need to specify the dates.
Besides, I do not know how to:
- loop over the days and see if a day has duplicates
- see which days are missing and use the previous day's balance for each of those missing days
Please try this (replace "table" with your table name):
with cte as
(
Select "date" as date
,min(balance) as balance
,lead("date") over(order by "date") next_date
,Coalesce(ABS("date" - lead("date") over(order by "date")),1) date_diff
from table
group by "date"
),
cte2 as
(
Select date_diff*balance as tot_bal , date_diff
from cte
Where balance > 0
),
cte3 as
(
Select date_diff*balance as tot_bal , date_diff
from cte
Where balance < 0
)
Select (sum(cte2.tot_bal) / sum(cte2.date_diff) ) * 0.004 as pos
,(sum(cte3.tot_bal) / sum(cte3.date_diff) ) * 0.00849 as neg
from cte2
,cte3;
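For the gap-filling step specifically, here is a minimal Postgres sketch (the table name transactions is a stand-in for your table): build the full calendar with generate_series and carry the last known balance forward into the missing days:
WITH daily AS (
    -- collapse same-day rows to one balance (min, per the question)
    SELECT "date", min(balance) AS balance
    FROM transactions
    GROUP BY "date"
), calendar AS (
    SELECT d::date AS "date"
    FROM generate_series((SELECT min("date") FROM daily),
                         (SELECT max("date") FROM daily),
                         interval '1 day') AS d
)
SELECT c."date",
       (SELECT d.balance
        FROM daily d
        WHERE d."date" <= c."date"
        ORDER BY d."date" DESC
        LIMIT 1) AS balance   -- last known balance on or before this day
FROM calendar c
ORDER BY c."date";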

DAX Calculated column based on two columns from another table

I need to write a DAX statement which is somewhat complex from a conceptual/logical standpoint- so this might be hard to explain.
I have two tables.
On the first table (shown below) I have a list of numeric values (Wages). For each value I have a corresponding date range, plus an EmployeeID and a FunctionID. The purpose of this table is to keep track of the hourly Wages paid to employees performing specific functions during specific date ranges. Each Function has its own Wage on the Wage table, BUT each employee might get paid a different Wage for the same Function (there are also dimension tables for functions and employees).
'Wages'
Wage StartDate EndDate EmployeeID FunctionID
20 1/1/2016 1/30/2016 3456 20
15 1/15/2016 2/12/2016 3456 22
27.5 1/20/2016 2/20/2016 7890 20
20 1/21/2016 2/10/2016 1234 19
On 'Table 2' I have a record for every day that an Employee worked a certain Function. Remember, Table 1 contains the Wage information for every function.
'Table 2'
Date EmployeeID FunctionID DailyWage
1/1/2016 1234 $20 =CALCULATE( SUMX( ??? ) )
1/2/2016 1234 $20 =CALCULATE( SUMX( ??? ) )
1/3/2016 1234 $22 see below
1/4/2016 1234 $22
1/1/2016 4567 $27
1/2/2016 4567 $27
1/3/2016 4567 $27
(Note that wages can change over time)
What I'm trying to do is create a Calculated Column on 'Table 2' called 'DailyWage'. I want every row on 'Table 2' to tell me how much the EmployeeID was paid for the full day (assuming an 8 hour workday).
I'm really struggling with the logic steps, so I'm not sure what the best way to do this calculation is...
To make things worse, an EmployeeID might get paid a different Wage for the same Function on a different Date. They might start out at one wage working function X, and their wage will generally go up a few months in the future... That means that if I try to concatenate the EmployeeID and the FunctionID, I won't be able to connect the tables on the concatenated value, because neither table will have unique values.
So in other words, if we CONCATENATE the EmployeeID and FunctionID into EmpFunID, we need to take the EmpFunID plus the date for the current row and say: "take the EmpFunID in the current row, plus the date for the current row, and return the value from the Wage column on the Wages table that has the same EmpFunID AND a StartDate less than the CurrentRowDate AND an EndDate greater than the CurrentRowDate".
HERE IS WHAT I HAVE SO FAR:
Step 1 = Filter 'Wages' table so that StartDate < CurrentRowDate
Step 2 = Filter 'Wages' table so that EndDate > CurrentRowDate
Step 3 = LOOKUPVALUE( 'Wages'[Wage], 'Wages'[EmpFunID], Table2[EmpFunID])
Now I just need that converted into a DAX function.
Not sure if I got it totally right, but maybe something similar? If you put this into Table2 as a calculated column, it will transform the current row context of Table2 into a filter context.
So SUMX will use the current row's data from Table2 and will do a sum over a filtered version of the Wages table: the Wages table is filtered using the current row's date, EmployeeID and FunctionID, so for each row in Table2 it will only sum the wages that belong to that row.
CALCULATE(
    SUMX(
        FILTER(
            'Wages',
            'Wages'[StartDate] <= 'Table2'[Date]
                && 'Wages'[EndDate] >= 'Table2'[Date]
                && 'Wages'[EmployeeID] = 'Table2'[EmployeeID]
                && 'Wages'[FunctionID] = 'Table2'[FunctionID]
        ),
        'Wages'[Wage] -- multiply by 8 here if the full 8-hour day is wanted
    )
)

Populating a list of dates without a defined end date - SQL server

I have a list of accounts and their cost which changes every few days.
In this list I only have the start date every time the cost updates to a new one, but no column for the end date.
Meaning, I need to populate a list of dates where the end date for a specific account and cost has to be deduced from the start date of the same account's next cost.
More or less like that:
Account start date cost
one 1/1/2016 100$
two 1/1/2016 150$
one 4/1/2016 200$
two 3/1/2016 200$
And the result I need would be:
Account date cost
one 1/1/2016 100$
one 2/1/2016 100$
one 3/1/2016 100$
one 4/1/2016 200$
two 1/1/2016 150$
two 2/1/2016 150$
two 3/1/2016 200$
For example, if the cost changed in the middle of the month, then the sample data will only hold two records (one per unique combination of account, start date and cost), while the results will hold 30 records with the cost for each and every day of the month (15 for the first cost and 15 for the second). The costs are a given and need not be calculated (they are entered manually).
Note the result contains more records because the sample data shows only a start date and an updated cost for that account, as of that date. While the results show the cost for every day of the month.
Any ideas?
Solution is a bit long.
I added an extra date for test purposes:
DECLARE @t table(account varchar(10), startdate date, cost int)
INSERT @t
values
('one','1/1/2016',100),('two','1/1/2016',150),
('one','1/4/2016',200),('two','1/3/2016',200),
('two','1/6/2016',500) -- extra row
;WITH CTE as
( SELECT
row_number() over (partition by account order by startdate) rn,
*
FROM @t
),N(N)AS
(
SELECT 1 FROM(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1))M(N)
),
tally(N) AS -- tally is limited to 1000 days
(
SELECT ROW_NUMBER()OVER(ORDER BY N.N) - 1 FROM N,N a,N b
),GROUPED as
(
SELECT
cte.account, cte.startdate, cte.cost, cte2.cost cost2, cte2.startdate enddate
FROM CTE
JOIN CTE CTE2
ON CTE.account = CTE2.account
and CTE.rn = CTE2.rn - 1
)
-- used DISTINCT to avoid overlapping dates
SELECT DISTINCT
CASE WHEN datediff(d, startdate,enddate) = N THEN cost2 ELSE cost END cost,
dateadd(d, N, startdate) startdate,
account
FROM grouped
JOIN tally
ON datediff(d, startdate,enddate) >= N
Result:
cost startdate account
100 2016-01-01 one
100 2016-01-02 one
100 2016-01-03 one
150 2016-01-01 two
150 2016-01-02 two
200 2016-01-03 two
200 2016-01-04 one
200 2016-01-04 two
200 2016-01-05 two
500 2016-01-06 two
Thank you @t-clausen.dk!
It didn't solve the problem completely, but it did point me in the right direction.
Eventually I used the LEAD function to generate an end date for every cost per account, and then I was able to populate a list of dates based on that idea.
Here's how I generate the end dates:
DECLARE @t table(account varchar(10), startdate date, cost int)
INSERT @t
values
('one','1/1/2016',100),('two','1/1/2016',150),
('one','1/4/2016',200),('two','1/3/2016',200),
('two','1/6/2016',500)
select account
,[startdate]
,DATEADD(DAY, -1, LEAD([Startdate], 1,'2100-01-01') OVER (PARTITION BY account ORDER BY [Startdate] ASC)) AS enddate
,cost
from @t
It returned the expected result:
account startdate enddate cost
one 2016-01-01 2016-01-03 100
one 2016-01-04 2099-12-31 200
two 2016-01-01 2016-01-02 150
two 2016-01-03 2016-01-05 200
two 2016-01-06 2099-12-31 500
Please note that I set the end date of current costs to be some date in the far future which means (for me) that they are currently active.
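To complete the picture, here is a minimal sketch (not from the original post) of the population step: join the LEAD output to a tally of day offsets, capping open-ended ranges at today so the far-future sentinel date is never expanded. It continues from the @t declaration above; sys.all_objects is just a convenient large row source:
;WITH ranges AS
(
    SELECT account, startdate, cost,
           DATEADD(DAY, -1, LEAD(startdate, 1, '2100-01-01')
               OVER (PARTITION BY account ORDER BY startdate)) AS enddate
    FROM @t
), tally AS
(
    -- 0..9999 day offsets
    SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS N
    FROM sys.all_objects a CROSS JOIN sys.all_objects b
)
SELECT r.account,
       DATEADD(DAY, t.N, r.startdate) AS [date],
       r.cost
FROM ranges r
JOIN tally t
  ON DATEADD(DAY, t.N, r.startdate)
     <= CASE WHEN r.enddate > GETDATE() THEN CAST(GETDATE() AS date)
             ELSE r.enddate END
ORDER BY r.account, [date];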
