I am using the following formula to calculate the Pearson correlation in my data. Note: I use a CASE WHEN to guard against a divide-by-zero error; the code below shows only the formula itself.
( COUNT(*) * SUM(X * Y) - SUM(X) * SUM(Y) )
/ ( SQRT(COUNT(*) * SUM(X * X) - SUM(X) * SUM(X)) * SQRT(COUNT(*) * SUM(Y * Y) - SUM(Y) * SUM(Y)) )
Edit added query:
DROP TABLE IF EXISTS #test;
SELECT year
,product_id
,score_range
,reporting_year
/* used to manually calculate correlation in excel */
,COUNT(*) AS n_count
,COUNT(*) * SUM(1_x * 2_score) - SUM(1_x) * SUM(2_score) AS numerator
,SUM(1_x * 1_x) AS 1_sumprod
,SUM(1_x) AS 1_sum
,SUM(2_score * 2_score) AS 2_sumprod
,SUM(2_score) AS 2_sum
INTO #test
FROM #acct_details
GROUP BY year
,product_id
,score_range
,reporting_year
;
SELECT year
,product_id
,score_range
,reporting_year
,CASE
WHEN ( ( SQRT(n_count * 1_sumprod - 1_sum * 1_sum) * SQRT(n_count * 2_sumprod - 2_sum * 2_sum) ) ) = 0
THEN NULL
ELSE numerator / ( ( SQRT(n_count * 1_sumprod - 1_sum * 1_sum) * SQRT(n_count * 2_sumprod - 2_sum * 2_sum) ) )
END AS sql_corr
,(n_count * 1_sumprod - 1_sum * 1_sum) AS 1_denom
,( SQRT(n_count * 2_sumprod - 2_sum * 2_sum) ) AS 2_denom
FROM #test
ORDER BY year
,reporting_year
,score_range
;
The output of my data looks like the table below. Note that excel_corr is the correlation manually calculated in Excel, which is my expected output.
The column sql_corr is the result from my SQL code above. The columns from count to the end are the X and Y values that get plugged into the formula above. My problem is that sql_corr does not match the correlation calculated manually by grouping in Excel.
+------+------------+-------------+----------------+----------+-------+-----------+-----------+-----------+-----------+---------+------------+
| year | product_id | score_range | reporting_year | sql_corr | count | numerator | 1_sumprod | 1_sum | 2_sumprod | 2_sum | excel_corr |
+------+------------+-------------+----------------+----------+-------+-----------+-----------+-----------+-----------+---------+------------+
| 2016 | 1 | 1-2 | 2016 | NULL | 1 | 0 | 0.000124 | -0.011155 | 195364 | 442 | #DIV/0! |
+------+------------+-------------+----------------+----------+-------+-----------+-----------+-----------+-----------+---------+------------+
| 2016 | 1 | 3-4 | 2016 | NULL | 1272 | -0.0683 | 4.9E-11 | -0.000007 | 304648060 | 622434 | -0.02911 |
+------+------------+-------------+----------------+----------+-------+-----------+-----------+-----------+-----------+---------+------------+
| 2016 | 1 | 5-6 | 2016 | -0.06416 | 3913 | -11.845 | 2.89E-09 | -0.000459 | 1.089E+09 | 2063948 | -0.06391 |
+------+------------+-------------+----------------+----------+-------+-----------+-----------+-----------+-----------+---------+------------+
| 2016 | 1 | 7-8 | 2016 | 0.00573 | 2593 | 1.63663 | 2.27E-08 | -0.000975 | 848560006 | 1482872 | 0.00573 |
+------+------------+-------------+----------------+----------+-------+-----------+-----------+-----------+-----------+---------+------------+
| 2016 | 1 | 9-10 | 2016 | -0.02106 | 1420 | -3.2855 | 4.13E-08 | -0.00131 | 555096971 | 887587 | -0.02106 |
+------+------------+-------------+----------------+----------+-------+-----------+-----------+-----------+-----------+---------+------------+
| 2016 | 1 | 11-12 | 2016 | 0.05231 | 917 | 6.64768 | 1.06E-07 | -0.000987 | 413059274 | 615312 | 0.052438 |
+------+------------+-------------+----------------+----------+-------+-----------+-----------+-----------+-----------+---------+------------+
| 2016 | 1 | 13-14 | 2016 | 0.006704 | 359 | 0.5064 | 6.18E-07 | 0.000271 | 185781413 | 258205 | 0.006705 |
+------+------------+-------------+----------------+----------+-------+-----------+-----------+-----------+-----------+---------+------------+
| 2016 | 1 | 15-16 | 2016 | 0.017846 | 55 | 0.14095 | 3.79E-06 | 0.000349 | 31849498 | 41850 | 0.017839 |
+------+------------+-------------+----------------+----------+-------+-----------+-----------+-----------+-----------+---------+------------+
| 2016 | 1 | 17-18 | 2016 | NULL | 1 | 0 | 0 | 0 | 641601 | 801 | #DIV/0! |
+------+------------+-------------+----------------+----------+-------+-----------+-----------+-----------+-----------+---------+------------+
For example, in score_range 3-4 the sql_corr value is NULL, but in Excel the value is -0.02911. If we plug the values into the formula manually, -0.02911 is the correct result.
numerator
/ ( ( SQRT(n_count * 1_sumprod - 1_sum * 1_sum) * SQRT(n_count * 2_sumprod - 2_sum * 2_sum) ) )
In SQL Server the denominator is getting pushed to 0, yet when I calculate it manually in Excel the denominator is 2.344354. Why is the denominator being pushed to 0 in SQL Server when the same data gives a different result when calculated manually?
Edit
The first part of the denominator, SQRT(n_count * 1_sumprod - 1_sum * 1_sum), is being pushed to 0. When the multiplication occurs, the whole denominator becomes 0 in SQL, which triggers the CASE statement and returns NULL. Manual calculation confirms this is incorrect. The two parts of the denominator come out as 0.000000 and 9394.0387480572, whereas the actual value of the first part by manual calculation is ~0.00025.
Edit
The value of (n_count * 1_sumprod - 1_sum * 1_sum) = 6.2279E-08 -- before taking the square root. However, SQL is pushing this part of the equation to 0.
I am using SQL Server 2016 v14.0.2037.2. I thought maybe my value was too small but it appears that values greater than 5E-18 should remain. This was confirmed in the documentation here.
Credit to TomPhillips here.
The problem you have is a mix of integers and floats. This causes confusion and implicit conversions. Convert all your values to float to get the value you are expecting; this is what Excel does.
Even though I CAST the initial values going into #test from #acct_details as DECIMAL, I also needed to explicitly CAST the values in #test that feed the formula as DECIMAL. The result kept getting rounded to 0 here: SUM(1_x * 1_x). CASTing the values explicitly resolved this by forcing the necessary precision.
DROP TABLE IF EXISTS #test;
SELECT year
,product_id
,score_range
,reporting_year
/* used to manually calculate correlation in excel */
,CAST(COUNT(*) AS DECIMAL(16,9)) AS n_count
,CAST( ( COUNT(*) * SUM(1_x * 2_score) - SUM(1_x) * SUM(2_score) ) AS DECIMAL(16,9)) AS numerator
,CAST(SUM(1_x * 1_x) AS DECIMAL(16,9)) AS 1_sumprod
,CAST(SUM(1_x) AS DECIMAL(16,9)) AS 1_sum
,CAST(SUM(2_score * 2_score) AS DECIMAL(16,9)) AS 2_sumprod
,CAST(SUM(2_score) AS DECIMAL(16,9)) AS 2_sum
INTO #test
FROM #acct_details
GROUP BY year
,product_id
,score_range
,reporting_year
;
Changing them to FLOAT also worked. Documentation here.
DROP TABLE IF EXISTS #test;
SELECT year
,product_id
,score_range
,reporting_year
/* used to manually calculate correlation in excel */
,CAST(COUNT(*) AS FLOAT) AS n_count
,CAST( ( COUNT(*) * SUM(1_x * 2_score) - SUM(1_x) * SUM(2_score) ) AS FLOAT) AS numerator
,CAST(SUM(1_x * 1_x) AS FLOAT) AS 1_sumprod
,CAST(SUM(1_x) AS FLOAT) AS 1_sum
,CAST(SUM(2_score * 2_score) AS FLOAT) AS 2_sumprod
,CAST(SUM(2_score) AS FLOAT) AS 2_sum
INTO #test
FROM #acct_details
GROUP BY year
,product_id
,score_range
,reporting_year
;
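For reference, the rounding that was collapsing SUM(1_x * 1_x) to zero can be reproduced in isolation. A minimal sketch with a made-up value (not from the real data): multiplying two high-precision DECIMALs would need more than the maximum precision of 38, so SQL Server truncates the result scale (down to as few as 6 decimal places) and a tiny product rounds to zero, whereas FLOAT keeps it.
-- Hypothetical value, purely for illustration
DECLARE @x DECIMAL(38,10) = 0.0000117;
SELECT
     @x * @x                               AS decimal_product  -- 0.000000: the result type is roughly DECIMAL(38,6), so the tiny product is rounded away
    ,CAST(@x AS FLOAT) * CAST(@x AS FLOAT) AS float_product;   -- ~1.3689E-10: FLOAT keeps the magnitude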
Related
I have two tables of different data density and I'd like to be able to join them, interpolating the values in the lower-frequency table to fill in the gaps.
I have no idea how to approach this other than that it's a lag/lead thing, but the intervals are irregular.
Here is my set up below:
CREATE TABLE #HighFreq
(MD INT NOT NULL,
LOSS float)
INSERT INTO #HighFreq
VALUES
(6710,0.5)
,(6711,0.6)
,(6712,0.6)
,(6713,0.5)
,(6714,0.5)
,(6715,0.4)
,(6716,0.9)
,(6717,0.9)
,(6718,0.9)
,(6719,1)
,(6720,0.8)
,(6721,0.9)
,(6722,0.7)
,(6723,0.7)
,(6724,0.7)
,(6725,0.7)
CREATE TABLE #LowFreq
(MD INT NOT NULL
,X FLOAT
,Y FLOAT)
INSERT INTO #LowFreq
VALUES
(6710,12,1000)
,(6711,8,1001)
,(6718,10,1007)
,(6724,8,1013)
,(6730,11,1028)
And I want my output to look like this:
Here is an approach using a recursive CTE and window functions. The recursive CTE generates the list of MDs from the values available in both tables. The idea is then to put adjacent "missing" #LowFreq records into groups, using the gaps-and-islands technique. You can then do the interpolation in the outer query, by projecting values between the first (and only) non-null value in the group and the next one.
with cte as (
select min(coalesce(h.md, l.md)) md, max(coalesce(h.md, l.md)) md_max
from #HighFreq h
full join #LowFreq l on l.md = h.md
union all
select md + 1, md_max from cte where md < md_max
)
select
md,
loss,
coalesce(x, min(x) over(partition by grp)
+ (min(lead_x) over(partition by grp) - min(x) over(partition by grp))
* (row_number() over(partition by grp order by md) - 1)
/ count(*) over(partition by grp)
) x,
coalesce(y, min(y) over(partition by grp)
+ (min(lead_y) over(partition by grp) - min(y) over(partition by grp))
* (row_number() over(partition by grp order by md) - 1)
/ count(*) over(partition by grp)
) y
from (
select
c.md,
h.loss,
l.x,
l.y,
sum(case when l.md is null then 0 else 1 end) over(order by c.md) grp,
lead(l.x) over(order by c.md) lead_x,
lead(l.y) over(order by c.md) lead_y
from cte c
left join #HighFreq h on h.md = c.md
left join #LowFreq l on l.md = c.md
) t
Demo on DB Fiddle:
md | loss | x | y
---: | ---: | ---------------: | ---------------:
6710 | 0.5 | 12 | 1000
6711 | 0.6 | 8 | 1001
6712 | 0.6 | 8.28571428571429 | 1001.85714285714
6713 | 0.5 | 8.57142857142857 | 1002.71428571429
6714 | 0.5 | 8.85714285714286 | 1003.57142857143
6715 | 0.4 | 9.14285714285714 | 1004.42857142857
6716 | 0.9 | 9.42857142857143 | 1005.28571428571
6717 | 0.9 | 9.71428571428571 | 1006.14285714286
6718 | 0.9 | 10 | 1007
6719 | 1 | 9.66666666666667 | 1008
6720 | 0.8 | 9.33333333333333 | 1009
6721 | 0.9 | 9 | 1010
6722 | 0.7 | 8.66666666666667 | 1011
6723 | 0.7 | 8.33333333333333 | 1012
6724 | 0.7 | 8 | 1013
6725 | 0.7 | 8.5 | 1015.5
6726 | null | 9 | 1018
6727 | null | 9.5 | 1020.5
6728 | null | 10 | 1023
6729 | null | 10.5 | 1025.5
6730 | null | 11 | 1028
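For a single gap, the COALESCE expression above reduces to plain linear interpolation. As a worked check against the output: md 6712 is the second row of the seven-row group anchored at md 6711 (x = 8), and the next #LowFreq point is x = 10 at md 6718, so:
-- Worked check for md 6712, using the values from #LowFreq above
SELECT 8 + (10 - 8) * (6712 - 6711) / 7.0 AS x_6712;  -- 8.285714..., matching the output row above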
I'm using SQL Server. I have a nice summary table like the one you see below. I want to populate (create) the Pct field for each proficiency level.
| MeasurementScale | Grade | ProficiencyLevel | PL_Count | Pct |
|------------------|-------|------------------|----------|-----|
| Mathematics | 6 | Did Not Meet | 40 | |
| Mathematics | 6 | Approaches | 86 | |
| Mathematics | 6 | Meets | 83 | |
| Mathematics | 6 | Masters | 42 | |
| Mathematics | 6 | Total | 251 | |
I basically want something like the following query; I just don't know how to write it.
SELECT SchoolName
,MeasurementScale
,Grade
,ProficiencyLevel
,PL_Count
,(PL_Count / (SELECT PL_Count FROM #PL_Summary1920 WHERE ProficiencyLevel = 'Total')) as Pct
FROM #PL_Summary1920
GROUP BY SchoolName
,MeasurementScale
,Grade
,ProficiencyLevel
,PL_Count
SELECT V1.*,
CASE WHEN V2.PL_COUNT = 0 THEN 0
ELSE V1.PL_Count * 1.0/ V2.PL_COUNT
END AS PCT
FROM (
SELECT SchoolName,
MeasurementScale,
Grade,
ProficiencyLevel,
SUM(PL_Count) AS PL_Count
FROM #PL_Summary1920 T1
GROUP BY SchoolName
,MeasurementScale
,Grade
,ProficiencyLevel
) V1
LEFT JOIN (
SELECT SchoolName,
MeasurementScale,
Grade,
SUM(TT.PL_Count) AS PL_COUNT
FROM #PL_Summary1920 TT
WHERE TT.ProficiencyLevel = 'Total'
GROUP BY SchoolName,
MeasurementScale,
Grade
) V2 ON V2.SchoolName = V1.SchoolName
AND V2.MeasurementScale = V1.MeasurementScale
AND V2.Grade = V1.Grade
Try this:
SELECT MeasurementScale
,Grade
,ProficiencyLevel
,PL_Count
,PL_Count * 1.0 / (SELECT PL_Count FROM #PL_Summary1920 WHERE ProficiencyLevel = 'Total') as Pct
FROM #PL_Summary1920
GROUP BY SchoolName
,MeasurementScale
,Grade
,ProficiencyLevel
,PL_Count
Multiplying by 1.0 forces an implicit conversion to a decimal type, which avoids integer division so your percentages come out correctly. It's terser than an explicit CAST or CONVERT.
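As an aside, if #PL_Summary1920 holds more than one school/subject/grade combination, the scalar subquery above would return more than one 'Total' row and fail. A windowed total avoids that; a sketch, assuming the same column names as the question:
SELECT SchoolName
    ,MeasurementScale
    ,Grade
    ,ProficiencyLevel
    ,PL_Count
    ,PL_Count * 1.0
        / SUM(CASE WHEN ProficiencyLevel = 'Total' THEN PL_Count END)
              OVER (PARTITION BY SchoolName, MeasurementScale, Grade) AS Pct  -- each row divided by its group's 'Total' count
FROM #PL_Summary1920;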
I am looking for some advice or pointers on how to construct this. I have spent the last year self-learning SQL. At work I only have access to the query interface in Report Builder, which for me means no procedures, no creating tables and no IDE :(. So those are the limitations!
I am trying to reconstruct account balances. I have no intervening balances; I only have the current balance and a table full of the transaction history.
My current approach is to sum the transactions by posting week (which I have done) in my CTE named
[SUMTRANSREF]
+--------------+------------+-----------+
| TNCY-SYS-REF | POSTING-WK | SUM-TRANS |
+--------------+------------+-----------+
| 1 | 47 | 37.95 |
| 1 | 46 | 37.95 |
| 1 | 45 | 37.95 |
| 2 | 47 | 50.00 |
| 2 | 46 | 25.00 |
| 2 | 45 | 25.00 |
+--------------+------------+-----------+
I then get the current balances in another CTE called
[CBAL]
+--------------+-------------+-----------+
| TNCY-SYS-REF | CUR-BALANCE | CURR-WEEK |
+--------------+-------------+-----------+
| 1 | 27.52 | 47 |
| 2            | 52.00       | 47        |
+--------------+-------------+-----------+
Now, I am assuming I could create intervening CTEs to sum and then splice those all together, but is there a smarter (more automated) way?
Ideally my result should be
+--------------+-------------+----------+----------+
| TNCY-SYS-REF | CUR-BALANCE | BAL-WK46 | BAL-Wk45 |
+--------------+-------------+----------+----------+
| 1 | 27.52 | -10.43 | -48.38 |
| 2 | 52.00 | 2.00 | -48.00 |
+--------------+-------------+----------+----------+
I am just uncertain because each column requires the sum of the intervening transactions:
So BAL-WK46 is (CUR-BALANCE) - SUM(transactions from week 47)
So BAL-WK45 is (CUR-BALANCE) - SUM(transactions from weeks 46+47)
So BAL-WK44 is (CUR-BALANCE) - SUM(transactions from weeks 45+46+47)
and so on.
Normally I have an idea where to start but I am flummoxed by this one.
Any help you can give would be appreciated. Thank you
Here is some T-SQL that gets the result you require. Should be easy enough to play with to get what you want.
It makes use of a recursive CTE and a PIVOT.
IF OBJECT_ID('Tempdb..#SUMTRANSREF') IS NOT NULL
DROP TABLE #SUMTRANSREF
IF OBJECT_ID('Tempdb..#CBAL') IS NOT NULL
DROP TABLE #CBAL
IF OBJECT_ID('Tempdb..#TEMP') IS NOT NULL
DROP TABLE #TEMP
CREATE TABLE #SUMTRANSREF
(
[TNCY-SYS-REF] int,
[POSTING-WK] int,
[SUM-TRANS] float
)
CREATE TABLE #CBAL
(
[TNCY-SYS-REF] int ,
[CUR-BALANCE] float , [CURR-WEEK] int
)
INSERT INTO #SUMTRANSREF
VALUES (1 ,47 , 37.95),
(1 ,46 , 37.95),
(1 ,45 , 37.95),
(2 ,47 , 50.00),
(2 ,46 , 25.00),
(2 ,45 , 25.00 )
INSERT INTO #CBAL
VALUES (1,27.52,47),(2,52.00,47);
WITH CBAL AS
(SELECT * FROM #CBAL),
SUMTRANSREF AS(SELECT * FROM #SUMTRANSREF),
RecursiveTotals([TNCY-SYS-REF],[CURR-WEEK],[CUR-BALANCE],RunningBalance)
AS
(
select C.[TNCY-SYS-REF], C.[CURR-WEEK],C.[CUR-BALANCE],C.[CUR-BALANCE] + S.RunningTotal RunningBalance from CBAL C
JOIN (select *,-SUM([SUM-TRANS]) OVER (PARTITION BY [TNCY-SYS-REF] ORDER BY [POSTING-WK] DESC) RunningTotal
from SUMTRANSREF) S
ON C.[CURR-WEEK]=S.[POSTING-WK] AND C.[TNCY-SYS-REF]=S.[TNCY-SYS-REF]
UNION ALL
select RT.[TNCY-SYS-REF], RT.[CURR-WEEK] -1 [CURR_WEEK],RT.[CUR-BALANCE],RT.[CUR-BALANCE] + S.RunningTotal RunningBalance FROM RecursiveTotals RT
JOIN (select *,-SUM([SUM-TRANS]) OVER (PARTITION BY [TNCY-SYS-REF] ORDER BY [POSTING-WK] DESC) RunningTotal
from #SUMTRANSREF) S ON RT.[TNCY-SYS-REF] = S.[TNCY-SYS-REF] AND RT.[CURR-WEEK]-1 = S.[POSTING-WK]
)
select [TNCY-SYS-REF],[CUR-BALANCE],[46] as 'BAL-WK46',[45] as 'BAL-WK45',[44] as 'BAL-WK44'
FROM (
select [TNCY-SYS-REF],[CUR-BALANCE],RunningBalance,BalanceWeek from (SELECT *,R.[CURR-WEEK]-1 'BalanceWeek' FROm RecursiveTotals R
) RT) AS SOURCETABLE
PIVOT
(
AVG(RunningBalance)
FOR BalanceWeek in ([46],[45],[44])
) as PVT
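If [CURR-WEEK] in #CBAL always equals the latest [POSTING-WK] (as it does in the sample data), the recursion is not strictly needed: a reverse running sum of the transactions produces the same balances directly, and the PIVOT stays the same. A sketch under that assumption, reusing the temp tables above:
-- Balance as at the end of week N = current balance minus all transactions posted after week N
SELECT [TNCY-SYS-REF], [CUR-BALANCE], [46] AS 'BAL-WK46', [45] AS 'BAL-WK45', [44] AS 'BAL-WK44'
FROM (
    SELECT C.[TNCY-SYS-REF],
           C.[CUR-BALANCE],
           S.[POSTING-WK] - 1 AS BalanceWeek,   -- the week this running balance belongs to
           C.[CUR-BALANCE]
             - SUM(S.[SUM-TRANS]) OVER (PARTITION BY S.[TNCY-SYS-REF]
                                        ORDER BY S.[POSTING-WK] DESC) AS RunningBalance
    FROM #CBAL C
    JOIN #SUMTRANSREF S ON S.[TNCY-SYS-REF] = C.[TNCY-SYS-REF]
) AS SRC
PIVOT (AVG(RunningBalance) FOR BalanceWeek IN ([46],[45],[44])) AS PVT;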
I have a table with user incomes and I wish to calculate their income tax based on that income. The issue is that the tax rate is different for each bracket, e.g.:
MinLimit| MaxLimit| TaxRate
0 | 14000 | 10.50
14001 | 48000 | 17.50
48001 | 70000 | 30.00
70001 | 1000000 | 33.00
So if the income of 1 person is 49,000 then they would be taxed as follows:
14000 * 0.1050 = 1470
34000 * 0.1750 = 5950 (34,000 is income between 14k -48k)
1000 * 0.30 = 300 (1000 is remaining income)
total = 1470 + 5950 + 300 = 7720
I am running on SQL Server 2017 Express. I have tried running a chained CASE-WHEN statement i.e.
CASE WHEN
THEN
WHEN
THEN
and so on...
but I can't figure out how to add the logic of subtracting the remaining amount. Please find my code below.
SELECT 'emp_name' AS 'Director',
SUM(ABS([Transaction Amount])) AS 'INCOME',
CASE WHEN (SUM(ABS([Transaction Amount])) < 14000)
THEN ((SUM(ABS([Transaction Amount])) - 14000) * 0.1050)
WHEN (SUM(ABS([Transaction Amount])) > 14000 and (SUM(ABS([Transaction Amount])) < 48001))
THEN (((SUM(ABS([Transaction Amount])) - 14000) * 0.1050) - 48000) * 0.1750 end AS 'Income Tax'
FROM Transactions
EDIT 1:
Input Data:
Transaction Type| PAYEE | Transaction Amount
DEBIT | DEBIT | -184.00
CREDIT | CREDIT | 4000.00
...
Output Data:
Director | INCOME | Income Tax
emp_name | 45100.00| NULL
Please let me know where I am going wrong or if my thinking is incorrect.
A correlated subquery may be the simplest to read and understand:
declare #t table (MinLimitExclusive int, MaxLimitInclusive int, TaxRate decimal(5,2))
insert into #t(MinLimitExclusive,MaxLimitInclusive,TaxRate) values
(0 ,14000 , 10.50),
(14000,48000 , 17.50),
(48000,70000 , 30.00),
(70000,1000000, 33.00)
declare #transactions table (Income decimal(10,2))
insert into #transactions (Income) values (49000)
select
(Income - MinLimitExclusive) * TaxRate / 100 +
(select SUM((rates2.MaxLimitInclusive - rates2.MinLimitExclusive) * rates2.TaxRate / 100)
from #t rates2 where rates2.MaxLimitInclusive <= rates.MinLimitExclusive)
from
#transactions tr
inner join
#t rates
on
tr.Income > rates.MinLimitExclusive and tr.Income <= rates.MaxLimitInclusive
It becomes remarkably simple when you realise that the only maths involving the actual income relates to the bracket it falls into; all of the lower-rate brackets were, by implication, used in their entirety, so their tax can be computed purely from the rates table. (For the 49,000 example, the join picks the 48,000-70,000 bracket, contributing (49,000 - 48,000) * 30% = 300, and the subquery adds 1,470 + 5,950 from the two lower brackets, giving the expected total of 7,720.)
I've changed your rates data slightly to make the computations straightforward and avoid lots of +/-1 adjustments.
I suggest that you start with a MinLimit of 1 instead of 0. The rest of the calculation is straightforward:
declare #taxslabs table (minlimit int, maxlimit int, taxrate decimal(18, 2));
insert into @taxslabs values
(1, 14000, 10.50),
(14001, 48000, 17.50),
(48001, 70000, 30.00),
(70001, 1000000, 33.00);
select persons.*, taxslabs.*, taxableamount, taxableamount * taxrate / 100 as taxamount
from (values
(1, 49000),
(2, 70000),
(3, 70001)
) as persons(id, income)
cross join @taxslabs as taxslabs
cross apply (select case when income <= maxlimit then income else maxlimit end - minlimit + 1) as ca(taxableamount)
where minlimit <= income
You can place this query inside a subquery and use GROUP BY ... SUM() or SUM() OVER (PARTITION BY) to calculate the sum of taxes.
Sample output:
| id | income | minlimit | maxlimit | taxrate | taxableamount | taxamount |
|----|--------|----------|----------|---------|---------------|------------------|
| 1 | 49000 | 1 | 14000 | 10.50 | 14000 | 1470.000000 |
| 1 | 49000 | 14001 | 48000 | 17.50 | 34000 | 5950.000000 |
| 1 | 49000 | 48001 | 70000 | 30.00 | 1000 | 300.000000 |
| 2 | 70000 | 1 | 14000 | 10.50 | 14000 | 1470.000000 |
| 2 | 70000 | 14001 | 48000 | 17.50 | 34000 | 5950.000000 |
| 2 | 70000 | 48001 | 70000 | 30.00 | 22000 | 6600.000000 |
| 3 | 70001 | 1 | 14000 | 10.50 | 14000 | 1470.000000 |
| 3 | 70001 | 14001 | 48000 | 17.50 | 34000 | 5950.000000 |
| 3 | 70001 | 48001 | 70000 | 30.00 | 22000 | 6600.000000 |
| 3 | 70001 | 70001 | 1000000 | 33.00 | 1 | 0.330000 |
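As suggested above, wrapping that query and grouping per person collapses the per-bracket rows into one total tax figure each. A sketch reusing the same @taxslabs table and sample incomes (for id 1 this returns 7720, matching the worked example in the question):
select t.id, t.income, sum(t.taxableamount * t.taxrate / 100) as total_tax
from (
    select persons.id, persons.income, taxslabs.taxrate, ca.taxableamount
    from (values (1, 49000), (2, 70000), (3, 70001)) as persons(id, income)
    cross join @taxslabs as taxslabs
    cross apply (select case when income <= maxlimit then income else maxlimit end - minlimit + 1) as ca(taxableamount)
    where minlimit <= income
) as t
group by t.id, t.income;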
I think this query, using GROUP BY on the transaction table and a join to the tax rate table, can produce the expected result:
CREATE TABLE #Transaction
(
tID int PRIMARY KEY,
tIdUser varchar(50),
Amount decimal(9,3)
);
CREATE TABLE #RefTaxe
(
pID int PRIMARY KEY,
minLimit int,
maxLImit int,
rate decimal(9,3)
);
INSERT INTO #Transaction
SELECT 1, 'User1', 1259.3
UNION
SELECT 2, 'User1', 10259.3
UNION
SELECT 3, 'User3', 30581.3
UNION
SELECT 4, 'User2', 75000.36
UNION
SELECT 5, 'User2', 15000.36
UNION
SELECT 6, 'User4', 45000.36
UNION
SELECT 7, 'User4', 5000.36
INSERT INTO #RefTaxe
select 1,0,14000,10.50
UNION
SELECT 2,14001,48000,17.50
UNION
SELECT 3,48001,70000,30.00
UNION
SELECT 4,70001,1000000,33.00
-- SELECT * FROM #Transaction
-- SELECT * FROM #RefTaxe
-- SELECT tIdUser,SUM(AMOUNT) as SumAmount, CAST(FLOOR(SUM(AMOUNT))as int) as SumAsInt FROM #Transaction GROUP BY tIdUser
/***/
-- Perform select
/***/
SELECT tIdUser, SumAmount as 'DetaxedAmount' ,SumAmount * (rate/100) as TaxOfAmount, SumAmount+ SumAmount * (rate/100) as TaxedAmount
FROM #RefTaxe RT
JOIN (
SELECT tIdUser,SUM(AMOUNT) as SumAmount, CAST(FLOOR(SUM(AMOUNT))as int) as SumAsInt
FROM #Transaction GROUP BY tIdUser
) AS GroupedTR ON RT.minLimit <= SumAsInt AND RT.maxLImit >= SumAsInt
/***/
DROP TABLE #Transaction
DROP TABLE #RefTaxe
Result output :
tIdUser DetaxedAmount TaxOfAmount TaxedAmount
User1 11518.600 1209.453000 12728.053
User2 90000.720 29700.237600 119700.958
User3 30581.300 5351.727500 35933.028
User4 50000.720 15000.216000 65000.936
This query calculates exact results against the tax rules defined by the Pakistani government (credit: Noman Ali).
Select
Salary,
Salary*12 as YearlySalary,
case
when salary * 12 Between 600001 and 1200000 then ((salary * 12 - 600000) / 100 * 2.5) / 12
when salary * 12 Between 1200001 and 2400000 then (15000 + (salary * 12 - 1200000) / 100 * 12.50) / 12
when salary * 12 Between 2400001 and 3600000 then (165000 + (salary * 12 - 2400000) / 100 * 20.0) / 12
when salary * 12 Between 3600001 and 6000000 then (405000 + (salary * 12 - 3600000) / 100 * 25.0 ) / 12
when salary * 12 Between 6000001 and 12000000 then (1005000 + (salary* 12 - 6000000) / 100 * 32.5 ) / 12
when salary * 12 > 12000001 then (2955000 + (salary * 12 - 12000000) / 100 * 35.0 ) / 12
else 0 end as IncomeTax
from Employees
I have a dataset where I need to calculate a value that, for each row, depends on the value in the previous row of the same column, or on 1 initially when there is no previous row. I need to do this within separate partitions.
The formula looks like this: factor = (previous factor or 1 if it does not exist) * (1 + div / nav)
This needs to be partitioned by Inst_id.
I would prefer to avoid a cursor. Maybe cte with recursion - but I cannot get my head around it - or another way?
I know this code does not work as I cannot reference the same column, but it is another way of showing what I'm trying to do:
SELECT Dato, Inst_id, nav, div
, (1 + div / nav ) * ISNULL(LAG(factor, 1) OVER (PARTITION BY Inst_id ORDER BY Dato), 1) AS factor
FROM #tmp
So with my test data I need to get these results in the factor column below.
Please ignore rounding issues, as I calculated this in Excel:
date Inst_id nav div factor
11-04-2012 16 57.5700 5.7500 1.09987841
19-04-2013 16 102.8600 10.2500 1.20948130
29-04-2014 16 65.9300 16.7500 1.51675890
08-04-2013 29 111.2736 17.2500 1.15502333
10-04-2014 29 101.9650 16.3000 1.33966395
15-04-2015 29 109.5400 7.5000 1.43138825
27-04-2016 29 94.2500 0.4000 1.43746311
15-04-2015 34 159.1300 11.4000 1.07163954
27-04-2016 34 124.6100 17.6000 1.22299863
26-04-2017 34 139.7900 9.2000 1.30348784
01-04-2016 38 99.4600 0.1000 1.00100543
26-04-2017 38 102.9200 2.1000 1.02143014
Test data:
DECLARE #tmp TABLE(Dato DATE, Inst_id INT, nav DECIMAL(26,19), div DECIMAL(26,19), factor DECIMAL(26,19))
INSERT INTO #tmp (Dato, Inst_id, nav, div) VALUES
('2012-04-11', 16, 57.57, 5.75),
('2013-04-19', 16, 102.86, 10.25),
('2014-04-29', 16, 65.93, 16.75),
('2013-04-08', 29, 111.273577, 17.25),
('2014-04-10', 29, 101.964994, 16.3),
('2015-04-15', 29, 109.54, 7.5),
('2016-04-27', 29, 94.25, 0.4),
('2015-04-15', 34, 159.13, 11.4),
('2016-04-27', 34, 124.61, 17.6),
('2017-04-26', 34, 139.79, 9.2)
I'm on a Microsoft SQL Server Enterprise 2016 (and use SSMS 2016).
You can use (if DIV and NAV are always >0):
SELECT A.* , EXP(SUM( LOG(1+DIV/NAV) ) OVER (PARTITION BY INST_ID ORDER BY DATO) )AS FACT_NEW
FROM #tmp A
Actually, what you need is an equivalent of an aggregate MULTIPLY() OVER (...) function, which SQL Server does not have.
Using the logarithm identity LOG(M*N) = LOG(M) + LOG(N), you can emulate it; for example:
DECLARE #X1 NUMERIC(10,4)=5
DECLARE #X2 NUMERIC(10,4)=7
SELECT #x1*#x2 AS S1, EXP(LOG(#X1)+LOG(#X2)) AS S2
Output of the EXP/SUM/LOG query against the test data:
+------------+---------+-------------------------+------------------------+--------+------------------+
| Dato | Inst_id | nav | div | factor | FACT_NEW |
+------------+---------+-------------------------+------------------------+--------+------------------+
| 2012-04-11 | 16 | 57.5700000000000000000 | 5.7500000000000000000 | NULL | 1.099878408893 |
| 2013-04-19 | 16 | 102.8600000000000000000 | 10.2500000000000000000 | NULL | 1.20948130303111 |
| 2014-04-29 | 16 | 65.9300000000000000000 | 16.7500000000000000000 | NULL | 1.51675889783963 |
| 2013-04-08 | 29 | 111.2735770000000000000 | 17.2500000000000000000 | NULL | 1.155023325977 |
| 2014-04-10 | 29 | 101.9649940000000000000 | 16.3000000000000000000 | NULL | 1.33966395090911 |
| 2015-04-15 | 29 | 109.5400000000000000000 | 7.5000000000000000000 | NULL | 1.43138824917236 |
| 2016-04-27 | 29 | 94.2500000000000000000 | 0.4000000000000000000 | NULL | 1.43746310646293 |
| 2015-04-15 | 34 | 159.1300000000000000000 | 11.4000000000000000000 | NULL | 1.071639539998 |
| 2016-04-27 | 34 | 124.6100000000000000000 | 17.6000000000000000000 | NULL | 1.22299862758278 |
| 2017-04-26 | 34 | 139.7900000000000000000 | 9.2000000000000000000 | NULL | 1.30348784264639 |
+------------+---------+-------------------------+------------------------+--------+------------------+
Using a recursive CTE:
WITH DataSource AS
(
SELECT *
,ROW_NUMBER() OVER (PARTITION BY Inst_id ORDER BY Dato) AS [rowId]
FROM #tmp
),
RecursiveDataSource AS
(
SELECT *
,CAST((1 + div / nav ) * 1 AS DECIMAL(26,19)) as [factor_calculated]
FROM DataSource
WHERE [rowId] = 1
UNION ALL
SELECT A.*
,CAST((1 + A.div / A.nav ) * R.factor_calculated AS DECIMAL(26,19)) as [factor_calculated]
FROM RecursiveDataSource R
INNER JOIN DataSource A
ON r.[Inst_id] = A.[Inst_id]
AND R.[rowId] + 1 = A.[rowId]
)
SELECT *
FROM RecursiveDataSource
ORDER BY Inst_id, Dato;
I guess you are getting different values in Excel after row 3, because you are not partitioning by Inst_id there.