SQL Server - apply CASE WHEN to multiple columns

I have a number of columns that contain values between 0 and 4, like so:
ID Q1 Q2 Q3 Q4 Q5 Q6 ... Q30
0001 4 0 3 1 0 4 2
0002 0 2 1 2 0 3 1
0003 4 2 3 0 3 0 4
0004 1 4 2 4 1 1 3
I need to transform these values so that 4=0, 3=25, 2=50, 1=75 and 0=100.
So, in the transformed version my first row would show the following:
ID Q1 Q2 Q3 Q4 Q5 Q6 ... Q30
0001 0 100 25 75 100 0 50
If it was just for one column I would use a case statement:
case
when Q1=4 then 0
when Q1=3 then 25
when Q1=2 then 50
when Q1=1 then 75
when Q1=0 then 100
end Q1
Can I apply this to a range of columns instead of doing a separate case statement for each column?
Or is there a more efficient way of achieving the same outcome?

Writing the condition like this is more efficient,
because it is just an arithmetic operation,
which is better than a CASE with 5 checks:
SELECT 100 - (q1 * 25) AS Q1
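Spelled out across several columns, a minimal sketch of the same idea (the table name YourTable is an assumption; only Q1-Q4 are shown, and the pattern repeats through Q30):
SELECT ID,
100 - Q1 * 25 AS Q1,
100 - Q2 * 25 AS Q2,
100 - Q3 * 25 AS Q3,
100 - Q4 * 25 AS Q4
FROM YourTable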

update yourtable
set Q1 = 100 - Q1 * 25,
Q2 = 100 - Q2 * 25,
Q3 = 100 - Q3 * 25,
Q4 = 100 - Q4 * 25
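If typing thirty assignments is the objection, here is a hedged sketch (not from the original answers) that generates the repetitive SET list with dynamic SQL; the table name yourtable follows the snippet above, and it is worth PRINTing @sql to inspect the generated statement before executing it:
DECLARE @sql nvarchar(max) = N'';
-- build 'Q1 = 100 - Q1 * 25, Q2 = ...' for Q1 through Q30
SELECT @sql += N'Q' + CAST(n AS nvarchar(10)) + N' = 100 - Q' + CAST(n AS nvarchar(10)) + N' * 25, '
FROM (SELECT TOP (30) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
FROM sys.all_objects) AS nums;
SET @sql = N'UPDATE yourtable SET ' + LEFT(@sql, LEN(@sql) - 1) + N';';
EXEC sys.sp_executesql @sql;
Note that the UPDATE variants overwrite the stored values, so running them twice applies the transformation twice; the SELECT form leaves the raw data intact.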

Think about functions, something like this:
CREATE FUNCTION dbo.COL_EVAL (@Q1 int) RETURNS int AS
BEGIN
RETURN case
when @Q1=4 then 0
when @Q1=3 then 25
when @Q1=2 then 50
when @Q1=1 then 75
when @Q1=0 then 100
end
END
And then use it as:
SELECT dbo.COL_EVAL(Q1) as Q1, dbo.COL_EVAL(Q2)....
p.s. have no server to check syntax correctness...
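A usage sketch under the same assumptions (the table name dbo.YourTable is hypothetical). One caveat: scalar UDFs are evaluated once per row per column, so on a wide table the plain arithmetic shown earlier will generally be cheaper:
SELECT ID,
dbo.COL_EVAL(Q1) AS Q1,
dbo.COL_EVAL(Q2) AS Q2,
dbo.COL_EVAL(Q3) AS Q3
FROM dbo.YourTable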

Calculating a column in SQL using the column's own output as input

I have a problem that I find very hard to solve:
I need to calculate a column R_t in SQL where, for each row, the sum of the "previous" calculated values SUM(R_{t-1}) is required as input. The calculation is done grouped over a ProjectID column. I have no clue how to proceed.
The formula for the calculation I am trying to achieve is R_t = (ContractValue_t - SUM(R_{0..t-1})) / RemainingHours_t * HoursRegistered_t, where t denotes the time period and SUM(R_{0..t-1}) is the sum of R from t = 0 through t-1.
Time is always consecutive and always begins at t = 0, but the number of time periods may differ across ProjectID, i.e. one project having t = {0,1,2} and another t = {0,1,2,3,4,5}. The time period will never "jump" from 5 to 7.
The expected output (using the data below) for ProjectID 101 is:
R_0 = (500,000 - 0) / 500 * 65 = 65,000
R_1 = (500,000 - (65,000)) / 435 * 100 = 100,000
R_2 = (500,000 - (65,000 + 100,000)) / 335 * 85 = 85,000
R_3 = (500,000 - (65,000 + 100,000 + 85,000)) / 250 * 69 = 69,000
etc...
This calculation is done for each ProjectID.
My question is: how do I formulate this in a SQL query? My first thought was to create a recursive CTE, but I am not sure that is the right way to proceed; recursive CTEs are (to my understanding) made for handling more hierarchy-like structures, which this isn't really.
My other thought was to calculate SUM(R_{t-1}) using window functions, i.e. SUM() OVER (PARTITION BY ... ORDER BY ...) with a LAG, but the recursive dependency really gives me trouble, and I run my head against the wall whenever I try.
Below a query for creating the input data
CREATE TABLE [dbo].[InputForRecursiveCalculation]
(
[Time] int NULL,
ProjectID [int],
ContractValue float,
ContractHours float,
HoursRegistered float,
RemainingHours float
)
GO
INSERT INTO [dbo].[InputForRecursiveCalculation]
(
[Time]
,[ProjectID]
,[ContractValue]
,[ContractHours]
,[HoursRegistered]
,[RemainingHours]
)
VALUES
(0,101,500000,500,65,500),
(1,101,500000,500,100,435),
(2,101,500000,500,85,335),
(3,101,500000,500,69,250),
(4,101,450000,650,100,331),
(5,101,450000,650,80,231),
(6,101,450000,650,90,151),
(7,101,450000,650,45,61),
(8,101,450000,650,16,16),
(0,110,120000,90,10,90),
(1,110,120000,90,10,80),
(2,110,130000,90,10,70),
(3,110,130000,90,10,60),
(4,110,130000,90,10,50),
(5,110,130000,90,10,40),
(6,110,130000,90,10,30),
(7,110,130000,90,10,20),
(8,110,130000,90,10,10)
GO
For those of you who dare download something from a complete stranger, I have created an Excel file demonstrating the calculation (please download the file, as you will not be able to see the actual formulas in the HTML preview shown when first clicking the link):
https://www.dropbox.com/s/3rxz72lbvooyc4y/Calculation%20example.xlsx?dl=0
Best regards,
Victor
I think this will be useful for you. There is an additional column SumR that holds the running total of the previous rows' R values (per ProjectID):
;with recu as
(
-- anchor member: the Time = 0 rows, where the previous sum is 0
select
Time,
ProjectId,
ContractValue,
ContractHours,
HoursRegistered,
RemainingHours,
cast((ContractValue - 0)*HoursRegistered/RemainingHours as numeric(15,0)) as R,
cast((ContractValue - 0)*HoursRegistered/RemainingHours as numeric(15,0)) as SumR
from
InputForRecursiveCalculation
where
Time=0
union all
-- recursive member: join each project's next Time row and carry SumR forward
select
input.Time,
input.ProjectId,
input.ContractValue,
input.ContractHours,
input.HoursRegistered,
input.RemainingHours,
cast((input.ContractValue - prev.SumR)*input.HoursRegistered/input.RemainingHours as numeric(15,0)),
cast((input.ContractValue - prev.SumR)*input.HoursRegistered/input.RemainingHours + prev.SumR as numeric(15,0))
from
recu prev
inner join
InputForRecursiveCalculation input
on input.ProjectId = prev.ProjectId
and input.Time = prev.Time + 1
)
select
*
from
recu
order by
ProjectID,
Time
RESULTS:
Time ProjectId ContractValue ContractHours HoursRegistered RemainingHours R SumR
----------- ----------- ---------------------- ---------------------- ---------------------- ---------------------- --------------------------------------- ---------------------------------------
0 101 500000 500 65 500 65000 65000
1 101 500000 500 100 435 100000 165000
2 101 500000 500 85 335 85000 250000
3 101 500000 500 69 250 69000 319000
4 101 450000 650 100 331 39577 358577
5 101 450000 650 80 231 31662 390239
6 101 450000 650 90 151 35619 425858
7 101 450000 650 45 61 17810 443668
8 101 450000 650 16 16 6332 450000
0 110 120000 90 10 90 13333 13333
1 110 120000 90 10 80 13333 26666
2 110 130000 90 10 70 14762 41428
3 110 130000 90 10 60 14762 56190
4 110 130000 90 10 50 14762 70952
5 110 130000 90 10 40 14762 85714
6 110 130000 90 10 30 14762 100476
7 110 130000 90 10 20 14762 115238
8 110 130000 90 10 10 14762 130000
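One footnote (not from the original answer): SQL Server stops a recursive CTE after 100 recursion levels by default, so a project with more than about 100 time periods would fail with an error unless the cap is lifted. Appending a query hint to the final SELECT of the statement above does that:
select
*
from
recu
order by
ProjectID,
Time
option (maxrecursion 0) -- 0 removes the default 100-level recursion limit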

Working with an upper-triangular array in SAS (challenge +2 Points)

I'm looking to improve my code's efficiency by turning it into arrays and loops. The data I'm working with starts off like this:
ID Mapping Asset Fixed Performing Payment 2017 Payment2018 Payment2019 Payment2020
1 Loan1 1 1 1 90 30 30 30
2 Loan1 1 1 0 80 20 40 20
3 Loan1 1 0 1 60 40 10 10
4 Loan1 1 0 0 120 60 30 30
5 Loan2 ... ... ... ... ... ... ...
So for each ID (essentially the data sorted by Mapping, Asset, Fixed and then Performing), I'm looking to build a profile for the payment scheme.
The Payment Vector for the first ID looks like this:
PaymentVector1 PaymentVector2 PaymentVector3 PaymentVector4
1 0.33 0.33 0.33
It is represented by the formula
PaymentVector(I)=Payment(I)/Payment(1)
The above is fine to create in an array, example code can be given if you wish.
Next, work under the assumption that every payment made is replaced, i.e. when 30 is paid in 2018, it must be replaced, and so on.
I'm looking to make a profile that shows the outflows (and, for illustration only, the inflows in brackets) for the movement of the payments, like this for ID=1:
Payment2017 Payment2018 Payment2019 Payment2020
17 (+90) -30 -30 -30
18 N/A (+30) -10 -10
19 N/A N/A (+40) -13.3
20 N/A N/A N/A (+53.3)
So if you're looking forwards, the rows can be thought of as the current year and the columns as the years coming up.
Hence, in year 2019, what was to be paid in 2017 and 2018 is N/A, because those payments are in the past and cannot be paid now.
As for year 2018, looking at what has to be paid in 2019: you have to pay one-third of the money you have now, so -10.
I've been turning this dataset into the arrays row by row, but there surely has to be a quicker way:
The Code I've used so far looks like:
Data Want;
Set Have;
Array Vintage(2017:2020) Vintage2017-Vintage2020;
Array PaymentSchedule(2017:2020) PaymentSchedule2017-PaymentSchedule2020;
Array PaymentVector(2017:2020) PaymentVector2017-PaymentVector2020;
Array PaymentVolume(2017:2020) PaymentVolume2017-PaymentVolume2020;
do i=1 to 4;
PaymentVector(i)=PaymentSchedule(i)/PaymentSchedule(1);
end;
I'll add code tomorrow... but the code doesn't work regardless.
data have;
input
ID Mapping $ Asset Fixed Performing Payment2017 Payment2018 Payment2019 Payment2020; datalines;
1 Loan1 1 1 1 90 30 30 30
2 Loan1 1 1 0 80 20 40 20
3 Loan1 1 0 1 60 40 10 10
4 Loan1 1 0 0 120 60 30 30
;
run;
data want(keep=id payment: fraction:);
set have;
array p payment:;
array fraction(4); * track constant fraction determined at start of profile;
array out(4); * track outlay for ith iteration;
* compute constant (over iterations) fraction for row;
do i = dim(p) to 1 by -1;
fraction(i) = p(i) / p(1);
end;
* reset to missing to allow for sum statement, which is <variable> + <expression>;
call missing(of out(*));
out(1) = p(1);
do iter = 1 to 4;
p(iter) = out(iter);
do i = iter+1 to dim(p);
p(i) = -fraction(i) * p(iter);
out(i) + (-p(i)); * <--- compute next iteration outlay with ye olde sum statement ;
end;
output;
p(iter) = .;
end;
format fract: best4. payment: 7.2;
run;
You've indexed your arrays with 2017:2020 but then try to use them with a 1 to 4 index. That won't work; you need to be consistent.
Array PaymentSchedule(2017:2020) PaymentSchedule2017-PaymentSchedule2020;
Array PaymentVector(2017:2020) PaymentVector2017-PaymentVector2020;
do i=2017 to 2020;
PaymentVector(i)=PaymentSchedule(i)/PaymentSchedule(2017);
end;

Array row calculations

I have the following table:
DATA:
Lines <- " ID MeasureX MeasureY x1 x2 x3 x4 x5
1 1 1 1 1 1 1 1
2 1 1 0 1 1 1 1
3 1 1 1 2 3 3 3"
DF <- read.table(text = Lines, header = TRUE, as.is = TRUE)
What I would like to achieve is:
Create 5 columns (r1-r5),
each the division of the corresponding column x1-x5 by MeasureX (e.g. x1/MeasureX, x2/MeasureX, etc.)
Create 5 columns (p1-p5),
each the division of the corresponding column x1-x5 by its position 1-5 (the number of x columns), e.g. x1/1, x2/2, etc.
MeasureY is irrelevant for now; the end product would be the ID and columns r1-r5 and p1-p5. Is this feasible?
In SAS i would go with something like this:
data test6;
set test5;
array x {5} x1- x5;
array r{5} r1 - r5;
array p{5} p1 - p5;
do i=1 to 5;
r{i} = x{i}/MeasureX;
p{i} = x{i}/(i);
end;
The reason is to keep it dynamic, because the number of columns could change in the future.
Argument recycling allows you to do element-wise division with a constant vector. The tricky part was extracting the digits from the column names. I then repeated each of the digits by the number of rows to do the second division task.
DF[ ,paste0("r", 1:5)] <- DF[ , grep("x", names(DF) )]/ DF$MeasureX
DF[ ,paste0("p", 1:5)] <- DF[ , grep("x", names(DF) )]/ # element-wise division
rep( as.numeric( sub("\\D","",names(DF)[ # remove non-digits
grep("x", names(DF))] #returns only 'x'-cols
) ), each=nrow(DF) ) # make them as long as needed
#-------------
> DF
ID MeasureX MeasureY x1 x2 x3 x4 x5 r1 r2 r3 r4 r5 p1 p2 p3 p4 p5
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0.5 0.3333333 0.25 0.2
2 2 1 1 0 1 1 1 1 0 1 1 1 1 0 0.5 0.3333333 0.25 0.2
3 3 1 1 1 2 3 3 3 1 2 3 3 3 1 1.0 1.0000000 0.75 0.6
This could be greatly simplified if you already know the sequence vector for the second division task would be 1-5, but this was designed to allow "gaps" in the sequence of column names and still use the digit information in the names as the divisor. (You were not entirely clear about which situations this code would be used in.) The construct of r{1-5} in SAS is mimicked by [ , paste0('r', 1:5)]. SAS relies heavily on its macro language, and experienced SAS users sometimes have trouble figuring out how to make R behave the same way. Generally it takes a while to lose the for-loop mentality and begin using R as a functional language.
An alternative with the data.table package:
cols <- names(DF)[4:8]
library(data.table)
setDT(DF)[, (paste0("r",1:5)) := .SD / DF$MeasureX, by = ID, .SDcols = cols
][, (paste0("p",1:5)) := .SD / 1:5, by = ID, .SDcols = cols]
which results in:
> DF
ID MeasureX MeasureY x1 x2 x3 x4 x5 r1 r2 r3 r4 r5 p1 p2 p3 p4 p5
1: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0.5 0.3333333 0.25 0.2
2: 2 1 1 0 1 1 1 1 0 1 1 1 1 0 0.5 0.3333333 0.25 0.2
3: 3 1 1 1 2 3 3 3 1 2 3 3 3 1 1.0 1.0000000 0.75 0.6
You could put together a nifty loop or apply to do this, but here it is explicitly:
# Handling the "r" columns.
DF$r1 <- DF$x1 / DF$MeasureX
DF$r2 <- DF$x2 / DF$MeasureX
DF$r3 <- DF$x3 / DF$MeasureX
DF$r4 <- DF$x4 / DF$MeasureX
DF$r5 <- DF$x5 / DF$MeasureX
# Handling the "p" columns.
DF$p1 <- DF$x1 / 1
DF$p2 <- DF$x2 / 2
DF$p3 <- DF$x3 / 3
DF$p4 <- DF$x4 / 4
DF$p5 <- DF$x5 / 5
# Taking only the columns we want.
FinalDF <- DF[, c("ID", "r1", "r2", "r3", "r4", "r5", "p1", "p2", "p3", "p4", "p5")]
Just noting that this is pretty straightforward matrix manipulation that you could have found elsewhere. Perhaps you're new to R, but do put a little more effort in next time. If you are new to R, it's definitely worth the time to work through a basic R coding tutorial or video.

Natural join subset decomposition

For this question, I found that the answer is (c), but I can give an example to show that (c) is not correct. Which is the right answer?
Let r be a relation instance with schema R = (A, B, C, D). We define r1 = ‘select A,B,C from r’ and r2 = ‘select A, D from r’. Let s = r1 * r2 where * denotes natural join. Given that the decomposition of r into r1 and r2 is lossy, which one of the following is true?
(a) s is subset of r
(b) r U s = r
(c) r is a subset of s
(d) r * s = s
If the answer is (c), consider the following example with a lossy decomposition of r into r1 and r2.
Table r
A B C D
1 10 100 1000
2 20 200 1000
3 20 200 1001
Table r1
A B C
1 10 100
2 20 200
Table r2
A D
2 1000
3 1001
Table s (natural join of r1 and r2)
A B C D
2 20 200 1000
So the answer is not (c). But I can also give you an example where (c) can be the answer.
What should the answer be?
Table r
A B C D
1 10 100 1000
1 20 200 2000
Table r1
A B C
1 10 100
1 20 200
Table r2
A D
1 1000
1 2000
Table s (natural join of r1 and r2)
A B C D
1 10 100 1000
1 20 200 1000
1 10 100 2000
1 20 200 2000
The decomposition is called "lossy" because we have lost the ability to recompose the original relation value using natural join. This lost ability manifests itself as extra rows appearing when we try the natural join. The underlying cause is that the decomposition did not retain any key ({B}, {C}, {D}) fully in both tables of the decomposition. If any key of the original is fully retained in all components of a decomposition, then the decomposition isn't lossy.
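For reference (this criterion is not stated in the original answer), the standard lossless-join test for a two-way decomposition can be written precisely, in LaTeX notation:
% A decomposition of R into R1 and R2 is lossless-join if and only if the
% shared attributes functionally determine one of the two sides:
\[ (R_1 \cap R_2) \to R_1 \quad \text{or} \quad (R_1 \cap R_2) \to R_2 \]
% Here R1 intersect R2 = {A}, so the decomposition is lossless exactly when
% A -> B,C holds (A is a key of r1) or A -> D holds (A is a key of r2).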
Table r is:
A B C D
1 10 100 1000
2 20 200 1000
3 20 200 1001
r1 is:
r1 = ‘select A,B,C from r’
A B C
1 10 100
2 20 200
3 20 200
r2 is:
r2 = ‘select A, D from r'
A D
1 1000
2 1000
3 1001
s is:
s = r1 * r2
A B C D
1 10 100 1000
2 20 200 1000
3 20 200 1001
So effectively r is a subset of s. In fact r is always a subset of s here: every tuple (a,b,c,d) of r projects to (a,b,c) in r1 and (a,d) in r2, and those two tuples rejoin on A to give (a,b,c,d) back, so a lossy decomposition only ever adds extra tuples. If the definition of r1 is ‘select A,B,C from r’, you just can't remove a row from the result (as you did in your example) and say that r1 still complies with the definition; the same applies to r2, where you removed the first row.

Adding and multiplying tables' data by values in another table

Say I have a table of subtractions and divisions sorted by date:
tblFactors
dt sub divide
2014-07-01 1 1
2014-06-01 0 5
2014-05-01 2 1
2014-04-01 0 3
I have another table of values, sorted by date:
tblValues
dt val
2014-07-05 4
2014-06-15 5
2014-05-15 21
2014-04-14 31
2014-03-15 71
I need to perform some sequential calculations. For the first row in tblFactors, I need to subtract 1 from every val where tblValues.dt < '2014-07-01'.
Next, I need to process the second row in tblFactors. There is nothing to subtract. However, the divide = 5 means that I need to divide every val by 5 where tblValues.dt < '2014-06-01'. The tricky thing is that I need to do this on the modified val from the row before (divide 20 / 5, not 21 / 5).
Each row in tblFactors would be processed in this manner, giving a sequence like this:
dt          val   after row 1    after row 2    after row 3    after row 4
                  (subtract 1)   (divide by 5)  (subtract 2)   (divide by 3)
2014-07-05    4
2014-06-15    5     4
2014-05-15   21    20              4
2014-04-14   31    30              6              4
2014-03-15   71    70             14             12              4
This would leave me with:
qryValues
dt val
2014-07-05 4
2014-06-15 4
2014-05-15 4
2014-04-14 4
2014-03-15 4
Right now I'm doing vector multiplications in a loop in R. I was wondering if there is a clever way to accomplish this in native SQL; I tried some aggregations but have had limited success.
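No answer was recorded in this thread, so here is a sketch (an assumption-laden starting point, not a tested solution) in the spirit of the recursive-CTE answer earlier on this page; table and column names follow the question:
;with f as
(
-- number the factor rows in the order they must be applied (newest dt first)
select dt, sub, divide,
row_number() over (order by dt desc) as rn
from tblFactors
),
calc as
(
-- anchor member: the original values, before any factor has been applied
select v.dt, cast(v.val as float) as val, cast(0 as bigint) as rn
from tblValues v
union all
-- recursive member: apply factor rn + 1 to every value dated before that factor's dt
select c.dt,
case when c.dt < f.dt then (c.val - f.sub) / f.divide else c.val end,
f.rn
from calc c
inner join f on f.rn = c.rn + 1
)
select dt, val
from calc
where rn = (select max(rn) from f) -- keep only the fully processed pass
order by dt desc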
