Total field in Crosstab query in SQL Server 2008

I have got this result with the answer given by bluefeet
Equipt BSL AQ
TFP 3 2
TM 1 0
VCB 18 6
VCD 5 8
Query script was
SELECT Equipt, [BSL] AS BSL, [AQ] AS AQ
FROM
(
SELECT Equipt, Shed
FROM PunctualityMain WHERE Date >= '4/1/2012' AND Date <= '4/30/2012' AND classification = 'Loco'
) x
PIVOT
(
COUNT(Shed)
FOR Shed IN ([BSL], [AQ])
) p
Is it possible to add a total field to the above script, like an Access crosstab?
Equipt BSL AQ TTL
TFP 3 2 5
TM 1 0 1
VCB 18 6 24
VCD 5 8 13

You should be able to add the Totals field by including a new column:
SELECT Equipt, [BSL] AS BSL, [AQ] AS AQ, ([BSL] + [AQ]) as TTL
FROM
(
SELECT Equipt, Shed
FROM PunctualityMain
WHERE Date >= '4/1/2012' AND Date <= '4/30/2012'
AND classification = 'Loco'
) x
PIVOT
(
COUNT(Shed)
FOR Shed IN ([BSL], [AQ])
) p
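To check the arithmetic outside SQL Server, here is a minimal Python sketch of the same idea: count rows per Equipt and Shed, then add the two counts to get TTL. The input rows are rebuilt from the counts shown in the question, because the real PunctualityMain rows are not shown, and the function name crosstab_with_total is hypothetical.

```python
from collections import Counter

# Counts taken from the question's result; the raw rows are reconstructed
# from them because the real PunctualityMain data is not shown.
counts_from_question = {
    ("TFP", "BSL"): 3, ("TFP", "AQ"): 2,
    ("TM", "BSL"): 1,
    ("VCB", "BSL"): 18, ("VCB", "AQ"): 6,
    ("VCD", "BSL"): 5, ("VCD", "AQ"): 8,
}
rows = [key for key, n in counts_from_question.items() for _ in range(n)]


def crosstab_with_total(rows, sheds=("BSL", "AQ")):
    """Count rows per (equipt, shed) and append a row total, like PIVOT plus TTL."""
    counts = Counter(rows)
    result = []
    for equipt in sorted({e for e, _ in rows}):
        per_shed = [counts[(equipt, s)] for s in sheds]  # missing pairs count as 0
        result.append((equipt, *per_shed, sum(per_shed)))
    return result


for line in crosstab_with_total(rows):
    print(*line)
```

Adding the pivoted columns works directly here because COUNT yields 0 rather than NULL for a missing combination (see TM / AQ in the question's output). If the pivot used SUM instead, each term would likely need wrapping in ISNULL(..., 0) before adding.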

Related

sum values across any 365 day period

I've got a dataset that has id, start date and a claim value (in dollars) in each row - most ids have more than one row - some span over 50 rows. The earliest date for each ID/claim varies, and the claim values are mostly different.
I'd like to do a rolling sum of the value of IDs that have claims within 365 days of each other, to report each ID that has claims that have exceeded a limiting value across each period. So for an ID that had a claim date on 1 January, I'd sum all claims to 31 December (inclusive). Most IDs have several years of data so for the example above, I'd also need to check that if they had a claim on 1 May that they hadn't exceeded the limit by 30 April the following year and so on. I normally see this referred to as a 'rolling sum'. My site has many SAS products including base, stat, ets, and others.
I'm currently testing code on a small mock dataset, and so far I've converted a thin file to a fat file with one column for each claim value and each date of the claim. The mock dataset is similar to the client dataset that I'll be using. Here's what I've done so far (noting that the mock data uses days rather than dates - I'm not at the stage where I want to test on real data yet).
data original_data;
input ppt $1. day claim;
datalines;
a 1 7
a 2 12
a 4 12
a 6 18
a 7 11
a 8 10
a 9 14
a 10 17
b 1 27
b 2 12
b 3 14
b 4 12
b 6 18
b 7 11
b 8 10
b 9 14
b 10 17
c 4 2
c 6 4
c 8 8
;
run;
proc sql;
create table ppt_counts as
select ppt, count(*) as ppts
from work.original_data
group by ppt;
select cats('value_', max(ppts) ) into :cats
from work.ppt_counts;
select cats('dates_',max(ppts)) into :cnts
from work.ppt_counts;
quit;
%put &cats;
%put &cnts;
data flipped;
set original_data;
by ppt;
array vars(*) value_1 -&cats.;
array dates(*) dates_1 - &cnts.;
array m_vars value_1 - &cats.;
array m_dates dates_1 - &cnts.;
if first.ppt then do;
i=1;
do over m_vars;
m_vars="";
end;
do over m_dates;
m_dates="";
end;
end;
if first.ppt then do:
i=1;
vars(i) = claim;
dates(i)=day;
if last.ppt then output;
i+1;
retain value_1 - &cats dates_1 - &cnts. 0.;
run;
data output;
set work.flipped;
max_date =max(of dates_1 - &cnts.);
max_value =max(of value_1 - &cats.);
run;
This doesn't give me even close to what I need - not sure how to structure code to make this correct.
What I need to end up with is one row per time that an ID exceeds the yearly limit of claim value (say in the mock data if a claim exceeds 75 across a seven day period), and to include the sum of the claims. So it's likely that there may be multiple lines per ID and the claims from one row may also be included in the claims for the same ID on another row.
type of output:
ID sum of claims
a $85
a $90
b $80
On separate rows.
Any help appreciated.
Thanks
If you need to perform a rolling sum, you can do this with proc expand (part of SAS/ETS). The code below performs a rolling sum over 5 days for each group. First, expand your data to fill in any gaps in the days:
proc expand data = original_data
out = original_data_expanded
from = day;
by ppt;
id day;
convert claim / method=none;
run;
Any days that fell in a gap will now have a missing value for claim. Now we can calculate a moving sum on the expanded data and ignore those missing days when performing the moving sum:
proc expand data = original_data_expanded
out = want(where=(NOT missing(claim)));
by ppt;
id day;
convert claim = rolling_sum / transform=(movsum 5) method=none;
run;
Output:
ppt day rolling_sum claim
a 1 7 7
a 2 19 12
a 4 31 12
a 6 42 18
a 7 41 11
...
b 9 53 14
b 10 70 17
c 4 2 2
c 6 6 4
c 8 14 8
We use two proc expand steps because, within a single step, the rolling sum is calculated before the days are expanded, whereas we need it to be calculated after the expansion. You can test this by running everything in a single step:
/* Performs moving sum, then expands */
proc expand data = original_data
out = test
from = day;
by ppt;
id day;
convert claim = rolling_sum / transform=(movsum 5) method=none;
run;
Use a SQL self join that matches each row to the rows for the same ID dated within 365 days of it. This is time- and resource-intensive if you have a very large data set.
Assuming you have a real date variable, INTNX is probably a better way to compute the end of the interval than adding 365, depending on how you want to account for leap years.
If you have a claim id to group on, that would also be better than using the group by clause in this example.
data have;
input ppt $1. day claim;
datalines;
a 1 7
a 2 12
a 4 12
a 6 18
a 7 11
a 8 10
a 9 14
a 10 17
b 1 27
b 2 12
b 3 14
b 4 12
b 6 18
b 7 11
b 8 10
b 9 14
b 10 17
c 4 2
c 6 4
c 8 8
;
run;
proc sql;
create table want as
select a.*, sum(b.claim) as total_claim
from have as a
left join have as b
on a.ppt=b.ppt and
b.day between a.day and a.day+365
group by 1, 2, 3;
/*b.day between a.day and intnx('year', a.day, 1, 's')*/;
quit;
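As a rough illustration of what the self join computes, here is a minimal Python sketch using the mock data. For each row it sums the claims of the same ppt whose day falls between that row's day and day + span, then keeps only the totals above a limit. The function names, the span of 6 (seven calendar days) and the limit of 75 are assumptions based on the mock example in the question, not part of the SQL above.

```python
# Mock data from the question: (ppt, day, claim).
have = [
    ("a", 1, 7), ("a", 2, 12), ("a", 4, 12), ("a", 6, 18),
    ("a", 7, 11), ("a", 8, 10), ("a", 9, 14), ("a", 10, 17),
    ("b", 1, 27), ("b", 2, 12), ("b", 3, 14), ("b", 4, 12), ("b", 6, 18),
    ("b", 7, 11), ("b", 8, 10), ("b", 9, 14), ("b", 10, 17),
    ("c", 4, 2), ("c", 6, 4), ("c", 8, 8),
]


def forward_window_totals(rows, span):
    """For each row, sum claims of the same ppt with day in [day, day + span]."""
    out = []
    for ppt, day, claim in rows:
        # Same condition as the join: b.day between a.day and a.day + span.
        total = sum(c for p, d, c in rows if p == ppt and day <= d <= day + span)
        out.append((ppt, day, claim, total))
    return out


def over_limit(rows, span, limit):
    """Keep only the windows whose total exceeds the limit."""
    return [r for r in forward_window_totals(rows, span) if r[3] > limit]


# span=6 covers seven calendar days; 75 is the mock limit from the question.
for row in over_limit(have, span=6, limit=75):
    print(row)
```

Like the SQL, this compares every row with every other row of the same ppt, so it scales quadratically with the number of claims per ID.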
Assuming that you have only one claim per day, you could just use a circular array to keep track of the previous N days of claims to generate the rolling sum. By circular array I mean one where the indexes wrap around back to the beginning when you increment past the end. You can use the MOD() function to convert any integer into an index into the array.
Then to get the running sum just add all of the elements in the array.
Add an extra DO loop to zero out the days skipped when there are days with no claims.
%let N=5;
data want;
set original_data;
by ppt ;
array claims[0:%eval(&n-1)] _temporary_;
lagday=lag(day);
if first.ppt then call missing(of lagday claims[*]);
do index=max(sum(lagday,1),day-&n+1) to day-1;
claims[mod(index,&n)]=0;
end;
claims[mod(day,&n)]=claim;
running_sum=sum(of claims[*]);
drop index lagday ;
run;
Results:
running_
OBS ppt day claim sum
1 a 1 7 7
2 a 2 12 19
3 a 4 12 31
4 a 6 18 42
5 a 7 11 41
6 a 8 10 51
7 a 9 14 53
8 a 10 17 70
9 b 1 27 27
10 b 2 12 39
11 b 3 14 53
12 b 4 12 65
13 b 6 18 56
14 b 7 11 55
15 b 8 10 51
16 b 9 14 53
17 b 10 17 70
18 c 4 2 2
19 c 6 4 6
20 c 8 8 14
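For readers who want to trace the circular-array logic step by step, here is a minimal Python sketch of the same technique. It assumes the rows are sorted by ppt and day with at most one claim per day, as the SAS step does; the function name rolling_sums is hypothetical.

```python
# Mock data from the question: (ppt, day, claim).
original_data = [
    ("a", 1, 7), ("a", 2, 12), ("a", 4, 12), ("a", 6, 18),
    ("a", 7, 11), ("a", 8, 10), ("a", 9, 14), ("a", 10, 17),
    ("b", 1, 27), ("b", 2, 12), ("b", 3, 14), ("b", 4, 12), ("b", 6, 18),
    ("b", 7, 11), ("b", 8, 10), ("b", 9, 14), ("b", 10, 17),
    ("c", 4, 2), ("c", 6, 4), ("c", 8, 8),
]


def rolling_sums(rows, n):
    """Rolling n-day sum per ppt using a circular buffer indexed by day % n."""
    out = []
    buf = [0] * n
    prev_ppt, prev_day = None, None
    for ppt, day, claim in rows:
        if ppt != prev_ppt:
            buf = [0] * n  # new group: start with an empty window
        else:
            # Zero the slots of skipped days that still fall inside the window.
            for skipped in range(max(prev_day + 1, day - n + 1), day):
                buf[skipped % n] = 0
        buf[day % n] = claim  # overwrites the value from n days earlier
        out.append((ppt, day, claim, sum(buf)))
        prev_ppt, prev_day = ppt, day
    return out


for row in rolling_sums(original_data, 5):
    print(*row)
```

The buffer never grows beyond n slots, which is the point of the approach: each new day reuses the slot of the day that just dropped out of the window.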
Working in a known domain of date integers, you can use a single large array to store the claims at each date and slice out the 365 days to be summed. This avoids the bookkeeping that the modular approach requires.
Example:
data have;
call streaminit(20230202);
do id = 1 to 10;
do date = '01jan2012'd to '02feb2023'd;
date + rand('integer', 25);
claim = rand('integer', 5, 100);
output;
end;
end;
format date yymmdd10.;
run;
options fullstimer;
data want;
set have;
by id;
array claims(100000) _temporary_;
array slice (365) _temporary_;
if first.id then call missing(of claims(*));
claims(date) = claim;
call pokelong(
peekclong(
addrlong (claims(date-365))
, 8*365)
,
addrlong(slice(1))
);
rolling_sum_365 = sum(of slice(*));
if dif1(claim) < 365 then
claims_out_365 = lag(claim) - dif1(rolling_sum_365);
if first.id then claims_out_365 = .;
run;
Note: SAS Date 100,000 is 16OCT2233

Transpose rows 1 and 0 's into different rows in Snowflake

I'm trying to load a file and transpose each row into multiple rows.
The Days column holds values like 11010011, which need to be transposed into a vertical format.
Below is the sample input
I'm trying to get the expected output like below
Can you please help me on this in Snowflake? Appreciate your help
Replace '1' with '1,' and '0' with '0,', then trim the trailing comma. You can then use split_to_table to turn the result into rows:
with SOURCE_DATA as
(
select COLUMN1::int as FACTORY
,COLUMN2::int as YEAR
,COLUMN3::string as DAYS
from (values
(01,2021,'01010100100101010001'),
(99,2021,'00100111010101011010')
)
)
select FACTORY, YEAR, SEQ as SOURCE_ROW, INDEX as POSITION_IN_STRING, VALUE as WORKING_DAY
from SOURCE_DATA, table(split_to_table(trim(replace(replace(DAYS,'1','1,'),'0','0,'),','),',')) D
;
Abbreviated output:
FACTORY YEAR SOURCE_ROW POSITION_IN_STRING WORKING_DAY
1 2021 1 1 0
1 2021 1 2 1
1 2021 1 3 0
1 2021 1 4 1
1 2021 1 5 0
The split_to_table function returns metadata columns (SEQ, INDEX and VALUE, as used above) with information on the split. You can change the sample to select * to see them; they may be useful for your requirements.
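To make the intended result concrete, here is a minimal Python sketch of the same transformation on the two sample rows: each character of the DAYS string becomes one output row, along with its 1-based position. The function name days_to_rows is hypothetical.

```python
# Sample rows from the query above: (factory, year, days).
source_data = [
    (1, 2021, "01010100100101010001"),
    (99, 2021, "00100111010101011010"),
]


def days_to_rows(rows):
    """Yield (factory, year, position, working_day) for each character in DAYS."""
    for factory, year, days in rows:
        for position, flag in enumerate(days, start=1):
            yield factory, year, position, int(flag)


result = list(days_to_rows(source_data))
for row in result[:5]:
    print(*row)
```

The position here plays the role of the INDEX column in the Snowflake query, so the original order of the days can be recovered after the split.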

SQL Server query problem. example is in excel sheet picture

Please see the following picture; I want to convert this formula to SQL Server.
in excel sheet
M N
15 1 0
16 3 1
17 5 2
18 8 4
19 9 4
N = IF(M16-M15<=1, N15, M16-M15-1+N15)
Please see the screenshot for reference:
As per your tags, this can be done with LAG and then doing a running total.
For each row, first calculate the difference in M from the previous row (using LAG) - I call this Dif_Last_M. This mirrors the 'M24-M23' part of your formula.
If Dif_Last_M is <= 1, add 0 to the running total (effectively making the running total the same as for the previous row)
Else if Dif_Last_M is > 1, add (Dif_Last_M minus 1) to the running total
Here is the code assuming your source table is called #Temp and has an ID (sorting value)
WITH M_info AS
(SELECT ID, M, (M - LAG(M, 1) OVER (ORDER BY ID)) AS Dif_Last_M
FROM #Temp
)
SELECT ID,
M,
SUM(CASE WHEN Dif_Last_M > 1 THEN Dif_Last_M - 1 ELSE 0 END) OVER (ORDER BY ID) AS N
FROM M_info;
And here are the results
ID M N
1 1 0
2 3 1
3 5 2
4 8 4
5 9 4
6 12 6
7 13 6
Here is a db<>fiddle with the above. It also includes additional queries showing
The result from the CTE
The values used in the running total
Note that while it is possible to do this with recursive CTEs, they tend to have performance problems (they are loops, fundamentally). So it is better (performance-wise) to avoid recursive CTEs if possible.
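The LAG-plus-running-total logic can also be traced in a few lines of Python, which may help when checking it against the spreadsheet. This is only a sketch of the same arithmetic; the function name running_gap_total is hypothetical, and the input is the M column from the results above.

```python
def running_gap_total(m_values):
    """Return N for each row: a running total of (M - previous M - 1) for gaps above 1."""
    n_values = []
    total = 0
    previous = None  # the first row has no previous M, like LAG returning NULL
    for m in m_values:
        if previous is not None and m - previous > 1:
            total += m - previous - 1
        n_values.append(total)
        previous = m
    return n_values


print(running_gap_total([1, 3, 5, 8, 9, 12, 13]))
```

Each row only needs the previous M and the total so far, which is why a window function can express it without recursion.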

T-SQL to sum total value instead of rejoining table multiple times

I've looked for an example question like this, I ask for grace if it's been answered (I thought it would have been but have a hard time finding meaningful results with the terms I searched.)
I work at a manufacturing plant where at every manufacturing operation a part is issued a new serial number. The database table I have to work with has the serial number recorded in the Container field and the previous serial number the part had recorded in the From_Container field.
I'm trying to SUM the Extended_Cost column on parts we've had to re-do operations on.
Here's a sample of data from tbl_Container:
Container From_Container Extended_Cost Part_Key Operation
10 9 10 PN_100 60
9 8 10 PN_100 50
8 7 10 PN_100 40
7 6 10 PN_100 30
6 5 10 PN_100 20
5 4 10 PN_100 50
4 3 10 PN_100 40
3 2 10 PN_100 30
2 1 10 PN_100 20
1 100 10 PN_100 10
In this example the SUM I would expect returned is 40, because operations 20, 30, 40 and 50 were all re-done and cost $10 each.
So far I've been able to do this by rejoining the table to itself 10 times using aliases in the following fashion:
LEFT OUTER JOIN tbl_Container AS FCP_1
ON tbl_Container.From_Container = FCP_1.Container
AND FCP_1.Operation <= tbl_Container.Operation
AND tbl_Container.Part_Key = FCP_1.Part_Key
And then using SUM to add the Extended_Cost fields together. However, I'm violating the DRY principle and there has got to be a better way.
Thank you in advance for your help,
Me
You can try this query.
;WITH CTE AS
(
SELECT TOP 1 *, I = 0 FROM tbl_Container C ORDER BY Container
UNION ALL
SELECT T.*, I = I + 1 FROM CTE
INNER JOIN tbl_Container T
ON CTE.Container = T.From_Container
AND CTE.Part_Key = T.Part_Key
)
SELECT Part_Key, SUM(T1.Extended_Cost) Sum_Extended_Cost FROM CTE T1
WHERE
EXISTS( SELECT * FROM
CTE T2 WHERE
T1.Operation = T2.Operation
AND T1.I > T2.I )
GROUP BY Part_Key
Result:
Part_Key Sum_Extended_Cost
---------- -----------------
PN_100 40
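To show what the recursive query is doing, here is a minimal Python sketch that walks the same chain on the sample data: start at the lowest Container, follow From_Container links forward, and add up the cost of every row whose Operation has already appeared earlier in the chain. It assumes each container has at most one successor and that the chain has no cycles; the function name rework_cost is hypothetical.

```python
# Sample rows from the question:
# (container, from_container, extended_cost, part_key, operation)
tbl_container = [
    (10, 9, 10, "PN_100", 60),
    (9, 8, 10, "PN_100", 50),
    (8, 7, 10, "PN_100", 40),
    (7, 6, 10, "PN_100", 30),
    (6, 5, 10, "PN_100", 20),
    (5, 4, 10, "PN_100", 50),
    (4, 3, 10, "PN_100", 40),
    (3, 2, 10, "PN_100", 30),
    (2, 1, 10, "PN_100", 20),
    (1, 100, 10, "PN_100", 10),
]


def rework_cost(rows):
    """Sum Extended_Cost per part for operations repeated along the container chain."""
    # Index rows by (from_container, part_key) so the next link is a lookup.
    successor = {(r[1], r[3]): r for r in rows}
    current = min(rows, key=lambda r: r[0])  # anchor: lowest Container
    seen_operations = set()
    totals = {}
    while current is not None:
        container, _, cost, part_key, operation = current
        if operation in seen_operations:  # this operation was done before: rework
            totals[part_key] = totals.get(part_key, 0) + cost
        seen_operations.add(operation)
        current = successor.get((container, part_key))
    return totals


print(rework_cost(tbl_container))
```

The set of operations already seen stands in for the EXISTS check on T1.I > T2.I: a row counts as rework only if the same Operation occurred at an earlier step.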

MS SQL Table of CRON Future Runs

I have a need to create a table/view containing a full and complete list of future CRON executions for a period of time, e.g. from 12 months ago to 12 months in the future.
My source data is in MS SQL 2012 and contains the following sample information;
TASK SCHEDULE SCHEDULESTART SCHEDULEEND
T1 0 0 0 ? * MON 2015-04-08 16:15:09.557 2015-04-20 00:00:00.000
T2 0 0 0 ? * MON 2015-05-22 15:56:48.140 2015-07-27 00:00:00.000
T3 0 0/56 * * * ? 2015-06-25 10:17:07.387 2015-06-25 15:00:00.000
T4 0 10/15 21 3,19 5-9 ? 2015-06-25 10:18:48.077 2015-08-28 10:17:15.000
Unfortunately, as MS SQL doesn't support/contain a JVM, I'm limited (I think) to programmatically breaking this out into its component parts.
I've managed to break out the parts of the expression with the following:
;WITH cte (SCHEDULE,SCHEDULESTART,SCHEDULEEND,SCHED_Attributes)
AS
(
SELECT SCHEDULE,SCHEDULESTART,SCHEDULEEND,
CONVERT(XML,'<Product><Attribute>'
+ REPLACE([SCHEDULE],' ', '</Attribute><Attribute>')
+ '</Attribute></Product>') AS SCHED_Attributes
FROM USCH_TASK
)
SELECT
SCHEDULE,SCHEDULESTART,SCHEDULEEND,
SCHED_Attributes.value('/Product[1]/Attribute[1]','varchar(25)') AS sched_seconds,
SCHED_Attributes.value('/Product[1]/Attribute[2]','varchar(25)') AS sched_minutes,
SCHED_Attributes.value('/Product[1]/Attribute[3]','varchar(25)') AS sched_hours,
SCHED_Attributes.value('/Product[1]/Attribute[4]','varchar(25)') AS sched_day_of_month,
SCHED_Attributes.value('/Product[1]/Attribute[5]','varchar(25)') AS sched_month,
SCHED_Attributes.value('/Product[1]/Attribute[6]','varchar(25)') AS sched_day_of_week,
SCHED_Attributes.value('/Product[1]/Attribute[7]','varchar(25)') AS sched_year
from cte
This results in (for example)
sched_seconds sched_minutes sched_hours sched_day_of_month sched_month sched_day_of_week sched_year
0 0 0 ? * MON NULL
0 0 0 ? * MON NULL
0 0/56 * * * ? NULL
0 10/15 21 3,19 5-9 ? NULL
The main thrust of this question is how to handle these component parts. * and ? are easy enough, and ranges (e.g. 5-9 or MON-THU) are manageable, but I am struggling with how to handle lists of specific days or months (e.g. 3,19) or more complex configurations (such as the last example above, or days of month = "1-3,6-7,15").
CASE
WHEN CHARINDEX('*',Prod_Attributes.value('/Product[1]/Attribute[6]','varchar(25)')) > 0 THEN 'Y'
WHEN CHARINDEX('?',Prod_Attributes.value('/Product[1]/Attribute[6]','varchar(25)')) > 0 THEN 'Y'
WHEN CHARINDEX('MON',Prod_Attributes.value('/Product[1]/Attribute[6]','varchar(25)')) > 0 THEN 'Y'
WHEN CHARINDEX('SUN-TUE',Prod_Attributes.value('/Product[1]/Attribute[6]','varchar(25)')) > 0 THEN 'Y'
WHEN CHARINDEX('SUN-WED',Prod_Attributes.value('/Product[1]/Attribute[6]','varchar(25)')) > 0 THEN 'Y'
WHEN CHARINDEX('SUN-THU',Prod_Attributes.value('/Product[1]/Attribute[6]','varchar(25)')) > 0 THEN 'Y'
WHEN CHARINDEX('SUN-FRI',Prod_Attributes.value('/Product[1]/Attribute[6]','varchar(25)')) > 0 THEN 'Y'
WHEN CHARINDEX('SUN-SAT',Prod_Attributes.value('/Product[1]/Attribute[6]','varchar(25)')) > 0 THEN 'Y'
ELSE 'N'
END as DOWMon
However this approach wouldn't work for day 1 of the month as the code
WHEN CHARINDEX('1',Prod_Attributes.value('/Product[1]/Attribute[4]','varchar(25)')) > 0 THEN 'Y'
would also match the values 10 through 19, 21 and 31!
Any tips or tricks are gratefully received!
Andy
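One way around the substring problem described above is to expand each field into the explicit set of numbers it matches and then test membership, rather than searching the text. Below is a minimal Python sketch of that idea, not a full cron parser. It assumes the expressions are Quartz-style (which the seconds field and ? suggest), handles only numeric values, and does not cover names such as MON or combined forms such as 1-10/2; the function name expand_field is hypothetical.

```python
def expand_field(field, low, high):
    """Expand one numeric cron field into the explicit set of values it matches.

    Handles '*', '?', single values, comma lists, 'a-b' ranges and
    'start/step' increments within the bounds low..high.
    """
    values = set()
    for part in field.split(","):
        if part in ("*", "?"):
            values.update(range(low, high + 1))
        elif "/" in part:
            start, step = part.split("/")
            first = low if start == "*" else int(start)
            values.update(range(first, high + 1, int(step)))
        elif "-" in part:
            first, last = part.split("-")
            values.update(range(int(first), int(last) + 1))
        else:
            values.add(int(part))
    return values


print(sorted(expand_field("1-3,6-7,15", 1, 31)))  # days of month
print(sorted(expand_field("10/15", 0, 59)))       # minutes
print(1 in expand_field("10-19,21,31", 1, 31))    # no false match on the digit 1
```

In T-SQL the same approach would presumably mean splitting each field on commas into rows and expanding ranges against a numbers table, so that day 1 is matched by equality rather than by CHARINDEX.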
