CASE when statement SQL Server - sql-server

I have a stored procedure in the works; however, I'm missing a few pieces of data when I run it. I know the reason why, but I'm having a hard time figuring out how to fix it. This is my code:
INSERT INTO tempIntake (Pop, PlanID, PopFull, ApptDate, [1stAppt], Followup, Rn, UserID)
SELECT Pop, PlanID, PopFull, InterviewDate, [1stAppt], Followup, rn, @UserID
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY PlanID ORDER BY AddedDate ASC) AS rn
FROM VInfo
WHERE AddedDate IS NOT NULL
) t
WHERE rn = 1 AND interviewdate >= @fromDate AND interviewDate <= @toDate
So what I'm trying to do here is this: for some PlanIDs I don't have an AddedDate. Because I'm filtering on AddedDate, rows where it is NULL do not come up, even though I need them to be shown. In that case I'd like to substitute a dummy date like '1/1/2016', so that when the actual AddedDate becomes available in the table, it will be used instead of the dummy date.

If AddedDate can never be later than GETDATE(), you can order by ISNULL(AddedDate, GETDATE()) and remove the WHERE AddedDate IS NOT NULL condition. Rows without an AddedDate then sort last within their PlanID, so a real AddedDate still wins as soon as one exists:
Insert into tempIntake(Pop, PlanID, PopFull, ApptDate, [1stAppt], Followup, Rn, UserID)
select
Pop, PlanID, PopFull, InterviewDate, [1stAppt], Followup, rn, @UserID
from
(Select
*,
row_number() over (partition by PlanID order BY ISNULL(AddedDate,GETDATE())) as rn
from
VInfo) t
where
rn = 1
and interviewdate >= @fromDate
and interviewDate <= @toDate
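To see the effect of ordering by a substituted date, here is a minimal sketch that runs against an in-memory SQLite database from Python rather than SQL Server. The sample rows are made up, and COALESCE and date('now') stand in for ISNULL and GETDATE(), so treat it as an illustration of the idea rather than the exact T-SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE VInfo (PlanID INTEGER, InterviewDate TEXT, AddedDate TEXT);
-- Hypothetical sample data: plan 1 has one dated and one undated row,
-- plan 2 has only an undated row.
INSERT INTO VInfo VALUES
  (1, '2016-03-01', '2016-02-10'),
  (1, '2016-03-05', NULL),
  (2, '2016-03-07', NULL);
""")

rows = conn.execute("""
SELECT PlanID, InterviewDate
FROM (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY PlanID
           -- NULL dates are replaced by today's date, so they sort last
           ORDER BY COALESCE(AddedDate, date('now'))
         ) AS rn
  FROM VInfo
) AS t
WHERE rn = 1
ORDER BY PlanID
""").fetchall()

print(rows)
```

Plan 2 now appears even though it has no AddedDate, while plan 1 still picks the row that has a real date.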

Related

How can I use COUNT() result in RANK

I have a SQL Server query that uses COUNT() grouped by the column 'id'.
I also want to RANK based on the value of COUNT(). But when I try the query below,
I get an error saying "Invalid column name 'IdCount'".
SELECT
[id],
COUNT(*) AS IdCount,
RANK() OVER (
ORDER BY IdCount
) CountRank
FROM myTable
where DATEDIFF(day,[Time],GETDATE()) < 30
GROUP BY [id]
Can you please tell me how can I reference the COUNT() result?
Thank you.
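You can't reference a column alias in a window function at the same scope where the alias is defined. Also, I think you want RANK() to start from the highest count, not the lowest. Finally, avoid wrapping a column in a calculation like DATEDIFF in the WHERE clause, since that generally prevents an index seek on the column; compare the column against a precomputed value instead. How about: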
DECLARE @start date = DATEADD(DAY, -30, GETDATE());
SELECT id, IdCount,
CountRank = RANK() OVER (ORDER BY IdCount DESC)
FROM
(
SELECT id, COUNT(*) AS IdCount -- alias is required so the outer query can reference it
FROM dbo.myTable
WHERE [Time] >= @start
GROUP BY id
) AS x;
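The derived-table pattern can be tried out quickly with a small sketch. This one uses Python's sqlite3 module as a stand-in for SQL Server, with made-up rows and without the date filter, so it only demonstrates ranking on an aggregate that was computed one level down:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE myTable (id INTEGER);
-- Hypothetical data: id 1 occurs three times, id 3 twice, id 2 once.
INSERT INTO myTable VALUES (1), (1), (1), (2), (3), (3);
""")

ranked = conn.execute("""
SELECT id, IdCount,
       RANK() OVER (ORDER BY IdCount DESC) AS CountRank
FROM (
  -- The alias IdCount is defined here, so the outer query can use it.
  SELECT id, COUNT(*) AS IdCount
  FROM myTable
  GROUP BY id
) AS x
ORDER BY CountRank, id
""").fetchall()

print(ranked)
```

Each tuple is (id, IdCount, CountRank), with the most frequent id ranked first.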

Union query removing blank rows

I'm trying to union two different results but I'm not sure how to remove the blank values. The paid_date and check_date are sometimes different. What do I need to do to modify the code? I've tried doing MAX but didn't seem to work. Thanks.
Results:
What I want:
WITH PAID AS
(
SELECT PAYDATE
,SUM(PAY) AS PAID
,ROW_NUMBER() OVER(ORDER BY PAYDATE DESC) AS ROW_NUM
FROM CLAIM
WHERE PAY<>0
GROUP BY PAYDATE
)
,TOTAL_PAID AS
(
SELECT CONVERT(DATE,CONVERT(VARCHAR(10),PAYDATE)) AS PAID_DATE
,FORMAT(PAID,'C','EN-US') AS PAID
FROM PAID
WHERE ROW_NUM <= 10
)
,CHECKS AS
(
SELECT DISTINCT CHECK_DATE
,SUM(PAYMENT) AS PAYMENT
,ROW_NUMBER() OVER(ORDER BY CHECK_DATE DESC) AS ROW_NUM
FROM CR
GROUP BY CHECK_DATE
)
,TOTAL_CHECKS AS
(
SELECT CONVERT(DATE,CONVERT(VARCHAR(10),CHECK_DATE)) AS CHECK_DATE
,FORMAT(PAYMENT,'C','EN-US') AS PAYMENT
FROM CHECKS
WHERE ROW_NUM <= 10
)
,FINAL AS
(
SELECT PAID_DATE
,PAID
,'' AS CHECK_DATE
,'' AS PAYMENT
FROM TOTAL_PAID
UNION ALL
SELECT '' AS PAID_DATE
,'' AS PAID
,CHECK_DATE
,PAYMENT
FROM TOTAL_CHECKS
)
SELECT *
FROM FINAL
In this scenario you should not use UNION; use a JOIN instead.
Try joining the two result sets on paid_date = check_date and selecting the appropriate columns from each side.

SQL statement to get data without duplicates for a combination of columns

There is an event capture table that we have which contains different type of events (based on EventTypeId) for multiple assets (based on assetId).
There was an application bug in our code which we fixed recently where the end time was not being captured correctly. And endtime is captured when the "Severity" changes for a given event type and asset. But this was being done incorrectly.
I tried the below query to get the start and end time but due to the repeating duplicates, I am unable to get to the correct data.
The SQL which I currently formulated (took cue from : Calculate time between On and Off Status rows SQL Server)
WITH ReportData
AS (SELECT e.Id [EventId]
,e.AssetId
,e.StartTime
,e.Severity
,e.EventTypeId
,a.Name [AssetName]
,ROW_NUMBER() OVER (PARTITION BY e.AssetId ORDER BY e.StartTime) RowNum
,ROW_NUMBER() OVER (PARTITION BY e.AssetId ORDER BY e.StartTime)
- ROW_NUMBER() OVER (PARTITION BY e.AssetId, e.Severity ORDER BY e.StartTime) AS [Group]
FROM dbo.Event e
JOIN dbo.Asset a
ON a.Id = e.AssetId)
SELECT state1.AssetName
,state1.AssetId
,MIN(state1.StartTime) [START]
,MAX(state2.StartTime) [END]
,DATEDIFF(SS, MIN(state1.StartTime), MAX(state2.StartTime)) [Duration]
,state1.Severity
,state1.EventId
FROM ReportData state1
LEFT JOIN ReportData state2
ON state1.RowNum = state2.RowNum - 1
WHERE state1.Severity = 'Extreme'
AND state2.StartTime IS NOT NULL
AND state1.EventTypeId = 27
GROUP BY state1.AssetName
,state1.AssetId
,state1.Severity
,state1.EventId
,state1.[Group]
ORDER BY MIN(state1.StartTime) DESC;
The duplicates look something like this
Can someone give me the way to calculate the start and end times based on status change (event type and asset change for severity), ignoring the duplicates.
Also if you could give me a query to identify the duplicates so that we can delete it, would be awesome!
You can define a CTE that removes the duplicates first, and then run your query against that CTE:
with e as (
select min(Id) as Id, -- We return the first ID for every duplicate
AssetId,
StartTime,
Severity,
EventTypeId
from dbo.Event
group by AssetId, StartTime, Severity, EventTypeId
),
--- Here comes your Query, using e instead of Event
So, it's going to be :
with e as (
select min(Id) as Id, -- We return the first ID for every duplicate
AssetId,
StartTime,
Severity,
EventTypeId
from dbo.Event
group by AssetId, StartTime, Severity, EventTypeId
),
ReportData as (
SELECT e.Id [EventId]
,e.AssetId
,e.StartTime
,e.Severity
,e.EventTypeId
,a.Name [AssetName]
,ROW_NUMBER() OVER (PARTITION BY e.AssetId ORDER BY e.StartTime) RowNum
,ROW_NUMBER() OVER (PARTITION BY e.AssetId ORDER BY e.StartTime)
- ROW_NUMBER() OVER (PARTITION BY e.AssetId, e.Severity ORDER BY e.StartTime) AS [Group]
FROM e
JOIN dbo.Asset a
ON a.Id = e.AssetId
)
SELECT state1.AssetName
,state1.AssetId
,MIN(state1.StartTime) [START]
,MAX(state2.StartTime) [END]
,DATEDIFF(SS, MIN(state1.StartTime), MAX(state2.StartTime)) [Duration]
,state1.Severity
,state1.EventId
FROM ReportData state1
LEFT JOIN ReportData state2
ON state1.RowNum = state2.RowNum - 1
WHERE state1.Severity = 'Extreme'
AND state2.StartTime IS NOT NULL
AND state1.EventTypeId = 27
GROUP BY state1.AssetName
,state1.AssetId
,state1.Severity
,state1.EventId
,state1.[Group]
ORDER BY MIN(state1.StartTime) DESC;
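For the second part of the question (identifying the duplicates so they can be deleted), one possible approach is to number the copies within each (AssetId, StartTime, Severity, EventTypeId) combination and list every row after the first. The sketch below is an assumption about what counts as a duplicate, mirroring the GROUP BY above; it runs on SQLite through Python with made-up rows, and the names `numbered` and `copy_nr` are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Event (Id INTEGER PRIMARY KEY, AssetId INTEGER, StartTime TEXT,
                    Severity TEXT, EventTypeId INTEGER);
-- Hypothetical data: rows 2 and 4 repeat row 1.
INSERT INTO Event VALUES
  (1, 10, '2018-01-01 08:00', 'Extreme', 27),
  (2, 10, '2018-01-01 08:00', 'Extreme', 27),
  (3, 10, '2018-01-01 09:00', 'Normal',  27),
  (4, 10, '2018-01-01 08:00', 'Extreme', 27);
""")

duplicate_ids = [row[0] for row in conn.execute("""
SELECT Id
FROM (
  SELECT Id,
         ROW_NUMBER() OVER (
           PARTITION BY AssetId, StartTime, Severity, EventTypeId
           ORDER BY Id
         ) AS copy_nr          -- 1 for the row we keep (lowest Id)
  FROM Event
) AS numbered
WHERE copy_nr > 1              -- every later copy is a duplicate
ORDER BY Id
""")]

print(duplicate_ids)
```

The rows with copy_nr = 1 are the same ones that min(Id) keeps in the CTE above, so the listed Ids are the candidates for deletion. Check the list against real data before deleting anything.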
So if the same severity can occur for the same asset and event type multiple times, and not in consecutive rows (which is what makes this case difficult), then we first have to decide which date field (StartTime or CreatedAt) defines the order of the rows. In the query below I assume it is CreatedAt. This is how I would prepare the set of rows to keep (we don't just keep some rows and delete others, we also have to update EndTime or StartTime on the rows we keep). Please also note the comments:
--First we order the selection by entry time and for each line we want to know which severity will be next and previous
WITH PrevAndNext AS
(
SELECT
Id,
AssetId,
EventTypeId,
Severity,
StartTime,
EndTime,
CreatedAt,
LAG(Severity) OVER (PARTITION BY AssetId, EventTypeId ORDER BY CreatedAt ASC) AS PrevSeverity, -- date field for ORDER BY depends on your logic!
LEAD(Severity) OVER (PARTITION BY AssetId, EventTypeId ORDER BY CreatedAt ASC) AS NextSeverity -- date field for ORDER BY depends on your logic!
FROM
dbo.Event -- replace with your table name
)
--From the selection above we define the first and last occurence of each severity event
,FirstAndLast AS
(
SELECT
Id,
AssetId,
EventTypeId,
Severity,
StartTime,
EndTime,
CreatedAt,
CASE
WHEN PrevSeverity IS NULL OR PrevSeverity <> Severity THEN 'FirstOccurence'
WHEN NextSeverity IS NULL OR NextSeverity <> Severity THEN 'LastOccurence'
ELSE 'MiddleOccurence'
END AS Occurence
FROM
PrevAndNext
)
--Then we suppose we want to keep only the first occurence for each severity event, but we need to pick the EndDate from the last occurence
,MergeStartAndEndTime AS
(
SELECT
Id,
AssetId,
EventTypeId,
Severity,
StartTime,
CASE
WHEN Occurence = 'FirstOccurence' AND LEAD(Occurence) OVER (PARTITION BY AssetId, EventTypeId ORDER BY CreatedAt ASC) = 'LastOccurence' THEN LEAD(EndTime) OVER (PARTITION BY AssetId, EventTypeId ORDER BY CreatedAt ASC) -- date field for ORDER BY depends on your logic!
ELSE EndTime
END AS EndTime,
CreatedAt,
Occurence -- exposed so the final SELECT can filter on it
FROM
FirstAndLast
WHERE
Occurence IN ('FirstOccurence', 'LastOccurence')
)
--Here is the dataset you want to keep. You can use it to update the EndDate field for Id-s, and then remove all the other Id-s which are not in the dataset. Please check it carefully and first try it on some test dataset with duplicates. Feel free to adjust it for your logic if necessary.
SELECT
Id,
AssetId,
EventTypeId,
Severity,
StartTime,
EndTime,
CreatedAt
FROM
MergeStartAndEndTime
WHERE
Occurence = 'FirstOccurence';

SQL Query returning multiple values

I am trying to write a query that returns the time taken by an Order from start to completion.
My table looks like below.
Order No. Action DateTime
111 Start 3/23/2018 8:18
111 Complete 3/23/2018 9:18
112 Start 3/24/2018 6:00
112 Complete 3/24/2018 11:10
Now I am trying to calculate the date difference between start and completion of multiple orders and below is my query:
Declare @StartDate VARCHAR(100), @EndDate VARCHAR(100), @Operation VARCHAR(100)
declare @ORDERTable table
(
order varchar(1000)
)
insert into @ORDERTable values ('111')
insert into @ORDERTable values ('112')
Select @Operation='Boiling'
set @EndDate = (SELECT DATE_TIME from PROCESS WHERE ACTION='COMPLETE' AND ORDER in (select order from @ORDERTable) AND OPERATION=@Operation)
---SELECT @EndDate
set @StartDate = (SELECT DATE_TIME from PROCESS WHERE ACTION='START' AND ORDER in (select order from @ORDERTable) AND OPERATION=@Operation)
---SELECT @StartDate
SELECT DATEDIFF(minute, @StartDate, @EndDate) AS Transaction_Time
So, I am able to input multiple orders but I want to get multiple output as well.
And my second question is if I am able to achieve multiple records as output, how am I gonna make sure which datediff is for which Order?
Awaiting for your answers. Thanks in advance.
I am using MSSQL.
You can aggregate by order number and use MAX or MIN with CASE WHEN to get start or end time:
select
order_no,
max(case when action = 'Start' then date_time end) as start_time,
max(case when action = 'Complete' then date_time end) as end_time,
datediff(
minute,
max(case when action = 'Start' then date_time end),
max(case when action = 'Complete' then date_time end)
) as transaction_time
from process
group by order_no
order by order_no;
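Here is a small runnable sketch of the same conditional-aggregation idea, using the sample rows from the question. It runs on SQLite through Python, so the minute difference is computed from strftime('%s', ...) instead of DATEDIFF, and the table and column names are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE process (order_no INTEGER, action TEXT, date_time TEXT);
INSERT INTO process VALUES
  (111, 'Start',    '2018-03-23 08:18'),
  (111, 'Complete', '2018-03-23 09:18'),
  (112, 'Start',    '2018-03-24 06:00'),
  (112, 'Complete', '2018-03-24 11:10');
""")

durations = conn.execute("""
SELECT order_no,
       -- Each CASE picks out one action per order; MAX collapses the group.
       (CAST(strftime('%s', MAX(CASE WHEN action = 'Complete' THEN date_time END)) AS INTEGER)
      - CAST(strftime('%s', MAX(CASE WHEN action = 'Start' THEN date_time END)) AS INTEGER)) / 60
         AS transaction_time
FROM process
GROUP BY order_no
ORDER BY order_no
""").fetchall()

print(durations)
```

Because the result is grouped by order_no, every duration comes back next to the order it belongs to, which answers the second question as well.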
You can split up your table into two temp tables, cte's, whatever, and then join them together to find the minutes it took to complete
DECLARE @table1 TABLE (OrderNO INT, Action VARCHAR(100), datetime datetime)
INSERT INTO @table1 (OrderNO, Action, datetime)
VALUES
(111 ,'Start' ,'3/23/2018 8:18'),
(111 ,'Complete' ,'3/23/2018 9:18'),
(112 ,'Start' ,'3/24/2018 6:00'),
(112 ,'Complete' ,'3/24/2018 11:10')
;with cte_start AS (
SELECT orderno, Action, datetime
FROM @table1
WHERE Action = 'Start')
, cte_complete AS (
SELECT orderno, Action, datetime
FROM @table1
WHERE Action = 'Complete')
SELECT
start.OrderNO, DATEDIFF(minute, start.datetime, complete.datetime) AS duration
FROM cte_start start
INNER JOIN cte_complete complete
ON start.OrderNO = complete.OrderNO
Why don't you approach this problem with a set-based solution? After all, that's what an RDBMS is for. Assuming the orders you're interested in are in a table variable like the one you described, @ORDERTable([Order]), it would go something along the lines of:
SELECT DISTINCT
[Order No.]
, DATEDIFF(
minute,
FIRST_VALUE([DateTime]) OVER (PARTITION BY [Order No.] ORDER BY [DateTime] ASC),
FIRST_VALUE([DateTime]) OVER (PARTITION BY [Order No.] ORDER BY [DateTime] DESC)
) AS Transaction_Time
FROM tableName
WHERE [Order No.] IN (SELECT [Order] FROM @ORDERTable);
This query works if all the values in the Action attribute are either Start or Complete, but also if there are others in between them.
To read up more on the FIRST_VALUE() window function, check out the documentation.
NOTE: works in SQL Server 2012 or newer versions.

Lag() with condition in sql server

I have a table like this:
Number Price Type Date Time
------ ----- ---- ---------- ---------
23456 0,665 SV 2014/02/02 08:00:02
23457 1,3 EC 2014/02/02 07:50:45
23460 0,668 SV 2014/02/02 07:36:34
For each EC I need previous/next SV price. In this case, the query is simple.
Select Lag(price, 1, price) over (order by date desc, time desc),
Lead(price, 1, price) over (order by date desc, time desc)
from ITEMS
But, there are some special cases where two or more rows are EC type:
Number Price Type Date Time
------ ----- ---- ---------- ---------
23456 0,665 SV 2014/02/02 08:00:02
23457 1,3 EC 2014/02/02 07:50:45
23658 2,4 EC 2014/02/02 07:50:45
23660 2,4 EC 2014/02/02 07:50:48
23465 0,668 SV 2014/02/02 07:36:34
Can I use Lead/Lag in these cases? If not, do I have to use a subquery?
Your question (and Anon's excellent answer) is an instance of the gaps-and-islands class of SQL problems. In this answer, I will try to examine the "row_number() magic" in detail.
I've made a simple example based on events in a ballgame. For each event, we'd like to print the previous and next quarter related message:
create table TestTable (id int identity, event varchar(64));
insert TestTable values
('Start of Q1'),
('Free kick'),
('Goal'),
('End of Q1'),
('Start of Q2'),
('Penalty'),
('Miss'),
('Yellow card'),
('End of Q2');
Here's a query showing off the "row_number() magic" approach:
; with grouped as
(
select *
, row_number() over (order by id) as rn1
, row_number() over (
partition by case when event like '%of Q[1-4]' then 1 end
order by id) as rn2
from TestTable
)
, order_in_group as
(
select *
, rn1-rn2 as group_nr
, row_number() over (partition by rn1-rn2 order by id) as rank_asc
, row_number() over (partition by rn1-rn2 order by id desc)
as rank_desc
from grouped
)
select *
, lag(event, rank_asc) over (order by id) as last_event_of_prev_group
, lead(event, rank_desc) over (order by id) as first_event_of_next_group
from order_in_group
order by
id
The first CTE called "grouped" calculates two row_number()s. The first is 1 2 3 for each row in the table. The second row_number() places pause announcements in one list, and other events in a second list. The difference between the two, rn1 - rn2, is unique for each section of the game. It's helpful to check difference in the example output: it's in the group_nr column. You'll see that each value corresponds to one section of the game.
The second CTE called "order_in_group" determines the position of the current row within its island or gap. For an island with 3 rows, the positions are 1 2 3 for the ascending order, and 3 2 1 for the descending order.
Finally, we know enough to tell lag() and lead() how far to jump. We have to lag rank_asc rows to find the final row of the previous section. To find the first row of the next section, we have to lead rank_desc rows.
Hope this helps clarify the "magic" of Gaps and Islands. Here is a working example at SQL Fiddle.
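To make the intermediate columns visible, the following sketch computes only group_nr, rank_asc and rank_desc for the ballgame example. It runs on SQLite through Python as a stand-in for SQL Server; since SQLite's LIKE has no [1-4] character class, the pattern is written with GLOB instead, which is an adaptation and not part of the original query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TestTable (id INTEGER PRIMARY KEY, event TEXT);
INSERT INTO TestTable VALUES
  (1, 'Start of Q1'), (2, 'Free kick'), (3, 'Goal'), (4, 'End of Q1'),
  (5, 'Start of Q2'), (6, 'Penalty'), (7, 'Miss'), (8, 'Yellow card'),
  (9, 'End of Q2');
""")

sections = conn.execute("""
WITH grouped AS (
  SELECT id, event,
         ROW_NUMBER() OVER (ORDER BY id) AS rn1,
         ROW_NUMBER() OVER (
           -- quarter announcements go in one list, all other events in another
           PARTITION BY CASE WHEN event GLOB '*of Q[1-4]' THEN 1 END
           ORDER BY id) AS rn2
  FROM grouped_source
)
SELECT id,
       rn1 - rn2 AS group_nr,
       ROW_NUMBER() OVER (PARTITION BY rn1 - rn2 ORDER BY id)      AS rank_asc,
       ROW_NUMBER() OVER (PARTITION BY rn1 - rn2 ORDER BY id DESC) AS rank_desc
FROM grouped
ORDER BY id
""".replace("grouped_source", "TestTable")).fetchall()

for row in sections:
    print(row)   # (id, group_nr, rank_asc, rank_desc)
```

For the three events between 'Start of Q2' and 'End of Q2' (ids 6 to 8), group_nr is 3 on every row, rank_asc counts 1 2 3 and rank_desc counts 3 2 1, which are exactly the offsets handed to lag() and lead() in the final query.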
Yes, you can use LEAD/LAG. You just need to precalculate how far to jump with a little ROW_NUMBER() magic.
DECLARE @a TABLE ( number int, price money, type varchar(2),
date date, time time)
INSERT @a VALUES
(23456,0.665,'SV','2014/02/02','08:00:02'),
(23457,1.3 ,'EC','2014/02/02','07:50:45'),
(23658,2.4 ,'EC','2014/02/02','07:50:45'),
(23660,2.4 ,'EC','2014/02/02','07:50:48'),
(23465,0.668,'SV','2014/02/02','07:36:34');
; WITH a AS (
SELECT *,
ROW_NUMBER() OVER(ORDER BY [date] DESC, [time] DESC) x,
ROW_NUMBER() OVER(PARTITION BY
CASE [type] WHEN 'SV' THEN 1 ELSE 0 END
ORDER BY [date] DESC, [time] DESC) y
FROM @a)
, b AS (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY x-y ORDER BY x ASC) z1,
ROW_NUMBER() OVER(PARTITION BY x-y ORDER BY x DESC) z2
FROM a)
SELECT *,
CASE [type] WHEN 'SV' THEN
LAG(price,z1,price) OVER(PARTITION BY [type] ORDER BY x)
ELSE LAG(price,z1,price) OVER(ORDER BY x)
END,
CASE [type] WHEN 'SV' THEN
LEAD(price,z2,price) OVER(PARTITION BY [type] ORDER BY x)
ELSE LEAD(price,z2,price) OVER(ORDER BY x)
END
FROM b
ORDER BY x
Here is yet another way of achieving the same result, but using conditional max/min functions windowed over an ordinal. The ordinal can be set up based on whatever columns fit the purpose, but in this case I believe the OP intends them to be Date and Time.
DROP TABLE IF EXISTS #t;
CREATE TABLE #t (
Number INT,
Price MONEY,
Type CHAR(2),
Date DATE,
Time TIME(0)
);
INSERT INTO #t VALUES
(23456, 0.666, 'SV', '2014/02/02', '10:00:02'),
(23457, 1.4 , 'EC', '2014/02/02', '09:50:45'),
(23658, 2.5 , 'EC', '2014/02/02', '09:50:45'),
(23660, 2.5 , 'EC', '2014/02/02', '09:50:48'),
(23465, 0.669, 'SV', '2014/02/02', '09:36:34'),
(23456, 0.665, 'SV', '2014/02/02', '08:00:02'),
(23457, 1.3 , 'EC', '2014/02/02', '07:50:45'),
(23658, 2.4 , 'EC', '2014/02/02', '07:50:45'),
(23660, 2.4 , 'EC', '2014/02/02', '07:50:48'),
(23465, 0.668, 'SV', '2014/02/02', '07:36:34'), -- which one of these?
(23465, 0.670, 'SV', '2014/02/02', '07:36:34'); --
WITH time_ordered AS (
SELECT *, DENSE_RANK() OVER (ORDER BY Date, Time) AS ordinal FROM #t
)
SELECT
*,
CASE WHEN Type = 'EC'
THEN MAX(CASE WHEN ordinal = preceding_non_EC_ordinal THEN Price END)
OVER (PARTITION BY preceding_non_EC_ordinal ORDER BY ordinal ASC) END AS preceding_price,
CASE WHEN Type = 'EC'
THEN MIN(CASE WHEN ordinal = following_non_EC_ordinal THEN Price END)
OVER (PARTITION BY following_non_EC_ordinal ORDER BY ordinal DESC) END AS following_price
FROM (
SELECT
*,
MAX(CASE WHEN Type <> 'EC' THEN ordinal END)
OVER (ORDER BY ordinal ASC) AS preceding_non_EC_ordinal,
MIN(CASE WHEN Type <> 'EC' THEN ordinal END)
OVER (ORDER BY ordinal DESC) AS following_non_EC_ordinal
FROM time_ordered
) t
ORDER BY Date, Time
Note that the example given by the OP has been extended to show that interspersed sequences of EC yield the intended result. The ambiguity introduced by the earliest two consecutive rows with type SV will in this case lead to the maximum value being picked. Setting up the ordinal to include the Price is a possible way to change this behavior.
An SQLFiddle can be found here: http://sqlfiddle.com/#!18/85117/1
Anon's solution is wonderful and Andomar's explanation of it is also great, but the approach runs into trouble on large data sets: the value Andomar called 'group_nr' (rn1 - rn2) can collide, so that events from much earlier end up with the same group number as later ones. This skews the row_number() calculation (which is partitioned by group_nr) and produces incorrect results whenever such a collision occurs.
I'm posting this because I ran into the problem myself after working through this solution and finding errors.
My fix was to implement this version:
; with grouped as
(
select *
, row_number() over (order by id) as rn1
, row_number() over (
partition by case when event like '%of Q[1-4]' then 1 end
order by id) as rn2
from TestTable
)
, order_in_group as
(
select *
, CASE
WHEN event like '%of Q[1-4]' THEN (-1*rn1-rn2)
ELSE rn1 - rn2
END as group_nr
from grouped
)
, final_grouping AS
(SELECT * -- rank columns are defined only here, so the names are not duplicated
, row_number() over (partition by group_nr order by id) AS rank_asc
, row_number() over (partition by rn1-rn2 order by id desc) AS rank_desc
FROM order_in_group
)
select *
, lag(event, rank_asc) over (order by id) as last_event_of_prev_group
, lead(event, rank_desc) over (order by id) as first_event_of_next_group
from final_grouping
order by
id
;
Changing the pause events' group_nr values to negatives makes sure there are no conflicts with large data sets.

Resources