SQL: NOT IN vs NOT EXISTS strange behavior

In my searching for answers, I seem to only be finding explanations that cover the existence of NULL which is why the NOT IN returns 0 results. However, my scenario is exactly the opposite. I'm getting my expected results with the NOT IN and my NOT EXISTS is giving me 0. And to clarify, I have no NULLs in my sub-query. Here is my query:
DECLARE @EndDate DATE= CAST(CONCAT(YEAR(GETDATE()), '-', MONTH(GETDATE()), '-01') AS DATE) --First day of this month
DECLARE @StartDate DATE= DATEADD(month, -12, @EndDate) --12 months prior
SELECT Deactivated = COUNT(DISTINCT o.ClinicLocationId)
FROM [order].package p WITH(NOLOCK)
INNER JOIN [order].[order] o WITH(NOLOCK) ON o.packageid = p.packageid
INNER JOIN profile.ClinicLocationInfo cli WITH(NOLOCK) ON cli.LocationId = o.ClinicLocationId
AND cli.FacilityType IN('CLINIC', 'HOSPITAL')
WHERE CAST(p.ShipDTM AS DATE) >= dateadd(month,-1,@StartDate)
AND CAST(p.ShipDTM AS DATE) < dateadd(month,-1,@EndDate)
AND p.isshipped = 1
AND o.IsShipped = 1
AND ISNULL(o.iscanceled, 0) = 0
and not exists (
--and o.ClinicLocationId not in (
SELECT DISTINCT o.ClinicLocationId
FROM [order].package p WITH(NOLOCK)
INNER JOIN [order].[order] o WITH(NOLOCK) ON o.packageid = p.packageid
INNER JOIN profile.ClinicLocationInfo cli WITH(NOLOCK) ON cli.LocationId = o.ClinicLocationId
AND cli.FacilityType IN('CLINIC', 'HOSPITAL')
WHERE CAST(p.ShipDTM AS DATE) >= @StartDate
AND CAST(p.ShipDTM AS DATE) < dateadd(day,-1,@EndDate)
AND p.isshipped = 1
AND o.IsShipped = 1
AND ISNULL(o.iscanceled, 0) = 0
)
For a high-level overview, I'm basically trying to find the number of IDs that exist in one set but not in the next (separated by a 12-month rolling window, offset by 1 month). But for the sake of simplicity, I've written the below, which illustrates the exact same symptom:
drop table if exists #T1, #T2
create table #T1 (id int)
create table #T2 (id int)
insert into #T1 (id)
values
(3),
(8)
insert into #T2 (id)
values
(671),
(171)
select id from #T1 where id not in (select id from #T2)
select id from #T1 where not exists (select id from #T2)
My expectation is that both of these would yield the same results, the contents of #T1 (3,8) but instead, I only get those results in the second query by eliminating the NOT. I would assume I'm suffering from a fundamental misunderstanding of how the EXISTS operator works, as up until now I assumed there was no real difference aside from how the scanning occurred and NULL handling.
Where am I going wrong with my expectation?

The query shape...
and o.ClinicLocationId not in (SELECT o.ClinicLocationId ...)
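...compares the outer o.ClinicLocationId with every o.ClinicLocationId value returned by the subquery.
EXISTS makes no such comparison: it only tests whether the subquery returns at least one row, so an uncorrelated NOT EXISTS is false for every outer row as soon as the subquery returns anything. When using EXISTS you have to write a correlated subquery to get the same effect: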
and not exists (SELECT o1.ClinicLocationId ...
AND o1.ClinicLocationId = o.ClinicLocationId)
Note that the second query requires a different alias in the subquery (o1 here), so that o.ClinicLocationId still refers to the outer query.
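The difference is easy to reproduce with the two-row tables from the question. The following is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and ordinary tables named T1 and T2 instead of the temp tables #T1 and #T2. The three predicates themselves behave the same way in both engines.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; T1/T2 replace #T1/#T2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (id INTEGER);
    CREATE TABLE T2 (id INTEGER);
    INSERT INTO T1 (id) VALUES (3), (8);
    INSERT INTO T2 (id) VALUES (671), (171);
""")

def ids(sql):
    """Run a query and return its first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

# NOT IN compares each T1.id with every id the subquery returns.
not_in = ids("SELECT id FROM T1 WHERE id NOT IN (SELECT id FROM T2)")

# Uncorrelated NOT EXISTS: the subquery returns rows regardless of the
# outer row, so the predicate is false for every row of T1.
uncorrelated = ids("SELECT id FROM T1 WHERE NOT EXISTS (SELECT id FROM T2)")

# Correlated NOT EXISTS: the subquery is evaluated per outer row.
correlated = ids(
    "SELECT id FROM T1 "
    "WHERE NOT EXISTS (SELECT 1 FROM T2 WHERE T2.id = T1.id)"
)

print(not_in, uncorrelated, correlated)
```

Only the correlated form matches the NOT IN result; the uncorrelated form returns nothing because T2 is not empty.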

Related

SQL Server 2014: Pairing rows from 2 tables based on values coming from a third one

I have 2 tables that contain typed events over time.
The first table #T1 contains events that always come before the events in the second table #T2.
A third table #E contains records that define, for each event, the values that come in #T1 and #T2 respectively.
Sample data:
CREATE TABLE #T1
(
EventTimestamp DateTime,
VehicleId int,
EventId varchar(50),
EventValue varchar(50)
);
CREATE TABLE #T2
(
EventTimestamp DateTime,
VehicleId int,
EventId varchar(50),
EventValue varchar(50)
);
CREATE TABLE #E
(
EventId varchar(50),
FirstValue int,
LastValue varchar(50)
);
INSERT INTO #T1(EventTimestamp, VehicleId , EventId, EventValue)
VALUES (GETDATE(), 1, 'TwigStatus', '12'),
(GETDATE(), 2, 'SafeProtectEvent', '5')
INSERT INTO #T2(EventTimestamp, VehicleId , EventId, EventValue)
VALUES (DATEADD(second, 30, GETDATE()), 1, 'TwigStatus', '7'),
(DATEADD(second, 30, GETDATE()), 2, 'SafeProtectEvent', '6')
INSERT INTO #E(EventId, FirstValue, LastValue)
VALUES ('TwigStatus', '12', '7'),
('SafeProtectEvent', '5', '6')
DECLARE @EventId varchar(50) = 'TwigStatus';
DECLARE @FirstValue varchar(50) = '12';
DECLARE @LastValue varchar(50) = '7';
WITH ord AS
(
SELECT
first, last,
EventNr = ROW_NUMBER() OVER (ORDER BY first)
FROM
(SELECT
first = t1.EventTimestamp, last = t2.EventTimestamp,
rn = ROW_NUMBER() OVER (PARTITION BY t1.VehicleId ORDER BY t2.EventTimestamp)
FROM
#T1 t1
INNER JOIN
#T2 t2 ON t2.EventTimestamp > t1.EventTimestamp
AND t2.EventValue = @LastValue
WHERE
t1.EventId = @EventId AND t1.EventValue = @FirstValue) ids
WHERE
rn = 1
)
SELECT
t.VehicleId, o.first, o.last, t.EventId, t.EventValue
FROM
#T2 t
INNER JOIN
ord o ON t.EventTimestamp >= o.first
AND t.EventTimestamp <= o.last
WHERE t.EventId = @EventId;
DROP TABLE #E;
DROP TABLE #T1;
DROP TABLE #T2;
Basically, for a record in table E you see that for eventID 'TwigStatus' the value '12' should come first in table T1 and then '7' should be next in table T2. There is a second event sequence that is defined.
The VehicleId column is the link between the tables T1 and T2.
I need to compute the delay between two matching events in table T1 and T2.
To start simple, I do not use table E yet; I'm using variables that contain predefined values, and I'm returning timestamps.
But the result of the query above:
VehicleId first last EventId EventValue
1 2020-09-15 16:00:37.670 2020-09-15 16:01:07.670 TwigStatus 7
2 2020-09-15 16:00:37.670 2020-09-15 16:01:07.670 SafeProtectEvent 6
is not what I'm expecting, because the EventId 'SafeProtectEvent' should be filtered out for now.
So I have 2 questions:
How can I keep the second event out of the result of the current query?
How can I use the content of table E, instead of variables, to process the event sequences?
EDIT 1: Problem 1 Solved by adding a restriction on the query (see above)

Update/new version below - now allows rows in T1 without matching rows in T2.
Based on discussion on comments below, I have updated this suggestion.
This code replaces everything from the DECLARE #EventId to the end of that SELECT statement.
Logic is as follows - for each row in T1 ...
Determine the time boundaries for that row in T1 (between its EventTimestamp, and the next EventTimestamp in T1 for that vehicle; or 1 day in the future if there is no next event)
Find the matching rows in T2, where 'matching' means a) same VehicleId, b) same EventId, c) EventValue is limited by possibilities in #E, and d) occurs within the time boundaries of T1
Find the first of these rows, if available
Calculate EventDelay as the time between the two timestamps
; WITH t1 AS
(SELECT VehicleId,
EventTimestamp,
EventId,
EventValue,
COALESCE(LEAD(EventTimestamp, 1) OVER (PARTITION BY VehicleID ORDER BY EventTimestamp), DATEADD(day, 1, getdate())) AS NextT1_EventTimeStamp
FROM #T1
),
ord AS
(SELECT t1.VehicleId,
t1.EventTimestamp AS first,
t2.EventTimestamp AS last,
t1.EventId,
t2.EventValue,
ROW_NUMBER() OVER (PARTITION BY t1.VehicleId, t1.EventTimestamp, t1.EventId ORDER BY t2.EventTimestamp) AS rn
FROM t1
LEFT OUTER JOIN #E AS e ON t1.EventId = e.EventId
AND t1.EventValue = e.FirstValue
LEFT OUTER JOIN #T2 AS t2 ON t1.VehicleID = t2.VehicleID
AND t1.EventID = t2.EventID
AND t2.eventId = e.EventId
AND t2.EventValue = e.LastValue
AND t2.EventTimestamp > t1.EventTimestamp
AND t2.EventTimestamp < NextT1_EventTimeStamp
)
SELECT VehicleId, first, last, EventId, EventValue,
DATEDIFF(second, first, last) AS EventDelay
FROM ord
WHERE rn = 1
The ever-growing DB<>fiddle has the latest updates, as well as original posts and previous suggestions.
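The four steps listed above can also be traced outside SQL. The following is a minimal sketch in plain Python, not the answer's query: the names t1_rows, t2_rows, sequences and event_delays are hypothetical, the sample timestamps are made up to match the 30-second gap in the question, and the fallback window ends one day after the T1 event rather than one day after the current time.

```python
from datetime import datetime, timedelta

base = datetime(2020, 9, 15, 16, 0, 0)  # made-up timestamp for the sample
# Rows are (timestamp, vehicle_id, event_id, event_value), mirroring #T1/#T2.
t1_rows = [
    (base, 1, "TwigStatus", "12"),
    (base, 2, "SafeProtectEvent", "5"),
]
t2_rows = [
    (base + timedelta(seconds=30), 1, "TwigStatus", "7"),
    (base + timedelta(seconds=30), 2, "SafeProtectEvent", "6"),
]
# Maps (event_id, first_value) -> last_value, mirroring #E.
sequences = {("TwigStatus", "12"): "7", ("SafeProtectEvent", "5"): "6"}

def event_delays(t1_rows, t2_rows, sequences, horizon=timedelta(days=1)):
    """Return (vehicle_id, event_id, delay_in_seconds or None) per T1 row."""
    results = []
    for ts, vehicle, event, value in t1_rows:
        # Step 1: the window ends at the next T1 event of the same vehicle,
        # or after `horizon` when there is none (a simplification).
        later = [t for t, v, _e, _val in t1_rows if v == vehicle and t > ts]
        window_end = min(later) if later else ts + horizon
        # Step 2: same vehicle, same event, value allowed by `sequences`,
        # and strictly inside the window.
        expected_last = sequences.get((event, value))
        matches = [
            t for t, v, e, val in t2_rows
            if v == vehicle and e == event and val == expected_last
            and ts < t < window_end
        ]
        # Steps 3 and 4: take the earliest match and compute the delay.
        last = min(matches) if matches else None
        delay = int((last - ts).total_seconds()) if last is not None else None
        results.append((vehicle, event, delay))
    return results

delays = event_delays(t1_rows, t2_rows, sequences)
print(delays)
```

A T1 row without a matching T2 row still appears in the result with a delay of None, which corresponds to the LEFT OUTER JOIN in the query.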

Loops on SQL Server

I have the following query where I input a date and it gives me the result. However, I need to run this for 60 different dates. Instead of running it one by one, is there any way to automate this so that it runs each time on a different date?
IF OBJECT_ID('tempdb..#1') IS NOT NULL DROP TABLE #1
declare @d1 datetime = '2020-02-06'
select distinct [User] into #1
from [X].[dbo].[Table1]
where [status] = 'Success'
and [Date] = @d1;
select count(distinct [User])
from #1
inner join [Y].[dbo].[Table2]
on #1.[User] = [Y].[dbo].[Table2].[User]
where [Date2] between @d1 and @d1+1
and [Checkname] in ('Check1','Check2')

Loops are slow and generally a bad practice in the context of T-SQL. You can use something like this to get the count of users for a batch of dates:
DROP TABLE IF EXISTS #DataSource;
CREATE TABLE #DataSource
(
[Date] DATETIME
,[UsersCount] INT
);
INSERT INTO #DataSource ([Date])
VALUES ('2020-02-06')
,('2020-02-07')
,('2020-02-08');
IF OBJECT_ID('tempdb..#1') IS NOT NULL DROP TABLE #1
select distinct DS1.[Date]
,DS1.[User]
into #1
from [X].[dbo].[Table1] DS1
INNER JOIN #DataSource DS2
ON DS1.[Date] = DS2.[Date]
where DS1.[status] = 'Success';
select #1.[date]
,count(distinct [User])
from #1
inner join [Y].[dbo].[Table2]
on #1.[User] = [Y].[dbo].[Table2].[User]
where [Date2] between #1.[date] and #1.[date] + 1
and [Checkname] in ('Check1','Check2')
GROUP BY #1.[date]

First, I want to say that gotqn's answer is a good answer - however, I think there are a few more things in the original code that can be improved - so here is how I would probably do it:
Assuming the dates are consecutive, use a common table expression to calculate the dates using dateadd and row_number.
Then, use another common table expression to get the list of dates and users from table1,
and then select the date and count of distinct users for each date from that common table expression joined to table2:
DECLARE @StartDate Date = '2020-02-06';
WITH Dates AS
(
SELECT TOP (60) DATEADD(DAY, ROW_NUMBER() OVER(ORDER BY @@SPID) -1, @StartDate) As Date
FROM sys.objects
), CTE AS
(
SELECT t1.[User], t1.[Date]
FROM [X].[dbo].[Table1] AS t1
JOIN Dates
ON t1.[Date] = Dates.[Date]
WHERE [status] = 'Success'
)
SELECT cte.[Date], COUNT(DISTINCT [User])
FROM CTE
JOIN [Y].[dbo].[Table2] As t1
ON CTE.[User] = t1.[User]
AND t1.[Date2] >= CTE.[Date]
AND t1.[Date2] < DATEADD(Day, 1, CTE.[Date])
AND [Checkname] IN ('Check1','Check2')
GROUP BY cte.[Date]
If the dates are not consecutive, you can use a table variable to hold the dates instead of calculating them using a common table expression.
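To see the set-based approach run end to end, here is a rough sketch using Python's built-in sqlite3 module in place of SQL Server. Everything in it is an assumption made for illustration: the table and column names (table1, table2, usr, event_date, date2, checkname), the sample rows, and the recursive CTE that generates three consecutive dates instead of TOP (60) over sys.objects.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; all names and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (usr TEXT, event_date TEXT, status TEXT);
    CREATE TABLE table2 (usr TEXT, date2 TEXT, checkname TEXT);
    INSERT INTO table1 VALUES
        ('alice', '2020-02-06', 'Success'),
        ('bob',   '2020-02-06', 'Success'),
        ('carol', '2020-02-06', 'Failed'),
        ('alice', '2020-02-07', 'Success'),
        ('dave',  '2020-02-08', 'Success');
    INSERT INTO table2 VALUES
        ('alice', '2020-02-06 09:30:00', 'Check1'),
        ('bob',   '2020-02-06 23:59:59', 'Check2'),
        ('carol', '2020-02-06 10:00:00', 'Check1'),
        ('alice', '2020-02-07 08:00:00', 'Check2'),
        ('alice', '2020-02-08 00:00:00', 'Check1'),
        ('dave',  '2020-02-08 12:00:00', 'Check3');
""")

QUERY = """
WITH RECURSIVE dates(d) AS (
    -- Generate consecutive dates from :start to :last, one row per day.
    SELECT :start
    UNION ALL
    SELECT date(d, '+1 day') FROM dates WHERE d < :last
)
SELECT dates.d, COUNT(DISTINCT t1.usr)
FROM dates
JOIN table1 AS t1 ON t1.event_date = dates.d AND t1.status = 'Success'
JOIN table2 AS t2 ON t2.usr = t1.usr
                 -- Half-open range: midnight of the next day is excluded.
                 AND t2.date2 >= dates.d
                 AND t2.date2 <  date(dates.d, '+1 day')
                 AND t2.checkname IN ('Check1', 'Check2')
GROUP BY dates.d
ORDER BY dates.d
"""

counts = conn.execute(QUERY, {"start": "2020-02-06", "last": "2020-02-08"}).fetchall()
print(counts)
```

One query produces a row per date that has at least one qualifying user, so no loop is needed. Dates with no qualifying user are simply absent because of the inner joins.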

Merge records in SQL Server by start and end times

I have a Reports table in SQL Server like this:
I need to merge the records in this table that have the same CallNumber and type = Unanswered, where the difference between the EndDate of one record and the StartDate of the next is less than one minute.
For an example of the merge operation, please see this:
The result table is like this:
I execute this query to get the records that should be merged, but I don't know how to merge them.
select t1.CallNumber,t1.id,t1.EndDate,t2.Id,t2.StartDate
from Reports as t1
left join Reports as t2 on t1.CallNumber = t2.CallNumber and t1.type=t2.type
where
t1.EndDate < t2.StartDate
and DATEDIFF(MINUTE,t1.EndDate,t2.StartDate) < 1
and t1.type = 'Unanswered'
group by t1.CallNumber,t1.id,t1.EndDate,t2.Id,t2.StartDate
It would be very helpful if someone could explain a query that returns the result table.

How does this work for you?
declare @t table(ID int
,Number int
,StartDate datetime
,EndDate datetime
,Type nvarchar(50)
);
insert into @t values
(1 ,2024,'20160102 16:40:00','20160102 16:40:15','Unanswered')
,(2 ,2024,'20160102 16:40:16','20160102 16:40:32','Unanswered')
,(3 ,2060,'20160102 16:40:33','20160102 16:40:48','Answered')
,(4 ,2060,'20160102 16:42:00','20160102 16:42:10','Answered')
,(11,2061,'20160102 16:50:00','20160102 16:50:10','Unanswered')
,(12,2062,'20160102 16:50:14','20160102 16:50:24','Unanswered')
,(13,2061,'20160102 16:50:30','20160102 16:50:44','Unanswered');
select *
from @t t1
left join @t t2
on(t1.ID <> t2.ID
and t1.StartDate > t2.StartDate
and datediff(s, t2.EndDate, t1.StartDate) < 60 -- DATEDIFF only records boundaries crossed, so 14:34:59 to 14:35:00 would count as 1 minute despite actually being just 1 second.
and t1.Type = t2.Type
and t1.Type = 'Unanswered'
)
where t2.ID is null;
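The comment in the join condition is the reason the answer compares seconds instead of minutes. As a small illustration, the Python sketch below emulates the boundary-counting behaviour of DATEDIFF(MINUTE, ...); the helper names datediff_minute and elapsed_seconds are hypothetical and exist only for this example.

```python
from datetime import datetime

def datediff_minute(start, end):
    """Emulate DATEDIFF(MINUTE, start, end): count minute boundaries crossed."""
    floor_start = start.replace(second=0, microsecond=0)
    floor_end = end.replace(second=0, microsecond=0)
    return int((floor_end - floor_start).total_seconds() // 60)

def elapsed_seconds(start, end):
    """Actual elapsed time in whole seconds."""
    return int((end - start).total_seconds())

# One second apart, but a minute boundary lies in between.
a = datetime(2016, 1, 2, 14, 34, 59)
b = datetime(2016, 1, 2, 14, 35, 0)
print(datediff_minute(a, b), elapsed_seconds(a, b))

# Fifty-nine seconds apart, with no minute boundary in between.
c = datetime(2016, 1, 2, 14, 35, 0)
d = datetime(2016, 1, 2, 14, 35, 59)
print(datediff_minute(c, d), elapsed_seconds(c, d))
```

A test such as DATEDIFF(MINUTE, ...) < 1 would therefore reject the first pair and accept the second, which is the opposite of what "less than one minute apart" suggests for gaps this close to a boundary.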

SQL - Combine rows

I have rows in a table that looks like this:
[date],[name],[duty],[holiday],[hdaypart],[sick],[sdaypart]
2015-04-27, person1, 1,0,NULL,0,NULL
2015-04-27, person1, 0,1,'fd',0,NULL
I would like to combine these rows to:
[date],[name],[duty],[holiday],[hdaypart],[sick],[sdaypart]
2015-04-27, person1, 1,1,'fd',0,NULL
The duty, holiday and sick columns are BIT columns.
Is there a way to do this?
The one solution I can come up with is using subqueries, but it consumes a lot of time. A faster solution would be nice.
This is what I have now:
SELECT DISTINCT [name],[date],[region],[cluster]
,CASE WHEN (SELECT SUM(CONVERT(INT,callduty)) FROM planning AS t2
WHERE t1.[Date] = @datum AND t2.[Name] = t1.[name] AND t2.[Date] = t1.[date] ) > 0
THEN 1 ELSE 0 END AS [CallDuty]
,CASE WHEN (SELECT SUM(CONVERT(INT,holiday)) FROM planning AS t2
WHERE t1.[Date] = @datum AND t2.[Name] = t1.[name] AND t2.[Date] = t1.[date] ) > 0
THEN 1 ELSE 0 END AS [Holiday]
FROM planning AS t1
where t1.[Date] = @datum AND t1.[Name] like @naam
group by t1.[date],t1.[name], t1.Region, t1.cluster
order by t1.[name]

You seem to want to group by date and name and select either the maximum or the not null values within each group. MAX aggregate function is suitable for both of these selections:
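SELECT [date],[name],
MAX(CAST([duty] AS INT)) AS [duty], -- MAX does not accept BIT, so cast first
MAX(CAST([holiday] AS INT)) AS [holiday],
MAX([hdaypart]) AS [hdaypart], -- MAX ignores NULLs, so the non-null value wins
MAX(CAST([sick] AS INT)) AS [sick],
MAX([sdaypart]) AS [sdaypart]
FROM mytable
GROUP BY [date],[name]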
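The grouping can be checked against the two sample rows from the question. This is a sketch only, run through Python's built-in sqlite3 module rather than SQL Server; the table name planning is taken from the question's own query, and SQLite has no BIT type, so the flag columns are plain integers and need no cast here.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; flags are INTEGER, not BIT.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE planning (
        date TEXT, name TEXT, duty INTEGER, holiday INTEGER,
        hdaypart TEXT, sick INTEGER, sdaypart TEXT
    );
    INSERT INTO planning VALUES
        ('2015-04-27', 'person1', 1, 0, NULL, 0, NULL),
        ('2015-04-27', 'person1', 0, 1, 'fd', 0, NULL);
""")

# MAX picks 1 over 0 for the flags and skips NULLs for the text columns.
combined = conn.execute("""
    SELECT date, name, MAX(duty), MAX(holiday),
           MAX(hdaypart), MAX(sick), MAX(sdaypart)
    FROM planning
    GROUP BY date, name
""").fetchall()

print(combined)
```

The two input rows collapse into one row per date and name; a column that is NULL in every row of the group stays NULL.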

By looking at your example, I assume that you want to get the maximum values for a specific user.
You could do this using a group by and max
select max([date]),[name],max([duty]),max([holiday]),max([hdaypart]),max([sick]),max([sdaypart])
from yourtable
group by name
This is not really pretty but should perform better than using subqueries.
EDIT:
If you have columns with bit sql types, use
max(cast([bitColumn] as int))
Adding the date column in the group by, as suggested by Giorgos Betsos, the result is
select [date],
[name],
max(cast([duty] as int)),
max(cast([holiday] as int)),
max([hdaypart]),
max(cast([sick] as int)),
max([sdaypart])
from yourtable
group by [date],[name]

declare @t table ([date] date,[name] varchar(10),[duty] varchar(10),[holiday] int,[hdaypart] varchar(10),[sick] int,[sdaypart] int)
insert into @t([date],[name],[duty],[holiday],[hdaypart],[sick],[sdaypart])values ('2015-04-27','person1',1,0,NULL,0,NULL),
('2015-04-27','person1',1,0,'fd',0,NULL)
select MAX([date]),MAX([name]),MAX([duty]),MAX([holiday]),MAX([hdaypart]), [sick],[sdaypart] from @t
group by sick,[sdaypart]
OR
select [date],[name],[duty],[holiday],MAX([hdaypart])AS H,[sick],[sdaypart] from @t
group by [date],[name],[duty],[holiday],[sick],[sdaypart]
UNION
select [date],[name],[duty],[holiday],MAX([hdaypart])AS H,[sick],[sdaypart] from @t
group by [date],[name],[duty],[holiday],[sick],[sdaypart]

CREATE TABLE #Combine
(
[date] date,
[name] VARCHAR(10),
[duty] CHAR(1),
[holiday] CHAR(1),
[hdaypart] CHAR(5),
[sick] CHAR(1),
[sdaypart] VARCHAR(10)
)
INSERT INTO #Combine VALUES('2015-04-27', 'person1', '1','0',NULL,'0',NULL),
('2015-04-27', 'person1', '0','1','fd','0',NULL)
SELECT MAX(Date) [date],MAX(name) [name],MAX(Duty) [duty],MAX(holiday) holiday,
MAX(hdaypart) hdaypart,max(sick) sick,max(sdaypart)sdaypart FROM #Combine

Sql query - how to get when a row first got a certain value

I have a table with rows like this:
ID StatusId Date
1 1 2001-01-01
2 1 2001-01-02
3 2 2001-01-03
4 3 2001-01-04
5 1 2001-01-05
6 2 2001-01-06
7 2 2001-01-07
8 1 2001-01-08
9 1 2001-01-09
I need to get the date when the current value of the status was originally changed. For the above example, the last value is 1, and it's changed in row 8, so the result would be 2001-01-08.
How would you do this?
If you need a table to test with, here it is:
DECLARE @Tbl AS TABLE (ID INT, StatusId INT, Date DATETIME)
INSERT INTO @Tbl(ID, StatusId, Date)
SELECT 1,1,'2001-01-01' UNION
SELECT 2,1,'2001-01-02' UNION
SELECT 3,2,'2001-01-03' UNION
SELECT 4,3,'2001-01-04' UNION
SELECT 5,1,'2001-01-05' UNION
SELECT 6,2,'2001-01-06' UNION
SELECT 7,2,'2001-01-07' UNION
SELECT 8,1,'2001-01-08' UNION
SELECT 9,1,'2001-01-09'
SELECT * FROM @Tbl

This one should get you what you're after:
declare @LastStatusID int
declare @LastDate datetime
declare @LastID int
declare @LastChangeID int
/* get last record */
select top 1 @LastStatusID = StatusID, @LastDate = Date, @LastID = ID
from @Tbl
order by ID desc
/* get last record with a different status */
select top 1 @LastChangeID = ID
from @Tbl
where ID < @LastID and StatusID <> @LastStatusID
order by ID desc
/* get the first next record - this would get you the last record as well when it's just been set */
select top 1 Date
from @Tbl
where ID > @LastChangeID
order by ID asc
I haven't included any checks for edge cases, such as just one record in the table or multiple records that all have the same status. You can figure those out yourself.
As a single query
This query requires IDs without gaps. It returns the last record after a status change, and it also works when there's just one record in the table or multiple records with the same status (isnull provides the required functionality).
select top 1 t1.Date
from @Tbl t1
left join @Tbl t2
on (t2.ID = t1.ID - 1)
where (isnull(t2.StatusID, -1) <> t1.StatusID)
order by t1.ID desc
Last where clause changes a null value (when there's no upper record) to -1. If you do have a status with this value, you should change this number to some non-existing status value.
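The single-query version can be tried on the question's sample data. What follows is a sketch under assumptions: it uses Python's built-in sqlite3 module instead of SQL Server, a regular table named Tbl instead of the table variable, COALESCE in place of isnull, and LIMIT 1 in place of TOP 1. The helper last_change_date is a hypothetical name used only here.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server (COALESCE for isnull, LIMIT for TOP).
QUERY = """
SELECT t1.Date
FROM Tbl t1
LEFT JOIN Tbl t2 ON t2.ID = t1.ID - 1            -- previous row, IDs without gaps
WHERE COALESCE(t2.StatusId, -1) <> t1.StatusId   -- status differs from previous row
ORDER BY t1.ID DESC
LIMIT 1
"""

def last_change_date(rows):
    """Return the date on which the current status was first set, or None."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Tbl (ID INTEGER, StatusId INTEGER, Date TEXT)")
    conn.executemany("INSERT INTO Tbl VALUES (?, ?, ?)", rows)
    row = conn.execute(QUERY).fetchone()
    conn.close()
    return row[0] if row else None

sample = [
    (1, 1, "2001-01-01"), (2, 1, "2001-01-02"), (3, 2, "2001-01-03"),
    (4, 3, "2001-01-04"), (5, 1, "2001-01-05"), (6, 2, "2001-01-06"),
    (7, 2, "2001-01-07"), (8, 1, "2001-01-08"), (9, 1, "2001-01-09"),
]
print(last_change_date(sample))

# Edge case: every row has the same status, so the first row is returned.
same_status = [(1, 5, "2001-01-01"), (2, 5, "2001-01-02")]
print(last_change_date(same_status))
```

The first row never has a predecessor, so the -1 placeholder makes it count as a change; that is what lets the query handle tables where the status never changes.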

Something like this:
DECLARE @CurrentID INT, @CurrentDate Date
SELECT TOP 1 @CurrentID = ID, @CurrentDate = Date FROM TABLE
ORDER BY Date DESC
SELECT TOP 1 ID, StatusID, Date
FROM Table
WHERE Date < @CurrentDate
AND ID <> @CurrentID
ORDER BY Date DESC

try
select Date
from @Tbl
where StatusId = (
select StatusId
from @Tbl
order by ID desc limit 1)
order by ID desc
limit 1,1
Please check if your database supports limit or not. If not use equivalent of it (e.g. Top).
I have written this as per mysql.

If the table is guaranteed to have one entry per day (as per your sample data), then the following may work
select MAX(t1.Date)
from
@Tbl t1
inner join
@Tbl t2
on
t1.Date = DATEADD(day,1,t2.Date) and
t1.StatusId <> t2.StatusID
Of course, it's possible to further refine this if there are other columns/criteria, or if the value may never have changed at all. Difficult to tell with the small sample size/output example.
Edit 1 If my one entry per day assumption is wrong, then the from clause can be:
from
@Tbl t1
inner join
@Tbl t2
on
t1.Date > t2.Date and
t1.StatusId <> t2.StatusID
left join
@Tbl t_successive
on
t1.Date > t_successive.Date and
t2.Date < t_successive.Date
where
t_successive.ID is null
(This uses the left join to ensure that the rows in t1 and t2 don't have any other rows between them.)

This is what I came up with finally:
SELECT T1.ID, T1.StatusId, MIN(T3.Date)
FROM @Tbl T1 INNER JOIN @Tbl T3 ON T1.StatusId = T3.StatusId
WHERE T3.Date > (SELECT MAX(Date) FROM @Tbl T2 WHERE T2.StatusId <> T1.StatusId)
AND T1.ID = (SELECT MAX(ID) FROM @Tbl)
GROUP BY T1.ID, T1.StatusId
and it's doing what I needed it to... thanks everyone
