Check for Overlapping date on Insert/Update - sql-server

I have a table which holds a list of dates and more data for a person. The table should never have any undeleted overlapping rows (Dates overlapping).
Is there a way I can put a check constraint on the table, to ensure that when I update or insert a row, that there's no overlapping details?
Below is a cut down version of my table. It has a deleted flag, and start/end dates. A 'Null' end date means it's ongoing.
I then provide some legal, and some not-so-legal inserts (and why they're legal and illegal).
DECLARE @Test TABLE
(
    Id INT NOT NULL IDENTITY(1,1),
    PersonID INT NOT NULL,
    StartDate DATE NOT NULL,
    EndDate DATE NULL,
    Deleted BIT NOT NULL
)
INSERT INTO @Test
(PersonId, StartDate, EndDate, Deleted)
SELECT 1, '01-JAN-2015', '15-JAN-2015', 0 UNION ALL -- Valid
SELECT 1, '16-JAN-2015', '20-JAN-2015', 1 UNION ALL -- Valid and deleted
SELECT 1, '18-JAN-2015', NULL, 0 UNION ALL -- Valid
SELECT 2, '01-JAN-2015', NULL, 0 UNION ALL -- Valid.. never ending row.
SELECT 2, '18-JAN-2015', '30-JAN-2015', 0 UNION ALL -- Invalid! Overlaps above record.
SELECT 2, '20-JAN-2015', '30-JAN-2015', 1 UNION ALL -- Valid, as it's deleted (Still overlaps, though)
SELECT 3, '01-JAN-2015', '10-JAN-2015', 0 UNION ALL -- Valid
SELECT 3, '10-JAN-2015', NULL, 0 -- Invalid, as it overlaps the last and first days
SELECT * FROM @Test
I need to make sure that the table doesn't allow overlapping dates for the same person, for undeleted rows.
For the date range check, I will use the "(StartA <= EndB) and (EndA >= StartB)" formula, but I am unsure how to check this with a constraint, and across multiple rows.
I may need to do it with a trigger, by comparing the inserted values to the existing rows and somehow cancelling if I find matches?

You cannot do this with a CHECK constraint without adding additional columns.
You will have to create a trigger that checks whether the inserted date ranges overlap existing ones. Something like this:
CREATE TRIGGER [dbo].[DateRangeTrigger]
ON [dbo].[Test] AFTER INSERT, UPDATE
AS
BEGIN
    -- Stand-in for an open-ended (NULL) EndDate
    DECLARE @MaxDate DATE = '29991231'
    IF EXISTS (SELECT t.StartDate, t.EndDate FROM [dbo].[Test] t
        JOIN inserted i
        ON i.PersonID = t.PersonID
        AND i.Id <> t.Id
        -- Two ranges overlap when each starts on or before the day the other ends.
        -- This also catches identical ranges and ranges that share a boundary day.
        AND i.StartDate <= ISNULL(t.EndDate, @MaxDate)
        AND ISNULL(i.EndDate, @MaxDate) >= t.StartDate
        WHERE t.Deleted = 0 AND i.Deleted = 0
    )
    BEGIN
        RAISERROR ('Inserted date was within invalid range', 16, 1)
        IF (@@TRANCOUNT > 0)
            ROLLBACK
    END
END
You can refer to one of these threads for more information
Enforcing unique date range fields in SQL Server 2008
Unique date range fields in SQL Server 2008

Here's a trigger-based approach:
CREATE TRIGGER [dbo].[trigPersonnel_PreventOverlaps]
ON [dbo].[Personnel]
AFTER INSERT, UPDATE
AS
BEGIN
    IF EXISTS(
        SELECT * FROM [dbo].[Personnel] p
        INNER JOIN inserted i ON i.PersonID = p.PersonID
        AND i.Id != p.Id AND i.Deleted = 0
        AND (
            -- p starts on or before i ends (a NULL EndDate means ongoing)
            (p.StartDate <= i.EndDate OR i.EndDate IS NULL)
            -- and i starts on or before p ends
            AND (i.StartDate <= p.EndDate OR p.EndDate IS NULL)
        )
        WHERE p.Deleted = 0
    )
    BEGIN
        --RAISERROR here if you want
        ROLLBACK
    END
END
Note - it will roll back the whole transaction, so you'll need to perform inserts individually to ensure good ones don't get thrown out.
If you need something to comb through a bulk insert and pick out the bad ones, you'll need something more complex.
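One way to pick out the bad rows is to stage the candidate rows first and compare them against each other with the overlap formula from the question. The following is only a minimal sketch: the @Staged table variable and the OverlapsId column are hypothetical names, and a NULL end date is treated as open-ended by substituting the largest DATE value.

```sql
-- Hypothetical staging table holding the rows to be checked.
DECLARE @Staged TABLE
(
    Id        INT  NOT NULL IDENTITY(1,1),
    PersonID  INT  NOT NULL,
    StartDate DATE NOT NULL,
    EndDate   DATE NULL,      -- NULL means ongoing
    Deleted   BIT  NOT NULL
);

INSERT INTO @Staged (PersonID, StartDate, EndDate, Deleted)
VALUES (2, '20150101', NULL,       0),
       (2, '20150118', '20150130', 0),  -- overlaps the row above
       (3, '20150101', '20150110', 0),
       (3, '20150111', NULL,       0);  -- starts the day after, so no overlap

-- List every undeleted row that overlaps another undeleted row for the same person.
SELECT a.Id, a.PersonID, a.StartDate, a.EndDate, b.Id AS OverlapsId
FROM @Staged AS a
JOIN @Staged AS b
  ON  b.PersonID = a.PersonID
  AND b.Id <> a.Id
  AND a.StartDate <= ISNULL(b.EndDate, '99991231')   -- StartA <= EndB
  AND ISNULL(a.EndDate, '99991231') >= b.StartDate   -- EndA >= StartB
WHERE a.Deleted = 0
  AND b.Deleted = 0
ORDER BY a.PersonID, a.Id;
```

With this sample data only the two rows for person 2 are reported. Rows that are not reported could then be inserted, and the reported ones sent back for correction; comparing the staged rows against the rows already in the real table would need the same predicate in a second join.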

Related

How to optimize the insert query from multiple tables?

I have 2 tables, Table 1 (temp table in SP) has around 400 records. Table 2 has around 30,550,284 records.
I need to run a loop on table 1 for each record and get the top 1 from table 2 based on a few conditions (where clause) and then order by modified date in decreasing order.
There is an index on the modified date.
declare @iPos int;
declare @iCount int;
select @iCount = count(*) from Table1;
set @iPos = 1;
declare @Table2 table(......)
declare @timestampLocal2 datetime
while (@iPos <= @iCount)
BEGIN
    select @val1 = Col1, @timestampLocal = TimeStamp
    from #Table1 where ID = @iPos
    set @timestampLocal2 = DATEADD(HH,-96,@timestampLocal)
    INSERT INTO #Temp3 ( .... ),....)
    select top 1 r.LastModified, r.[Col2], r.Col3, @iPos
    from Table2 (NOLOCK) r
    where Col1 = @val1 and
    r.LastModified <= @timestampLocal
    and r.LastModified >= @timestampLocal2
    and (r.Col2 is not null and r.Col3 is not null)
    order by LastModified desc
    SELECT @iPos = @iPos + 1;
END
This query is very slow.
I have also thought to archive table 2, But I want to keep that as the second option for now.
Do I really need to add an index on the columns which are involved in the where clause?
So my question is, in terms of performance is there a better way to do this?
I believe a CROSS APPLY or OUTER APPLY may do the trick. These can be thought of as being similar to INNER JOIN or LEFT JOIN, except that they allow you to reference a subquery having more complex conditions such as TOP 1 and ORDER BY. Ideal for cases like this.
-- INSERT INTO #Temp3 ( .... )
select r.LastModified, r.[Col2], r.Col3, t1.ID
from #Table1 t1
cross apply (
SELECT TOP 1 r.*
from Table2 r -- Don't use (NOLOCK)
where r.Col1 = t1.Col1
and r.LastModified <= t1.[TimeStamp]
and r.LastModified >= DATEADD(HH,-96,t1.[TimeStamp])
and (r.Col2 is not null and r.Col3 is not null)
order by r.LastModified desc
) r
For efficiency, I recommend an index on Table2(Col1,LastModified) or as an absolute minimum, an index on Table2(Col1).
I would strongly discourage the use of (NOLOCK) or READ UNCOMMITTED in queries that update the database (like the insert into #Temp3 above). While the query may appear to work most of the time, seemingly random occurrences of missing or duplicate rows may result.
Do you need to handle cases where no matching Table2 record is found? The above will quietly ignore such cases. Changing the CROSS APPLY to an OUTER APPLY together with logic to handle null r.xxx values could be what you need.
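The OUTER APPLY variant can be sketched as follows. This is an illustration under assumptions, not the asker's real schema: #T1 and #T2 are tiny hypothetical stand-ins for Table 1 and Table 2, the column types are guessed, and the index name and the HasMatch column are made up.

```sql
-- Hypothetical stand-ins for the question's tables, kept tiny so the sketch runs on its own.
CREATE TABLE #T1 (ID INT PRIMARY KEY, Col1 INT, [TimeStamp] DATETIME);
CREATE TABLE #T2 (Col1 INT, LastModified DATETIME, Col2 INT NULL, Col3 INT NULL);

-- Index matching the recommendation above (the name is made up).
CREATE INDEX IX_T2_Col1_LastModified ON #T2 (Col1, LastModified);

INSERT INTO #T1 (ID, Col1, [TimeStamp])
VALUES (1, 10, '20200105 00:00'),
       (2, 20, '20200105 00:00');

INSERT INTO #T2 (Col1, LastModified, Col2, Col3)
VALUES (10, '20200103 12:00', 1, 1),
       (10, '20200104 12:00', 2, 2),   -- latest row inside the window for ID 1
       (20, '20191201 00:00', 3, 3);   -- outside the 96-hour window for ID 2

-- OUTER APPLY keeps every #T1 row; the r columns are NULL when nothing matched.
SELECT t1.ID,
       r.LastModified,
       r.Col2,
       r.Col3,
       CASE WHEN r.LastModified IS NULL THEN 0 ELSE 1 END AS HasMatch
FROM #T1 AS t1
OUTER APPLY (
    SELECT TOP (1) r2.LastModified, r2.Col2, r2.Col3
    FROM #T2 AS r2
    WHERE r2.Col1 = t1.Col1
      AND r2.LastModified <= t1.[TimeStamp]
      AND r2.LastModified >= DATEADD(HOUR, -96, t1.[TimeStamp])
      AND r2.Col2 IS NOT NULL
      AND r2.Col3 IS NOT NULL
    ORDER BY r2.LastModified DESC
) AS r;

DROP TABLE #T1, #T2;
```

Here ID 1 picks up the 2020-01-04 row, while ID 2 comes back with NULLs and HasMatch = 0 instead of silently disappearing as it would with CROSS APPLY.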

WHERE clause on individual rows

I'm trying to write a query that will select rows from a table but I'm struggling a little with the WHERE clause.
I have a list of C# objects that has an ID and a Date. I want to find rows in my database where the ID is equal to one of the ids in my list of objects, but at the same time, the Date that comes with the specific ID has to be between a ValidFrom and a ValidTo field in the Database.
I already have a query that will work but it is not a very pretty solution:
SELECT *
FROM [dbo].[Employees] AS emp
WHERE emp.IsDeleted = 0
AND ((emp.EmployeeId = 1
AND (emp.ValidFrom <= '2017-05-01')
AND (emp.ValidTo > '2017-05-01'))
OR (emp.EmployeeId = 2
AND (emp.ValidFrom <= '2018-05-01')
AND (emp.ValidTo > '2018-05-01')))
And then I'd proceed to add the 'OR' statement from there.
Is there a more optimal way for me to accomplish this?
You can populate the filtering criteria in a separate table:
CREATE TABLE #RecordsTobeFiltered
(
EmployeeId INT
,ValidFrom DATE
,ValidTo DATE
);
INSERT INTO #RecordsTobeFiltered (EmployeeId, ValidFrom, ValidTo)
VALUES (1, '2017-05-01', '2017-05-01')
,(2, '2018-05-01', '2018-05-01')
-- ... and as many records as you need
SELECT *
FROM [dbo].[Employees] AS emp
INNER JOIN #RecordsTobeFiltered F
ON emp.EmployeeId = f.EmployeeId
and emp.ValidFrom <= f.ValidFrom
and emp.validTo > f.ValidTo
WHERE emp.IsDeleted = 0;
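If creating a temp table is not convenient, the same join can be written against an inline table value constructor. A minimal sketch, with assumptions stated: the @Employees table variable stands in for dbo.Employees so the example runs on its own, and the alias f and the column name AsOfDate are made up for illustration.

```sql
-- Hypothetical stand-in for dbo.Employees.
DECLARE @Employees TABLE
(
    EmployeeId INT,
    ValidFrom  DATE,
    ValidTo    DATE,
    IsDeleted  BIT
);

INSERT INTO @Employees (EmployeeId, ValidFrom, ValidTo, IsDeleted)
VALUES (1, '20170101', '20180101', 0),   -- valid on 2017-05-01
       (2, '20170101', '20180101', 0);   -- not valid on 2018-05-01

-- One (EmployeeId, AsOfDate) pair per object in the application-side list.
SELECT emp.*
FROM @Employees AS emp
INNER JOIN (VALUES (1, CAST('20170501' AS DATE)),
                   (2, CAST('20180501' AS DATE))
           ) AS f (EmployeeId, AsOfDate)
    ON  emp.EmployeeId = f.EmployeeId
    AND emp.ValidFrom <= f.AsOfDate
    AND emp.ValidTo   >  f.AsOfDate
WHERE emp.IsDeleted = 0;
```

Only employee 1 is returned here. For a long list, a temp table or a table-valued parameter is likely to scale better than a large VALUES list built into the query text.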

Count(*) for View returning different results on SQL Server

I am working on an ETL optimization problem that requires creating a temp table that can be merged with the final table. Currently a couple of views are used to load the final table, and that takes a lot of time. I took the SQL logic from the view, created a temp table from it, and noticed that the values in the temp table do not match the values in the final table. To look deeper, I ran count(*) on the view a couple of times and noticed that the total row count differs on every run by about 10 to 15 rows. The view has 16 columns from 9 tables, which load only once a day, so the underlying data does not change between my count(*) runs, yet the count returned by the view does.
This is on a SQL Server 2016 server. I have looked into the view logic and nothing stands out as odd. I have run count(*) on the tables that feed this view, and their counts do not change. To simplify the problem I also created a two-column table from the view logic and compared the results with EXCEPT; that still yields about 20 rows of inconsistent values between two-column tables created from the exact same view logic.
Here is a reproduction of the view definition that has the row count inconsistency:
USE [PROD]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE VIEW Base_View
AS
select
concat(x, y, z)feild1
,*
,ROW_NUMBER() OVER(PARTITION BY a,b ORDER BY some_Date) AS rec_num
,count(a) OVER(PARTITION BY a) AS rec_total
from (
SELECT
case when RESULT='stored value' and e.code is not null then 'x' else '' end x
,case when RESULT='stored value 2' and r.l_id is not null then 'y' else '' end y
,case when RESULT in ('stored value 3','stored value 4') and t.amount is not null then 'z' else '' end z
,case when
CASE WHEN
(m.status = 'stored value 4' OR m.status = 'stored value 5')
AND m.bal < 0
THEN
CASE WHEN DATEDIFF(day,m.due,m.SNAP_DATE) < 0
THEN 0
ELSE DATEDIFF(day,m.due,m.SNAP_DATE)
END
ELSE 0
END=0 AND w.W_ID is null AND m.status<>'stored value 5'
then case
when RESULT in ('stored value 5','stored value 4')
then case when isnull(AMOUNT,0)<>0
then 'abc'
else 'def' end
else 'abc' end
else 'def'
end imp_feild
,result
,es.emp_id
,concat(es.fname,' ',es.lname)task_emp
,concat(e.fname,' ',e.lname)ext_emp
,case when RESULT ='stored value' then t.P_STATUS else null end p_status
,t.CREATE_DATE
,t.l_key
,t.l_id
,m.status
,cast(w.wodate as date)wo_date
,rm.balance refi_balance,rnl.LOAN_key refi_loan,r.effective refi_effective
,case trancode when 'ext' then m.payment else null end ext_amount,e.entered ext_entered,e.effective ext_effective
FROM
(
select t0.*,ROW_NUMBER() OVER(PARTITION BY t0.some_KEY,cast(t0.CREATE_DATE as date),t0.output
ORDER BY t0.some_KEY,cast(t0.CREATE_DATE as date),t0.output ) AS SEQ_NUM
from base_table_1 t0
left join base_table_2 e0
on t0.c_e_key=e0.e_key
where t0.active_rec_ind='Y'
and t0.output in (d,e,f,g)
and (t0.output2 in (j,k)
or ISNULL(e0.some_KEY,'h') in ('u','w'))
) t
join
base_table_3 l
on t.loan_sf_id=l.loan_sf_id
and t.active_rec_ind='Y'
join base_table_4 m
on
t.SOME_DATE=m.SNAP_DATE
and t.L_ID=m.L_ID
left
join base_table_5 es
on t.c_emp_key=es.emp_key
left
join base_table_6 r
on l.l_id=r.l_old_id
and r.entered between dateadd(day,0,cast(t.CREATE_DATE as date)) and dateadd(day,0,t.SOME_DATE)
left
join base_table_7 w
on l.l_id=w.l_id
and w.wodate between cast(t.CREATE_DATE_ETZ as date) and dateadd(day,0,t.SOME_DATE)
left
join base_table_8 wl
on w.l_id=wl.l_id
left
join base_table_8 rnl
on r.l_new_id=rnl.l_id
left
join base_table_8 rol
on r.l_old_id=rol.l_id
left
join base_table_4 rm
on
dateadd(day,-1,r.effective)=rm.SNAP_DATE
and rol.L_ID=rm.L_ID
left
join
(select e0.*,ew.value_1,ew.new_key,ROW_NUMBER() OVER(PARTITION BY e0.L_ID,e0.ENT ORDER BY e0.L_ID,e0.ENT) AS SEQ_NUM
from base_table_9 e0
join base_table_5 ew
on e0.EMP_ID=ew.EMP_ID
where e0.code='a'
) e
on l.sid=e.sid
and e.code='a' and RESULT='stored value 5'
and e.entered between cast(t.CREATE_DATE as date) and dateadd(day,0,t.HOLD_DATE)
AND e.SEQ_NUM=t.SEQ_NUM
and ((isnumeric(e.roll_key)=1 and isnumeric(es.roll_key)=1 and e.roll_key=es.roll_key)
or ((isnumeric(e.roll_key)=0 or isnumeric(es.roll_key)=0) and e.FNAME+e.LNAME=es.FNAME+es.LNAME))
where t.RESULT in ('abc','def')
and cast(t.CREATE_DATE as date) between cast(dateadd(month,-12,getdate()) as date) and cast(getdate() as date)
and (AGENT in ('lmn', 'pqr')
or ISNULL(es.VKEY,'stored value 8') in ('xx','yy','zz'))
)x
where imp_feild='abc'
and concat(x, y, z)<>''
or imp_feild='def'
GO
Expected result is that it should return a consistent number for the row count and that hopefully should solve the inconsistent values problem on the temp table.
Your query has between cast(dateadd(month,-12,getdate()) as date) and cast(getdate() as date) near the bottom. Of course the result of getdate() will be different with each execution and each call to getdate(). That will affect the result.
BTW, having * in your SELECT list is not a good idea. You should only return the columns needed. It makes the view results vulnerable to changes in the underlying tables.
There are a few other things that wouldn't pass code review where I work but that's kinda OT, I think.
This is too long for a comment. Using * in a view is a very bad idea. Not only does the view NOT update (unless you execute sp_refreshview) when you change the base table you can actually get some very interesting things happening.
Check this out as an example of just how bad this can be.
create table ViewExample (Col1 int, Col2 int)
go
create view ViewExampleView as select * from ViewExample
go
insert ViewExample select 1, 2
go
select * from ViewExampleView --obviously we get just a single row
alter table ViewExample add Col3 int --add a new column to the table, surely the view will pick this up?
go
insert ViewExample select 3, 4, 5 --insert a new row with data in all three columns
go
select * from ViewExampleView --what??? The view says select * but we only get Col1 and Col2?
alter table ViewExample drop column Col2 --Oops we decide to drop this column because we don't need it anymore
select * from ViewExampleView --What in the world? Col2 doesn't exist in the table, why is it in the view? And what the heck is going on here. The data from Col3 is now moved to Col2
drop view ViewExampleView
drop table ViewExample
Notice how in the last select from the view that the data from Col3 is being displayed in Col2. If this doesn't convince you to stop using * in views (and pretty much everywhere) I don't know what will.
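A minimal sketch of the two usual remedies, reusing the ViewExample names from the demonstration above: name the columns explicitly in the view, or, for a view that does use *, refresh its metadata with the sp_refreshview system procedure after the base table changes.

```sql
CREATE TABLE ViewExample (Col1 INT, Col2 INT);
GO
-- Explicit column list: the view's shape no longer depends on the table's column order.
CREATE VIEW ViewExampleView AS SELECT Col1, Col2 FROM ViewExample;
GO
ALTER TABLE ViewExample ADD Col3 INT;
GO
-- For views defined with *, this re-reads the column metadata from the base table.
EXEC sp_refreshview N'ViewExampleView';
GO
DROP VIEW ViewExampleView;
DROP TABLE ViewExample;
```

With the explicit column list, adding Col3 to the table does not change what the view returns, and dropping Col2 would make the view fail loudly instead of quietly showing another column's data under the wrong name.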

SQL Server contiguous dates - summarizing multiple rows into contiguous start and end date rows without CTE's, loops,...s

Is it possible to write an sql query that will summarize rows with start and end dates into rows that have contiguous start and end dates?
The constraint is that it has to be regular sql, i.e. no CTE's, loops and the like as a third party tool is used that only allows an sql statement to start with Select.
e.g.:
ID StartDate EndDate
1001, Jan-1-2018, Jan-04-2018
1002, Jan-5-2018, Jan-13-2018
1003, Jan-14-2018, Jan-18-2018
1004, Jan-25-2018, Feb-05-2018
The required output needs to be:
Jan-1-2018, Jan-18-2018
Jan-25-2018, Feb-05-2018
Thank you
You can take advantage of both window functions and the use of a concept called gaps-and-islands. In your case, contiguous dates would be the island, and the gaps are self-explanatory.
I wrote the answer below in a verbose way to help make it clear what the query is doing, but it could most likely be written in a different way that is more concise. Please see my comments in the answer explaining what each step (sub-query) does.
--Determine Final output
select min(c.StartDate) as StartDate
, max(c.EndDate) as EndDate
from (
--Assign a number to each group of Contiguous Records
select b.ID
, b.StartDate
, b.EndDate
, b.EndDatePrev
, b.IslandBegin
, sum(b.IslandBegin) over (order by b.ID asc) as IslandNbr
from (
--Determine if its Contiguous (IslandBegin = 1, means its not Contiguous with previous record)
select a.ID
, a.StartDate
, a.EndDate
, a.EndDatePrev
, case when a.EndDatePrev is NULL then 1
when datediff(d, a.EndDatePrev, a.StartDate) > 1 then 1
else 0
end as IslandBegin
from (
--Determine Prev End Date
select tt.ID
, tt.StartDate
, tt.EndDate
, lag(tt.EndDate, 1, NULL) over (order by tt.ID asc) as EndDatePrev
from dbo.Table_Name as tt
) as a
) as b
) as c
group by c.IslandNbr
order by c.IslandNbr
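A more concise variant of the same idea, written as a sketch that runs on its own: the question's four sample rows are supplied through a VALUES list (the alias v and its column names S and E are made up), and the statement still starts with SELECT as the third-party tool requires.

```sql
SELECT MIN(c.StartDate) AS StartDate,
       MAX(c.EndDate)   AS EndDate
FROM (
    -- Running total of island starts gives each island its own number.
    SELECT b.StartDate,
           b.EndDate,
           SUM(b.IslandBegin) OVER (ORDER BY b.ID) AS IslandNbr
    FROM (
        -- A row starts a new island unless it begins at most one day after
        -- the previous row's end date (the first row has no previous row).
        SELECT a.ID,
               a.StartDate,
               a.EndDate,
               CASE WHEN DATEDIFF(DAY, LAG(a.EndDate) OVER (ORDER BY a.ID), a.StartDate) <= 1
                    THEN 0 ELSE 1 END AS IslandBegin
        FROM (
            SELECT v.ID,
                   CAST(v.S AS DATE) AS StartDate,
                   CAST(v.E AS DATE) AS EndDate
            FROM (VALUES (1001, '20180101', '20180104'),
                         (1002, '20180105', '20180113'),
                         (1003, '20180114', '20180118'),
                         (1004, '20180125', '20180205')
                 ) AS v (ID, S, E)
        ) AS a
    ) AS b
) AS c
GROUP BY c.IslandNbr
ORDER BY c.IslandNbr;
```

For the sample rows this yields two ranges, 2018-01-01 to 2018-01-18 and 2018-01-25 to 2018-02-05. Like the verbose version, it assumes that ordering by ID matches the chronological order of the rows.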
I hope the following SQL query helps you identify the gaps and the covered date ranges for this case.
I did not use a CTE, a dates table function, or anything similar.
Instead, I used master..spt_values as a numbers table to generate the list of dates that serves as the main table of the query.
You can create your own numbers table or dates table if this does not fit your requirements.
To catch the changes at the borders, I used the SQL LAG() function, which lets me compare each row with the previous value of a column in a sorted list.
select
max(startdate) as startdate,
max(enddate) as enddate
from (
select
date,
case when exist = 1 then date else null end as startdate,
case when exist = 0 then dateadd(d,-1,date) else null end as enddate,
( row_number() over (order by date) + 1) / 2 as rn
from (
select date, exist, case when exist <> (lag(exist,1,'') over (order by date)) then 1 else 0 end as changed
from (
select
d.date,
case when exists (select * from Periods where d.date between startdate and enddate) then 1 else 0 end as exist
from (
SELECT dateadd(dd,number,'20180101') date
FROM master..spt_values
WHERE Type = 'P' and dateadd(dd,number,'20180101') <= '20180228'
) d
) cte
) tbl
where changed = 1
) dates
group by rn

Fastest way to check the records if exists in the SQL table

I have hundreds of thousands of records in my SQL table. When no record is present within a particular date range, the query takes a long time to tell us so. Currently, I am using this query to check whether a record exists.
select TOP 1 1 AS getRowCount
from MYTable
where ID IN ('3','5','2','4','1')
AND (
datewithTime >= '2015-01-01 07:00:00'
AND datewithTime < '2016-01-01 07:00:00'
)
In the above query I am asking for one year of records, but no records are present in this time range, and it takes too long to respond. Is there any other way to show whether data exists in the table for this particular time interval?
Will LINQ perform better?
First, You should use the EXISTS statement instead of selecting top 1:
SET @getRowCount = EXISTS(select 1
from MYTable
where ID IN ('3','5','2','4','1')
AND datewithTime >= '2015-01-01 07:00:00'
AND datewithTime < '2016-01-01 07:00:00'
)
Second, you should check the execution plan to see if you can improve performance by adding indices or altering existing indices.
Update
Sorry, I wasn't paying enough attention to what I was writing.
EXISTS returns a boolean value, but SQL Server does not have a boolean data type; this is why you get the incorrect syntax error.
Here is the correct syntax:
DECLARE @getRowCount bit = 0
IF EXISTS(select 1
from MYTable
where ID IN ('3','5','2','4','1')
AND datewithTime >= '2015-01-01 07:00:00'
AND datewithTime < '2016-01-01 07:00:00'
) SET @getRowCount = 1
SELECT @getRowCount
First of all add indexes to your table:
ALTER TABLE TableName ADD CONSTRAINT [PK_TableName] PRIMARY KEY CLUSTERED
(
[ID] ASC
)
GO
CREATE NONCLUSTERED INDEX [IX_TableName_datewithTime] ON TableName
(
datewithTime ASC
)
GO
Then change your query to this:
if exists(select * from TableName
where ID in ('3','5','2','4','1') and
datewithTime >= '2015-01-01 07:00:00' and
datewithTime < '2016-01-01 07:00:00')
select 1 as DataExists
else
select 0 as DataExists

Resources