Purge job optimization - SQL Server

SQL Server 2008 R2 Enterprise
I have a database with 3 tables on which I keep a retention window of 15 days. It is a very active logging database, about 500 GB in size, and it grows by about 30 GB a day unless purged. I can't seem to get caught up on one of the tables and I am falling behind. That table has 220 million rows and needs to purge around 10-12 million rows nightly; I am currently 30 million rows behind. I can only run this purge at night because the volume of incoming inserts competes for table locks. I have confirmed that everything is indexed correctly, and have run Brent Ozar's sp_BlitzIndex to verify that. Is there any way to optimize what I am doing below? I run the same purge steps for each table.
1. Drop and create 3 purge tables: Purge_Log, Purge_SLogHeader and Purge_SLogMessage.
2. Insert rows into the purge tables (takes 5 minutes per table):
INSERT INTO Purge_Log
SELECT ID FROM ServiceLog
WHERE startTime < DATEADD(DAY, -15, GETDATE())
--****************************************************
INSERT INTO Purge_SLogMessage
SELECT serviceLogId FROM ServiceLogMessage
WHERE serviceLogId IN (SELECT id FROM ServiceLog
                       WHERE startTime < DATEADD(DAY, -15, GETDATE()))
--****************************************************
INSERT INTO Purge_SLogHeader
SELECT serviceLogId FROM ServiceLogHeader
WHERE serviceLogId IN (SELECT id FROM ServiceLog
                       WHERE startTime < DATEADD(DAY, -15, GETDATE()))
After that is inserted, I then run the following, with differences for each table:
SET ROWCOUNT 1000
delete_more:
DELETE FROM ServiceLog
WHERE Id IN (SELECT Id FROM Purge_Log)
IF @@ROWCOUNT > 0 GOTO delete_more
SET ROWCOUNT 0
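For reference, SET ROWCOUNT is deprecated for affecting DELETE/INSERT/UPDATE statements, so the same batching could be sketched with DELETE TOP instead:

delete_more:
DELETE TOP (1000) FROM ServiceLog
WHERE Id IN (SELECT Id FROM Purge_Log)
IF @@ROWCOUNT > 0 GOTO delete_more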
Basically, does anyone see a way I can make this procedure run faster, or is there a different way to go about it? I've made the queries as simple as possible, with only one subquery. I've tried using a join instead, and the execution plan says the completion time is the same either way. Any guidance would be appreciated.

You can use this technique for all of the tables: collect the IDs into a temporary table first, so you avoid scanning the original (very large) table again and again. I hope it works well for all three tables:
DECLARE @del_query VARCHAR(MAX)
/*
Taking IDs from the ServiceLog table instead of Purge_Log, because Purge_Log may have more data than expected due to frequent purging
*/
IF OBJECT_ID('tempdb..#tmp_log_ids') IS NOT NULL DROP TABLE #tmp_log_ids
SELECT ID INTO #tmp_log_ids FROM ServiceLog WHERE startTime < DATEADD(DAY, -15, GETDATE())

SET @del_query ='
DELETE TOP (100000) sl
FROM ServiceLog sl
INNER JOIN #tmp_log_ids t ON t.id = sl.id'

WHILE 1 = 1
BEGIN
    EXEC(@del_query + ' OPTION (MAXDOP 5)')
    IF @@ROWCOUNT < 100000 BREAK;
END

SET @del_query ='
DELETE TOP (100000) sl
FROM ServiceLogMessage sl
INNER JOIN #tmp_log_ids t ON t.id = sl.serviceLogId'

WHILE 1 = 1
BEGIN
    EXEC(@del_query + ' OPTION (MAXDOP 5)')
    IF @@ROWCOUNT < 100000 BREAK;
END

SET @del_query ='
DELETE TOP (100000) sl
FROM ServiceLogHeader sl
INNER JOIN #tmp_log_ids t ON t.id = sl.serviceLogId'

WHILE 1 = 1
BEGIN
    EXEC(@del_query + ' OPTION (MAXDOP 5)')
    IF @@ROWCOUNT < 100000 BREAK;
END
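Since nothing in the statement actually changes between iterations, the dynamic SQL is optional; a minimal inline sketch of the same loop, shown here for the first table only:

WHILE 1 = 1
BEGIN
    -- Delete in 100k batches, driven by the pre-collected IDs
    DELETE TOP (100000) sl
    FROM ServiceLog sl
    INNER JOIN #tmp_log_ids t ON t.id = sl.id
    OPTION (MAXDOP 5);
    IF @@ROWCOUNT < 100000 BREAK;
END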

Related

Need to get the min startdate and max enddate when there are no breaks in months or changes in ownership

I have a table that contains account ownership, with start dates and end dates by account; however, some accounts have duplicate rows and some have rules with overlapping date ranges. I need a clean result set showing account, owner, startdate and enddate with no duplications or overlaps.
The source table looks like this:

accountnumber  startdate   enddate     owner
1              3/1/2012    6/30/2012   john
1              3/1/2012    6/30/2012   john
1              5/31/2012   7/31/2015   john
2              5/1/2012    8/1/2012    bill
2              8/2/2012    10/31/2012  bill
2              12/1/2012   12/31/2012  joe
2              1/1/2013    12/31/2025  bill
I need the results to read like:

accountnumber  startdate   enddate     owner
1              3/1/2012    7/31/2015   john
2              5/1/2012    10/31/2012  bill
2              12/1/2012   12/31/2012  joe
2              1/1/2013    12/31/2025  bill
Any help is much appreciated. I'm very much a novice when it comes to SQL.
SELECT DISTINCT removes my duplicates, but I still end up with multiple overlapping date ranges.
I don't know what version of SQL Server we are using; it's accessed through a connector within a BI application called Sisense, which doesn't really say.
This is my select statement so far:
select distinct
r.accountnumber,
r.startdate,
r.enddate,
a.employeename Owner
from "dbo"."ruleset" r
left join "dbo"."rule" a on r.id = a.rulesetid
where
a.roleid = '1' and
r.isapproved = 'true'
The table structure is a bit interesting, and while there may be a better way to figure this out with less code (i.e., set-based), this does the trick. Here's my explanation along with my code.
Thought process: I needed to order the rows by AccountNumber and Owner and identify whenever either of these changes, as that marks a new "term"; additionally I needed a way of marking the beginning of each of these terms. For the former I used ROW_NUMBER, and for the latter I used LAG. These records, along with 2 new fields, are inserted into a temp table.
Having these 2 pieces of information allowed me to loop through the rows with a WHILE loop, keeping track of the current row as well as the most recent beginning of a term. I update the first record of each term with the latest end date (assuming you don't have earlier end dates for later start dates), and once we're done we select just the records which are marked as starting a new term, giving the result set you asked for.
Links to documentation:
ROW_NUMBER
LAG
WHILE
Code example:
DECLARE @RowNumber INTEGER
       ,@BeginTerm INTEGER
       ,@EndDate DATE;

DROP TABLE IF EXISTS #OwnershipChange;

SELECT
    r.accountNumber
    ,r.startDate
    ,r.endDate
    ,a.employeename AS [owner]
    ,ROW_NUMBER() OVER (ORDER BY r.accountNumber, r.StartDate) AS RowNumber
    ,0 AS Processed
    ,CASE
        WHEN a.employeename = LAG(a.employeename, 1, NULL) OVER (ORDER BY r.accountNumber)
         AND r.accountNumber = LAG(r.accountNumber, 1, NULL) OVER (ORDER BY r.accountNumber)
        THEN 0
        ELSE 1
     END AS NewOwnership
INTO #OwnershipChange
FROM dbo.ruleset r
LEFT OUTER JOIN dbo.rule a ON r.id = a.rulesetid
WHERE a.roleid = '1'
  AND r.isapproved = 'true';

WHILE EXISTS (
    SELECT 1
    FROM #OwnershipChange
    WHERE Processed = 0
)
BEGIN
    SET @RowNumber = (
        SELECT TOP 1 RowNumber
        FROM #OwnershipChange
        WHERE Processed = 0
        ORDER BY RowNumber
    );
    SET @BeginTerm = (
        SELECT CASE WHEN NewOwnership = 1 THEN @RowNumber ELSE @BeginTerm END
        FROM #OwnershipChange
        WHERE RowNumber = @RowNumber
    );
    SET @EndDate = (
        SELECT endDate
        FROM #OwnershipChange
        WHERE RowNumber = @RowNumber
    );
    UPDATE #OwnershipChange
    SET endDate = @EndDate
    WHERE RowNumber = @BeginTerm;

    UPDATE #OwnershipChange
    SET Processed = 1
    WHERE RowNumber = @RowNumber;
END;
SELECT
accountNumber
,startDate
,endDate
,[owner]
FROM #OwnershipChange
WHERE NewOwnership = 1;
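For completeness, here is a set-based gaps-and-islands sketch of the same idea (untested; it assumes the #OwnershipChange rows populated above and SQL Server 2012+ for LAG and the windowed SUM):

WITH flagged AS (
    SELECT accountNumber, startDate, endDate, [owner],
           CASE WHEN [owner] = LAG([owner]) OVER (PARTITION BY accountNumber ORDER BY startDate)
                THEN 0 ELSE 1
           END AS chg
    FROM #OwnershipChange
), grouped AS (
    SELECT accountNumber, startDate, endDate, [owner],
           SUM(chg) OVER (PARTITION BY accountNumber ORDER BY startDate
                          ROWS UNBOUNDED PRECEDING) AS grp
    FROM flagged
)
SELECT accountNumber,
       MIN(startDate) AS startDate,
       MAX(endDate)   AS endDate,
       MAX([owner])   AS [owner]   -- owner is constant within each group
FROM grouped
GROUP BY accountNumber, grp;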

Stored procedure execution is too slow

I have a stored procedure which references multiple tables (four, to be specific: RefurbRef, ActivationDetailRefurb, ActivationDetailReplaced, ReplacedData), each holding roughly 100,000 rows (1 lakh).
I need to bind the data from the stored procedure to the UI on the front end. When I tried executing the stored procedure on my SQL Server 2008 instance, it took almost 20 minutes to execute and fetch the result. There's no way users are going to wait that long gazing at a "please wait, loading" screen.
This is the procedure:
CREATE PROCEDURE [dbo].[uspLotFailureDetail]
    @fromDate varchar(50),
    @toDate varchar(50),
    @vendorName varchar(50),
    @modelName varchar(50)
AS
BEGIN
    SELECT
        d.LOTQty,
        ApprovedQty = COUNT(DISTINCT d.SerialNUMBER),
        d.DispatchDate,
        Installed = COUNT(a.SerialNumber) + COUNT(r.SerialNumber),
        DOA = SUM(CASE WHEN DATEDIFF(DAY, COALESCE(a.ActivationDate, r.ActivationDate), f.RecordDate) BETWEEN 0 AND 10 THEN 1 ELSE 0 END),
        Bounce = SUM(CASE WHEN DATEDIFF(DAY, COALESCE(a.ActivationDate, r.ActivationDate), f.RecordDate) BETWEEN 11 AND 180 THEN 1 ELSE 0 END)
    FROM RefurbRef d
    LEFT JOIN ActivationDetailRefurb a
        ON d.SerialNUMBER = a.SerialNumber
        AND d.DispatchDate <= a.ActivationDate
        AND d.LOTQty = a.LOTQty
    LEFT JOIN ActivationDetailReplaced r
        ON d.SerialNUMBER = r.SerialNumber
        AND d.DispatchDate <= r.ActivationDate
        AND d.LOTQty = r.LotQty
        AND (a.ActivationDate IS NULL OR a.ActivationDate <= d.DispatchDate)
    LEFT JOIN ReplacedData f
        ON f.OldSerialNumber = COALESCE(a.SerialNumber, r.SerialNumber)
        AND f.RecordDate >= COALESCE(a.ActivationDate, r.ActivationDate)
    WHERE d.DispatchDate BETWEEN @fromDate AND @toDate
        AND d.VendorName = @vendorName
        AND d.Model = @modelName
    GROUP BY d.LOTQty, d.DispatchDate
END
The procedure extracts two types of results: based on vendor and based on model. If the result is extracted by vendor alone, i.e. using only @fromDate, @toDate and @vendorName, the procedure takes less than 2 minutes. But when all four parameters are used, as in the procedure above, it takes no less than 20 minutes.
Is there any way I could optimize the query to improve the performance of the procedure?
Thanks in advance
Given the information provided, I would look at RefurbRef.Model to see whether there is a covering index on that field; I would bet there is not, based on the jump to 20 minutes once you added that criterion. Additionally, I would change your parameters from varchar to date:
@fromDate DATE,
@toDate DATE,
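For example, a covering index along these lines might help (the index name and column list here are a guess; verify them against your actual query plan):

-- Hypothetical covering index keyed on the WHERE-clause columns,
-- including the columns the query outputs
CREATE NONCLUSTERED INDEX IX_RefurbRef_Vendor_Model_Dispatch
ON dbo.RefurbRef (VendorName, Model, DispatchDate)
INCLUDE (SerialNUMBER, LOTQty);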
Hope this helps,
Jason

Check for Overlapping date on Insert/Update

I have a table which holds a list of dates and other data for a person. The table should never have any undeleted rows whose dates overlap.
Is there a way I can put a check constraint on the table to ensure that when I update or insert a row, there are no overlapping details?
Below is a cut-down version of my table. It has a deleted flag and start/end dates; a NULL end date means the row is ongoing.
I then provide some legal and some not-so-legal inserts (and why they're legal or illegal).
DECLARE @Test TABLE
(
    Id INT NOT NULL IDENTITY(1,1),
    PersonID INT NOT NULL,
    StartDate DATE NOT NULL,
    EndDate DATE NULL,
    Deleted BIT NOT NULL
)
INSERT INTO @Test
    (PersonId, StartDate, EndDate, Deleted)
SELECT 1, '01-JAN-2015', '15-JAN-2015', 0 UNION ALL -- Valid
SELECT 1, '16-JAN-2015', '20-JAN-2015', 1 UNION ALL -- Valid and deleted
SELECT 1, '18-JAN-2015', NULL, 0 UNION ALL -- Valid
SELECT 2, '01-JAN-2015', NULL, 0 UNION ALL -- Valid.. never-ending row.
SELECT 2, '18-JAN-2015', '30-JAN-2015', 0 UNION ALL -- Invalid! Overlaps above record.
SELECT 2, '20-JAN-2015', '30-JAN-2015', 1 UNION ALL -- Valid, as it's deleted (still overlaps, though)
SELECT 3, '01-JAN-2015', '10-JAN-2015', 0 UNION ALL -- Valid
SELECT 3, '10-JAN-2015', NULL, 0 -- Invalid, as it overlaps the last and first days
SELECT * FROM @Test
I need to make sure that the table doesn't allow overlapping dates for the same person, for undeleted rows.
For the date range check, I will use the "(StartA <= EndB) and (EndA >= StartB)" formula, but I am unsure how to check this with a constraint, and across multiple rows.
I may need to do it with a trigger, comparing the inserted values to the existing rows and somehow cancelling if I find matches?
You cannot use a CHECK constraint without adding additional columns.
You will have to create a trigger to check that inserted date ranges are non-overlapping. Something like this:
CREATE TRIGGER [dbo].[DateRangeTrigger]
ON [dbo].Test AFTER INSERT, UPDATE
AS
BEGIN
    DECLARE @MaxDate DATE = '2999/12/31'
    IF EXISTS (SELECT t.StartDate, t.EndDate FROM Test t
               JOIN inserted i
                 ON i.PersonID = t.PersonID
                AND i.Id <> t.Id
                AND (
                        (i.StartDate > t.StartDate AND i.StartDate < ISNULL(t.EndDate, @MaxDate))
                     OR (ISNULL(i.EndDate, @MaxDate) < ISNULL(t.EndDate, @MaxDate) AND ISNULL(i.EndDate, @MaxDate) > t.StartDate)
                     OR (i.StartDate < t.StartDate AND ISNULL(i.EndDate, @MaxDate) > ISNULL(t.EndDate, @MaxDate))
                    )
               WHERE t.Deleted = 0 AND i.Deleted = 0
              )
    BEGIN
        RAISERROR ('Inserted date was within invalid range', 16, 1)
        IF (@@TRANCOUNT > 0)
            ROLLBACK
    END
END
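A quick sanity check with the sample data (this assumes the trigger is attached to a real dbo.Test table rather than the table variable above):

-- Should succeed: person 4 has no existing rows
INSERT INTO dbo.Test (PersonID, StartDate, EndDate, Deleted)
VALUES (4, '2015-02-01', '2015-02-10', 0);

-- Should raise the error: overlaps person 2's open-ended row starting 01-JAN-2015
INSERT INTO dbo.Test (PersonID, StartDate, EndDate, Deleted)
VALUES (2, '2015-01-18', '2015-01-30', 0);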
You can refer to one of these threads for more information
Enforcing unique date range fields in SQL Server 2008
Unique date range fields in SQL Server 2008
Here's a trigger-based approach (note the original posted trigger selected from a table called DateRange; it should query the same table the trigger is attached to):
CREATE TRIGGER [dbo].[trigPersonnel_PreventOverlaps]
ON [dbo].[Personnel]
AFTER INSERT, UPDATE
AS
BEGIN
    IF EXISTS(
        SELECT * FROM [dbo].[Personnel] p
        INNER JOIN inserted i ON i.PersonID = p.PersonID
            AND i.Id != p.Id AND i.Deleted = 0
            AND (
                    (p.StartDate <= i.StartDate
                     AND (i.StartDate <= p.EndDate OR p.EndDate IS NULL))
                 OR (p.StartDate <= i.EndDate
                     AND (i.EndDate <= p.EndDate OR p.EndDate IS NULL))
                )
        WHERE p.Deleted = 0
    )
        --RAISERROR if you want
        ROLLBACK
END
Note - it will roll back the whole transaction, so you'll need to perform inserts individually to ensure good ones don't get thrown out.
If you need something to comb through a bulk insert and pick out the bad ones, you'll need something more complex.
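As a hedged sketch of that "something more complex", an INSTEAD OF INSERT trigger could filter a bulk insert down to just the non-overlapping rows, using the (StartA <= EndB) and (EndA >= StartB) formula from the question (note it does not check the inserted batch against itself):

CREATE TRIGGER dbo.trigTest_FilterOverlaps
ON dbo.Test INSTEAD OF INSERT
AS
BEGIN
    -- Let through rows that are flagged deleted, or that do not overlap
    -- an existing undeleted row for the same person
    INSERT INTO dbo.Test (PersonID, StartDate, EndDate, Deleted)
    SELECT i.PersonID, i.StartDate, i.EndDate, i.Deleted
    FROM inserted i
    WHERE i.Deleted = 1
       OR NOT EXISTS (
            SELECT 1 FROM dbo.Test t
            WHERE t.PersonID = i.PersonID
              AND t.Deleted = 0
              AND t.StartDate <= ISNULL(i.EndDate, '29991231')
              AND ISNULL(t.EndDate, '29991231') >= i.StartDate
          );
END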

Sum of 2 values in a table

I have 2 tables like below.
Table 1:
Bank_Name
Bank_ACNO
Bank_Branch
Bank_Balance
Table 2:
Emp_ID
Amount_Paid
Table 1 contains one unique record per Bank_ACNO, but Table 2 contains multiple records. Now I want to update Table 1's Bank_Balance with SUM(Bank_Balance + Amount_Paid) where Table1.Bank_ACNO = Table2.Emp_ID.
I tried the query below, which did not work:
UPDATE Bank_Master
SET Bank_Balance = ( Bank_Master.Bank_Balance
+ Order_Archieve_Temp.Amount_Paid )
OUTER JOIN Order_Archieve_Temp
ON Bank_Balance.Bank_ACNO=Order_Archieve_Temp.Emp_ID)
Here is the SQL Fiddle demo.
Below is the update query, which you can try:
UPDATE T1
SET T1.Bank_Balance = T1.Bank_Balance + T2.Amount_Paid
FROM Table1 T1,
     (SELECT Emp_ID, SUM(Amount_Paid) AS Amount_Paid
      FROM Table2
      GROUP BY Emp_ID) AS T2
WHERE T1.Bank_ACNO = T2.Emp_ID
If that's going to remain your table design, you had better keep your database under really tight control: in most such circumstances, applications that have to determine a balance calculate it on the fly from some known and well-controlled state (say, the last statement date), as the sum of that balance and all the transactions that have occurred since then.
The current design appears vulnerable to miscalculating the balance, and to persisting that error into the future.
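For illustration only, a derived-balance query might look like the sketch below; the LastStatement table and its columns are hypothetical and not part of the question's schema:

SELECT s.Bank_ACNO,
       s.Statement_Balance + COALESCE(SUM(t2.Amount_Paid), 0) AS Current_Balance
FROM LastStatement s                      -- hypothetical snapshot: balance as of Statement_Date
LEFT JOIN Table2 t2
    ON t2.Emp_ID = s.Bank_ACNO
   AND t2.Paid_Date > s.Statement_Date    -- hypothetical transaction date column
GROUP BY s.Bank_ACNO, s.Statement_Balance;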
Are there any possible concurrency issues here (could multiple parties be executing this same statement from different connections)? What is your transaction isolation level?
Try this query:
BEGIN TRAN;
UPDATE t1
SET Bank_Balance = t1.Bank_Balance + ISNULL(x.Total_Amount_Paid,0)
-- or
-- SET Bank_Balance = ISNULL(t1.Bank_Balance,0) + ISNULL(x.Total_Amount_Paid,0)
-- or
-- SET Bank_Balance = NULLIF(ISNULL(t1.Bank_Balance,0) + ISNULL(x.Total_Amount_Paid,0), 0)
FROM dbo.Table1 t1
OUTER APPLY
(
SELECT SUM(t2.Amount_Paid) AS Total_Amount_Paid
FROM dbo.Table2 t2
WHERE t1.Bank_ACNO = t2.Emp_ID
) x
ROLLBACK
-- COMMIT

SQL Query runs forever - SQL Server 2008

I have two Tables in two different databases
Database1 - Table1
Database2 - Table2
Table1 Columns: NimID,IDDate,Station
Table2 Columns: XilID,XilDate
Table1:
NimID   IDDate       Station
234     2011-04-21   HYD
237     2011-04-21   CHN
208     2011-04-21   HYD
209     2011-04-15   DEL
212     2011-03-11

Table2:
XilID   XilDate
234     2011-04-21
208     2011-04-21
209     2011-04-15
218     2011-05-28
I want to find out how many IDs in Table1 are not in Table2, where IDDate = XilDate = '2011-04-21', grouped by Table1.Station.
I made the query below
select x.Station as Station,
count(distinct x.NimID) as Difference
from (
select a.NimID,
a.IDDate,
a.Station
from database1.dbo.table1 a
where left(cast(a.Date as date),11)='2011-04-21'
) as X, (
select b.XilID,
b.XILDate
from database2.dbo.Table2 b
where b.XilDate='2011-04-21'
) as Y
where x.NimID not in (y.XilID)
group by x.Station
But this query runs forever.
Please remember that the tables are in different databases located on the same server, and that Table1 contains 10,000,000 records while Table2 contains around 13,000,000.
Please correct my query if it is wrong, or suggest a faster way.
Thanks
DECLARE @date datetime;
SET @date = '20110421';
SELECT
    Station,
    Diff = COUNT(*)
FROM (
    SELECT
        a.NimID,
        a.IDDate,
        a.Station
    FROM database1.dbo.table1 a
    LEFT JOIN database2.dbo.table2 b ON a.NimID = b.XilID AND b.XilDate = @date
    WHERE b.XilID IS NULL
      AND a.IDDate >= @date
      AND a.IDDate < DATEADD(day, 1, @date)
) s
GROUP BY Station
UPDATE
Actually, the above solution can be rewritten without the subselect. The subselect was left over from an idea I tried and eventually discarded. Here's an identical solution with no subselect:
DECLARE @date datetime;
SET @date = '20110421';
SELECT
    a.Station,
    Diff = COUNT(*)
FROM database1.dbo.table1 a
LEFT JOIN database2.dbo.table2 b ON a.NimID = b.XilID AND b.XilDate = @date
WHERE b.XilID IS NULL
  AND a.IDDate >= @date
  AND a.IDDate < DATEADD(day, 1, @date)
GROUP BY a.Station
Try to avoid converting from datetime to varchar.
WHERE a.Date >= '2011-04-21'
AND a.Date < (CAST('2011-04-21' AS datetime) + 1)
Try the below. Note that you appeared to be joining the two tables to perform the 'not in', which would produce a very slow, and very wrong, result set.
Also, if IDDate is a DATETIME column then you'd be better off performing a range check, e.g. (a.IDDate >= '2011-04-21' AND a.IDDate < '2011-04-22'). Thinking about it, if it's a text column in the format yyyy-MM-dd then a range check would also work; if it's a text column with mixed-format dates then forget I mentioned it.
select x.Station as Station,
count(distinct x.NimID) as Difference
from (
select a.NimID,
a.IDDate,
a.Station
from database1.dbo.table1 a
where left(cast(a.IDDate as date),11)='2011-04-21'
) as X
where x.NimID not in (
select b.XilID
from database2.dbo.Table2 b
where b.XilDate='2011-04-21'
)
group by x.Station
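As a variation on the same idea, a NOT EXISTS sketch (assuming IDDate really is a datetime column) sidesteps NOT IN's issues with NULLs and keeps the date predicate sargable:

SELECT a.Station AS Station,
       COUNT(DISTINCT a.NimID) AS Difference
FROM database1.dbo.table1 a
WHERE a.IDDate >= '20110421'
  AND a.IDDate < '20110422'
  AND NOT EXISTS (
        SELECT 1
        FROM database2.dbo.Table2 b
        WHERE b.XilID = a.NimID
          AND b.XilDate = '20110421'
      )
GROUP BY a.Station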
