I am updating and inserting bulk records from a web API (in the form of a data table) into my table using the MERGE statement. The insert and update work fine, but when I send a corrupt value, the CATCH block does not handle it. Here is my code:
--Creating type to fetch the data table from the web API (have to go with this approach only)
CREATE TYPE [dbo].[CustomerType] AS TABLE(
[Id] [int] NULL,
[Name] [nvarchar](100) NULL,
[Country] [nvarchar](50) NULL,
[Date] [datetime] NULL
)
-- Stored proc for update and insert using MERGE
CREATE PROCEDURE Update_Customers
@tblCustomers CustomerType READONLY
AS
BEGIN
BEGIN TRY
MERGE INTO Customers c1
USING @tblCustomers c2
ON c1.CustomerId=c2.Id
WHEN MATCHED THEN
UPDATE SET c1.Name = c2.Name
,c1.Country = c2.Country,
c1.date =c2.date
WHEN NOT MATCHED THEN
INSERT VALUES(c2.Id, c2.Name,c2.date, c2.Country);
END TRY
BEGIN CATCH
--My table for logging the error
INSERT INTO ERROR_LOG(ERROR_LINE,ERROR_MESSAGE,PROC_NAME)
VALUES (ERROR_LINE(),ERROR_MESSAGE(),ERROR_PROCEDURE())
END CATCH
END
Thanks in advance
The problem is that SQL Server does not treat errors the way you think it does.
SQL Server maintains an ACID state.
Atomic: All or nothing. All work is broken into transactional statements, which must succeed or the entire modification/creation is reverted (rolled back) to the previous working state. You can be explicit about which transaction a piece of work belongs to by using BEGIN TRANSACTION/COMMIT TRANSACTION, or undo it with ROLLBACK TRANSACTION (see the sketch after this list). Read More: Transaction Statements and ROLLBACK TRANSACTION
Consistent: Every transaction must leave SQL Server in a state that is valid. For example, while DROP TABLE MyTable; may be a valid transaction, if MyTable has dependencies (i.e. Foreign Key Constraints), then SQL Server would be left in an inconsistent state and will rollback the transaction to the last consistent state.
Isolated: Every transaction occurs in its own time and space; we say serialized to be specific. Isolation allows multiple, even similar, statements to run on the server at one time while each is processed exclusively of the others. The term blocking refers to a statement waiting for another transaction to commit, and a deadlock occurs when two transactions wait on each other indefinitely.
Durable: Unlike a software program that lives in memory and can be lost by a sudden loss of power, SQL Server's transactions are permanent on disk once committed. This provides finality to a statement. This is also where the log file comes into play, since it records what transactions are performed on the database.
READ MORE: ACID PROPERTIES
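To make the atomicity concrete, here is a minimal sketch of an explicit transaction (the table and column names here are hypothetical, purely for illustration):

BEGIN TRY
    BEGIN TRANSACTION;
    -- Both statements succeed together or are rolled back together.
    UPDATE Accounts SET Balance = Balance - 100 WHERE AccountId = 1;
    UPDATE Accounts SET Balance = Balance + 100 WHERE AccountId = 2;
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Revert to the last consistent state, then re-raise.
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
    THROW;
END CATCH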
I mention all of this since your BEGIN TRY/CATCH block looks for these issues.
Treat your tables as sets of data
Recall that SQL is based on relational set theory. SQL is best when it performs actions on groups/sets of data/objects. It is completely incompatible with inheritance, and cursor-based logic is possible but inefficient at best.
So in your ETL process, treat the data as a whole set: you can transform the minor misspellings/errors from the user/web interface and isolate the actual datatype errors you wish to catalogue by using the predicate clauses (WHERE/HAVING/ON).
An example of what you could do is similar to the following:
CREATE TABLE #ETLTable(ID INT NOT NULL
, Name VARCHAR(50) NOT NULL
, Comment VARCHAR(50) NULL
, Country VARCHAR(50) NOT NULL
, Dated VARCHAR(50)); --implicitly declared as allowing NULL values unless the type denies it
CREATE TABLE #MyTable (ID INT NOT NULL
, Name VARCHAR(50) NOT NULL
, Comment VARCHAR(50) NULL
, Country VARCHAR(50) NOT NULL
, Dated DATE);
CREATE TABLE #ERROR_TABLE (ObjectID INT NULL
, encrypted INT
, text VARCHAR(MAX)
, start_time DATETIME2
, Table_ID INT
, Table_Name VARCHAR(50)
, Table_Country VARCHAR(50)
, Table_Dated VARCHAR(20));
CREATE TABLE #ACTIONS ( [Action] VARCHAR(50)
, [inserted_ID] int
, inserted_Name VARCHAR(50)
, inserted_Country VARCHAR(50)
, inserted_Date Date
, deleted_ID int
, deleted_Name VARCHAR(50)
, deleted_Country VARCHAR(50)
, deleted_Date Date)
INSERT INTO #MyTable (ID, Name, Country, Dated)
VALUES (1, 'Mary', 'USA', '12/23/12')
, (2, 'Julio', 'Mexico', '12/25/12')
, (3, 'Marx', 'USA', '11/11/12')
, (4, 'Ann', 'USA', '11/27/12');
INSERT INTO #ETLTable(ID, Name, Country, Comment, Dated)
VALUES (1,'Mary', 'USA', 'Valid Date', '12/23/12')
, (2,'Julio', 'Mexico', 'Invalid Date', '12-25,12')
, (3,'Marx', 'USA', 'Valid but incorrect Date', '12-11/25') --this actually means YY-MM-DD
, (4,'Ann','USA', 'Not Matching Date', '12-23-12')
, (5, 'Hillary', 'USA', 'New Entry', '11/24/12');
/*SQL Server is fairly flexible about datatype entries, so be explicit.
Note the invalid dates above. These would fail your code since the target column is of datatype DATETIME. The CAST is implicit anyway, and implicit casts should not be depended on in important queries. Theoretically, you could catch this with your TRY/CATCH block, but the entire MERGE statement would be rolled back...an expensive and probably unacceptable cost.
You should proof your insertions for errors before you start expensive transactions like the MERGE statement. In this example, I knew what dates were being entered and that only the Japanese format (style 11, yy/mm/dd) might be entered. Dates are very finicky (DATE has no format) and are best handled outside the MERGE statement altogether. */
;WITH CTE AS (
SELECT ID, Name, Country, Comment, ISNULL(TRY_CAST(Dated AS Date), CAST(CONVERT(datetime, Dated, 11) AS DATE) ) AS Dated --TRY_CAST returns a NULL if it cannot succeed and will not fail your query.
FROM #ETLTable
WHERE ISDATE(Dated) = 1
)
MERGE INTO #MyTable TGT
USING CTE SRC ON TGT.ID = SRC.ID
AND TGT.Name = SRC.Name
WHEN MATCHED AND SRC.Dated > TGT.Dated
THEN UPDATE SET TGT.Dated = SRC.Dated
, TGT.Comment = SRC.Comment
WHEN NOT MATCHED BY TARGET
THEN INSERT(ID, Name, Country, Comment, Dated) VALUES (SRC.ID, SRC.Name, SRC.Country, SRC.Comment, SRC.DATED)
OUTPUT $action AS [Action]
, inserted.ID
, inserted.Name
, inserted.Country
, inserted.Dated
, deleted.ID
, deleted.Name
, deleted.Country
, deleted.Dated
INTO #Actions;
/* Note, you would have to run this query separately, as it only records active transactions. */
--CREATE PROC MyProject
--AS BEGIN
--WAITFOR DELAY '00:00:10'
--Print 'ME'
--END
;WITH CTE AS (
SELECT t.objectid, encrypted, text, start_time
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
INNER JOIN (SELECT object_id FROM sys.procedures
WHERE object_id = OBJECT_ID('MyProject') ) B ON t.objectid = B.object_id )
INSERT INTO #ERROR_TABLE (ObjectID, encrypted, text, start_time, Table_ID, Table_Name, Table_Country, Table_Dated)
SELECT CTE.objectid, CTE.encrypted, CTE.text, CTE.start_time, B.Table_ID, B.Table_Name, B.Table_Country, B.Table_Dated
FROM CTE
RIGHT OUTER JOIN ( SELECT ID AS Table_ID
, Name AS Table_Name
, Country AS Table_Country
, Dated AS Table_Dated
FROM #ETLTable
WHERE ISDATE(Dated) = 0) B ON 1 = 1
SELECT * FROM #ERROR_TABLE
SELECT * FROM #Actions
SELECT * FROM #MyTable
Note that the script does not break anything in SQL Server and in fact separates the bad inputs BEFORE the MERGE statement.
Notice the useful information in the #ERROR_TABLE. You can actually use this information to make recommendations live. This is your goal in error catching.
As a reminder, understand your RAISERROR or TRY/CATCH will only work on entire transactions...a unit of work. If you need to gracefully allow a Merge statement to fail, fine. But understand that proper ETL will guarantee such expensive statements never fail.
So please, no matter what, perform all your ETL processes BEFORE you write to your final tables. This is the point of staging tables...so you can filter out illegal values or misspellings in your bulk tables.
To do anything less is to be lazy and unprofessional. Learn the right habits and they will help save you from future disaster.
For example, I have 2 tables, which I need for my query, Property and Move for history of moving properties.
I must create a query which will return all properties + 1 additional boolean column, IsInService, which will have value true, in cases, when Move table has a record for property with DateTo = null and MoveTypeID = 1 ("In service").
I have created this query:
SELECT
[ID], [Name],
(SELECT COUNT(*)
FROM [Move]
WHERE PropertyID = p.ID
AND DateTo IS NULL
AND MoveTypeID = 1) AS IsInService
FROM
[Property] as p
ORDER BY
[Name] ASC
OFFSET 100500 ROWS FETCH NEXT 50 ROWS ONLY;
I'm not so strong in SQL, but as far as I know, subqueries are evil :)
How to create high performance SQL query in my case, if it is expected that these tables will include millions of records?
I've updated the code based on your comment. If you need something else, please provide the input data and expected output. This is about all I can do based on inference from the existing comments. Further, this isn't intended to give you an exact working solution; my intention was to give you a prototype from which you can build your solution.
That said:
The code below is the basic join that you need. However, keep in mind that indexing is probably going to play as big a part in performance as the structure of the table and the query. It doesn't matter how you query the tables if the indexes aren't there to support the queries once you reach a certain size. There are a lot of resources online for indexing but viewing querying plans should be at the top of your list.
As a note, your column [dbo].[Property] ([Name]) should probably be VARCHAR rather than NVARCHAR if it never needs to hold Unicode, to let SQL Server minimize data storage. Indexes on that column will then be smaller and searches/updates faster.
DECLARE @Property AS TABLE
(
[ID] INT
, [Name] NVARCHAR(100)
);
INSERT INTO @Property
([ID]
, [Name])
VALUES (1,N'A'),
(2,N'B'),
(3,N'C');
DECLARE @Move AS TABLE
(
[ID] INT
, [DateTo] DATE
, [MoveTypeID] INT
, [PropertyID] INT
);
INSERT INTO @Move
([ID]
, [DateTo]
, [MoveTypeID]
, [PropertyID])
VALUES (1,NULL,1,1),
(2,NULL,1,2),
(3,N'2017-12-07',1,2);
SELECT [Property].[ID] AS [property_id]
, [Property].[Name] AS [property_name]
, CASE
WHEN [Move].[PropertyID] IS NOT NULL THEN
N'true'
ELSE
N'false'
END AS [in_service]
FROM @Property AS [Property]
LEFT JOIN @Move AS [Move]
ON [Move].[PropertyID] = [Property].[ID]
-- Keep the filters in the ON clause so unmatched properties still appear
-- with in_service = 'false' instead of being filtered out of the LEFT JOIN.
AND [Move].[DateTo] IS NULL
AND [Move].[MoveTypeID] = 1;
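Since every in-service lookup filters on DateTo IS NULL and MoveTypeID = 1, a filtered index shaped like the sketch below would let SQL Server seek straight to the open rows. This assumes a real dbo.Move table rather than the table variable above; adjust the names to your schema.

-- Sketch: a filtered index covering the in-service lookup.
CREATE NONCLUSTERED INDEX IX_Move_InService
    ON dbo.Move (PropertyID)
    WHERE DateTo IS NULL AND MoveTypeID = 1;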
I created a stored procedure to implement rate limiting on my API; it is called about 5-10k times a second, and each day I'm noticing dupes in the counter table.
It looks up the API key being passed in, then checks the counter table for the ID and date combination using an "UPSERT": if it finds a row it does an UPDATE [count]+1, and if not it INSERTs a new row.
There is no primary key in the counter table.
Here is the stored procedure:
USE [omdb]
GO
/****** Object: StoredProcedure [dbo].[CheckKey] Script Date: 6/17/2017 10:39:37 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[CheckKey] (
@apikey AS VARCHAR(10)
)
AS
BEGIN
SET NOCOUNT ON;
DECLARE @userID as int
DECLARE @limit as int
DECLARE @curCount as int
DECLARE @curDate as Date = GETDATE()
SELECT @userID = id, @limit = limit FROM [users] WHERE apiKey = @apikey
IF @userID IS NULL
BEGIN
--Key not found
SELECT 'False' as [Response], 'Invalid API key!' as [Reason]
END
ELSE
BEGIN
--Key found
BEGIN TRANSACTION Upsert
MERGE [counter] AS t
USING (SELECT @userID AS ID) AS s
ON t.[ID] = s.[ID] AND t.[date] = @curDate
WHEN MATCHED THEN UPDATE SET t.[count] = t.[count]+1
WHEN NOT MATCHED THEN INSERT ([ID], [date], [count]) VALUES (@userID, @curDate, 1);
COMMIT TRANSACTION Upsert
SELECT @curCount = [count] FROM [counter] WHERE ID = @userID AND [date] = @curDate
IF @limit IS NOT NULL AND @curCount > @limit
BEGIN
SELECT 'False' as [Response], 'Request limit reached!' as [Reason]
END
ELSE
BEGIN
SELECT 'True' as [Response], NULL as [Reason]
END
END
END
I also think some locking has been happening since introducing this SP.
The dupes aren't breaking anything, but I'm curious whether something is fundamentally wrong with my code or whether I should set up a constraint in the table to prevent this. Thanks
Update 6/23/17: I dropped the MERGE statement and tried using @@ROWCOUNT but it also caused dupes
BEGIN TRANSACTION Upsert
UPDATE [counter] SET [count] = [count]+1 WHERE [ID] = @userID AND [date] = @curDate
IF @@ROWCOUNT = 0 AND @@ERROR = 0
INSERT INTO [counter] ([ID], [date], [count]) VALUES (@userID, @curDate, 1)
COMMIT TRANSACTION Upsert
A HOLDLOCK hint on the update statement will avoid the race condition. To prevent deadlocks, I suggest a clustered composite primary key (or unique index) on ID and date.
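For example, a sketch of that constraint (the name PK_counter is an assumption; adjust to your conventions):

-- Sketch: enforce at most one counter row per user per day.
-- Both columns must be NOT NULL for a primary key; otherwise use a unique index.
ALTER TABLE [counter]
    ADD CONSTRAINT PK_counter PRIMARY KEY CLUSTERED ([ID], [date]);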
The example below incorporates these changes and uses the SET <variable> = <column> = <expression> form of the SET clause to avoid the need for the subsequent SELECT of the final counter value and thereby improve performance.
ALTER PROCEDURE [dbo].[CheckKey]
@apikey AS VARCHAR(10)
AS
SET NOCOUNT ON;
--SET XACT_ABORT ON is a best practice for procs with explicit transactions
SET XACT_ABORT ON;
DECLARE
@userID as int
, @limit as int
, @curCount as int
, @curDate as Date = GETDATE();
BEGIN TRY
SELECT
@userID = id
, @limit = limit
FROM [users]
WHERE apiKey = @apikey;
IF @userID IS NULL
BEGIN
--Key not found
SELECT 'False' as [Response], 'Invalid API key!' as [Reason];
END
ELSE
BEGIN
--Key found
BEGIN TRANSACTION Upsert;
UPDATE [counter] WITH(HOLDLOCK)
SET @curCount = [count] = [count] + 1
WHERE
[ID] = @userID
AND [date] = @curDate;
IF @@ROWCOUNT = 0
BEGIN
INSERT INTO [counter] ([ID], [date], [count])
VALUES (@userID, @curDate, 1);
END;
IF @limit IS NOT NULL AND @curCount > @limit
BEGIN
SELECT 'False' as [Response], 'Request limit reached!' as [Reason]
END
ELSE
BEGIN
SELECT 'True' as [Response], NULL as [Reason]
END;
COMMIT TRANSACTION Upsert;
END;
END TRY
BEGIN CATCH
IF @@TRANCOUNT > 0 ROLLBACK;
THROW;
END CATCH;
GO
Probably not the answer you're looking for, but for a rate-limiting counter I would use a cache like Redis in a middleware layer before hitting the API. Performance-wise it's pretty great, since Redis would have no problem with the load and your DB won’t be impacted.
And if you want to keep a history of hits per API key per day in SQL, run a daily task to import yesterday's counts from Redis into SQL.
The data set would be small enough to get a Redis instance that would cost literally nothing (or close).
It will be the merge statement getting into a race condition with itself, i.e. your API is getting called by the same client and both times the merge statement finds no row, so inserts one. Merge isn't an atomic operation, even though it's reasonable to assume it is. For example, see this bug report for SQL 2008 about merge causing deadlocks; the SQL Server team said this is by design.
From your post I think the immediate issue is that your clients will potentially be getting a small number of free hits on your API. For example, if two requests come in and see no row, you'll end up with two rows with a count of 1 when you'd actually want one row with a count of 2, and the client could end up getting 1 free API hit that day. If three requests crossed over, you'd get three rows with a count of 1 and they could get 2 free API hits, etc.
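If you do stay with MERGE, one commonly suggested mitigation is to take a serializable range lock on the target via a HOLDLOCK hint, so two sessions cannot both take the "not matched" branch for the same key. A sketch based on the procedure in the question:

-- Sketch: serialize the upsert so concurrent callers cannot both
-- insert a row for the same (ID, date).
MERGE [counter] WITH (HOLDLOCK) AS t
USING (SELECT @userID AS ID) AS s
    ON t.[ID] = s.[ID] AND t.[date] = @curDate
WHEN MATCHED THEN
    UPDATE SET t.[count] = t.[count] + 1
WHEN NOT MATCHED THEN
    INSERT ([ID], [date], [count]) VALUES (@userID, @curDate, 1);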
Edit
So as your link suggests, you've got two categories of options you could explore: firstly, just trying to get this working in SQL Server; secondly, other architectural solutions.
For the SQL option I would do away with the merge and consider pre-populating your clients' counter rows ahead of time, either nightly or less often for several days at a time; this leaves you a single update instead of the merge/update-and-insert. Then you can confirm both your update and your select are fully optimised, i.e. that they have the necessary indexes and aren't causing scans. Next you could look at tweaking locking so you're only locking at the row level; see this for some more info. For the select you could also look at using NOLOCK, which means you could get slightly incorrect data, but this shouldn't matter in your case, and you'll always be using a WHERE that targets a single row.
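A rough sketch of that shape, reusing the question's tables (the pre-population schedule and shape are assumptions):

-- Nightly: pre-populate tomorrow's counter rows so the hot path never inserts.
INSERT INTO [counter] ([ID], [date], [count])
SELECT u.id, DATEADD(DAY, 1, CAST(GETDATE() AS DATE)), 0
FROM [users] AS u
WHERE NOT EXISTS (SELECT 1 FROM [counter] AS c
                  WHERE c.[ID] = u.id
                    AND c.[date] = DATEADD(DAY, 1, CAST(GETDATE() AS DATE)));

-- Per request: a single row-level update, no merge and no insert branch.
UPDATE [counter] WITH (ROWLOCK)
SET [count] = [count] + 1
WHERE [ID] = @userID AND [date] = @curDate;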
For the non-SQL options, as your link says, you could look at queuing things up; obviously these would be the updates/inserts, so your selects would be seeing old data. This may or may not be acceptable depending on how far apart they are, although you could treat this as an "eventually consistent" solution if you wanted to be strict and charge extra or take off API hits the next day or something. You could also look at caching options to store the counts; this gets more complex if your app is distributed, but there are caching solutions for that. If you went with caching, you could choose to not persist anything, but then you'd potentially give away a load of free hits if your site went down... though you'd probably have bigger issues to worry about then anyway!
At a high level, have you considered pursuing the following scenario?
Restructuring: Set the primary key on your table to be a composite of (ID, date). Possibly even better, just use the API key itself instead of the arbitrary ID you're assigning it.
Query A: Do SQL Server's equivalent of "INSERT IGNORE" (it seems there are semantic equivalents for SQL Server, based on a Google search) with the values (ID, TODAY(), 1). You'll also want to specify a WHERE clause that checks the ID actually exists in your API/limits table.
Query B: Update the row with (ID, TODAY()) as its primary key, setting count := count + 1, and in the very same query do an inner join with your limits table, so that in the WHERE clause you can specify that you'll only update the count if count < limit. A sketch of both queries follows below.
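A rough T-SQL sketch of queries A and B under that restructuring (names follow the question; WHERE NOT EXISTS stands in for "INSERT IGNORE", and @userID is assumed to be set by the caller):

DECLARE @today date = GETDATE();

-- Query A: insert today's row only if none exists and the key is real.
INSERT INTO [counter] ([ID], [date], [count])
SELECT u.id, @today, 1
FROM [users] AS u
WHERE u.id = @userID
  AND NOT EXISTS (SELECT 1 FROM [counter] AS c
                  WHERE c.[ID] = u.id AND c.[date] = @today);

-- Query B: increment only while under the limit, joining the limits table.
UPDATE c
SET c.[count] = c.[count] + 1
FROM [counter] AS c
INNER JOIN [users] AS u ON u.id = c.[ID]
WHERE c.[ID] = @userID
  AND c.[date] = @today
  AND (u.limit IS NULL OR c.[count] < u.limit);

Check @@ROWCOUNT after each statement to drive the flows below.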
If the majority of your requests are valid API requests or rate-limited requests, I would perform queries in the following order on every request:
Run query B.
If 0 rows updated:
    Run query A.
    If 0 rows updated:
        Run query B.
        If 0 rows updated, reject because of rate limit.
        If 1 row updated, continue.
    If 1 row updated, continue.
If 1 row updated, continue.
If the majority of your requests are invalid API requests, I'd do the following:
Run query A.
If 0 rows updated:
    Run query B.
    If 0 rows updated, reject because of rate limit.
    If 1 row updated, continue.
If 1 row updated, continue.
Using SSRS with SQL Server 2008 R2 (Visual Studio environment).
I am trying to produce a stepped-down report based on a level/value in a table on SQL Server. The level acts as an indent position, with sort_value being the recursive parent in the report.
Sample of table in SQL Server:
Sample of output required
OK, I've come up with a solution but please note the following before you proceed.
1. The process relies on the data being in the correct order, as per your sample data.
2. If this is your real data structure, I strongly recommend you review it.
OK, so the first thing I did was recreate your table exactly as per your example. I called the table Stepped as I couldn't think of anything else!
The following code can then be used as your dataset in SSRS but you can obviously just run the T-SQL directly to see the output.
-- Create a copy of the data with a row number. This means the input data MUST be in the correct order.
DECLARE @t TABLE(RowN int IDENTITY(1,1), Sort_Order int, [Level] int, Qty int, Currency varchar(20), Product varchar(20))
INSERT INTO @t (Sort_Order, [Level], Qty, Currency, Product)
SELECT * FROM Stepped
-- Update the table so each row where the sort_order is NULL will take the sort order from the row above
UPDATE a SET Sort_Order = b.Sort_Order
FROM @t a
JOIN @t b on a.RowN = b.RowN+1
WHERE a.Sort_Order is null and b.Sort_Order is not null
-- repeat this until we're done.
WHILE @@ROWCOUNT > 0
BEGIN
UPDATE a SET Sort_Order = b.Sort_Order
FROM @t a
JOIN @t b on a.RowN = b.RowN+1
WHERE a.Sort_Order is null and b.Sort_Order is not null
END
-- Now we can select from our new table sorted by both sort order and level.
-- We also separate out the products based on their level.
SELECT
CASE Level WHEN 1 THEN Product ELSE NULL END as ProdLvl_1
, CASE Level WHEN 2 THEN Product ELSE NULL END as ProdLvl_2
, CASE Level WHEN 3 THEN Product ELSE NULL END as ProdLvl_3
, QTY
, Currency
FROM @t s
ORDER BY Sort_Order, Level
The output looks like this...
You may also want to consider swapping out the final statement for this.
-- Alternatively use this style and use a single column in the report.
-- This is better if the number of levels can change.
SELECT
REPLICATE('--', Level-1) + Product as Product
, QTY
, Currency
FROM @t s
ORDER BY Sort_Order, Level
As this will give you a single column for 'product', indented like this:
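Based on the REPLICATE('--', Level-1) expression, the single column would come out looking something like this (illustrative product names only, since the sample-data image isn't reproduced here):

Product
Product A
--Product B
----Product C
--Product D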
I've got a simple queue implementation in MS SQL Server 2008 R2. Here's the essence of the queue:
CREATE TABLE ToBeProcessed
(
Id BIGINT IDENTITY(1,1) PRIMARY KEY NOT NULL,
[Priority] INT DEFAULT(100) NOT NULL,
IsBeingProcessed BIT default (0) NOT NULL,
SomeData nvarchar(MAX) NOT null
)
I want to atomically select the top n rows, ordered by the priority and the id, where IsBeingProcessed is false, and update those rows to say they are being processed. I thought I'd use a combination of UPDATE, TOP, OUTPUT and ORDER BY, but unfortunately you can't use TOP with ORDER BY in an UPDATE statement.
So I've made an IN clause to restrict the update, and that sub query does the ORDER BY (see below). My question is: is this whole statement atomic, or do I need to wrap it in a transaction?
DECLARE @numberToProcess INT = 2
CREATE TABLE #IdsToProcess
(
Id BIGINT NOT null
)
UPDATE
ToBeProcessed
SET
ToBeProcessed.IsBeingProcessed = 1
OUTPUT
INSERTED.Id
INTO
#IdsToProcess
WHERE
ToBeProcessed.Id IN
(
SELECT TOP(@numberToProcess)
ToBeProcessed.Id
FROM
ToBeProcessed
WHERE
ToBeProcessed.IsBeingProcessed = 0
ORDER BY
ToBeProcessed.Id,
ToBeProcessed.Priority DESC)
SELECT
*
FROM
#IdsToProcess
DROP TABLE #IdsToProcess
Here's some SQL to insert some dummy rows:
INSERT INTO ToBeProcessed (SomeData) VALUES (N'');
INSERT INTO ToBeProcessed (SomeData) VALUES (N'');
INSERT INTO ToBeProcessed (SomeData) VALUES (N'');
INSERT INTO ToBeProcessed (SomeData) VALUES (N'');
INSERT INTO ToBeProcessed (SomeData) VALUES (N'');
If I understand the motivation for the question, you want to avoid the possibility that two concurrent transactions could both execute the sub query to get the top N rows to process, then proceed to update the same rows?
In that case I'd use this approach.
;WITH cte As
(
SELECT TOP(#numberToProcess)
*
FROM
ToBeProcessed WITH(UPDLOCK,ROWLOCK,READPAST)
WHERE
ToBeProcessed.IsBeingProcessed = 0
ORDER BY
ToBeProcessed.Id,
ToBeProcessed.Priority DESC
)
UPDATE
cte
SET
IsBeingProcessed = 1
OUTPUT
INSERTED.Id
INTO
#IdsToProcess
I was a bit uncertain earlier whether SQL Server would take U locks when processing your version with the sub query thus blocking two concurrent transactions from reading the same TOP N rows. This does not appear to be the case.
Test Table
CREATE TABLE JobsToProcess
(
priority INT IDENTITY(1,1),
isprocessed BIT ,
number INT
)
INSERT INTO JobsToProcess
SELECT TOP (1000000) 0,0
FROM master..spt_values v1, master..spt_values v2
Test Script (Run in 2 concurrent SSMS sessions)
BEGIN TRY
DECLARE @FinishedMessage VARBINARY (128) = CAST('TestFinished' AS VARBINARY (128))
DECLARE @SynchMessage VARBINARY (128) = CAST('TestSynchronising' AS VARBINARY (128))
SET CONTEXT_INFO @SynchMessage
DECLARE @OtherSpid int
WHILE(@OtherSpid IS NULL)
SELECT @OtherSpid=spid
FROM sys.sysprocesses
WHERE context_info=@SynchMessage and spid<>@@SPID
SELECT @OtherSpid
DECLARE @increment INT = @@SPID
DECLARE @number INT = @increment
WHILE (@number = @increment AND NOT EXISTS(SELECT * FROM sys.sysprocesses WHERE context_info=@FinishedMessage))
UPDATE JobsToProcess
SET @number=number +=@increment,isprocessed=1
WHERE priority = (SELECT TOP 1 priority
FROM JobsToProcess
WHERE isprocessed=0
ORDER BY priority DESC)
SELECT *
FROM JobsToProcess
WHERE number not in (0,@OtherSpid,@@SPID)
SET CONTEXT_INFO @FinishedMessage
END TRY
BEGIN CATCH
SET CONTEXT_INFO @FinishedMessage
SELECT ERROR_MESSAGE(), ERROR_NUMBER()
END CATCH
Almost immediately execution stops, as both concurrent transactions update the same row: the S locks taken whilst identifying the TOP 1 priority must get released before each transaction acquires a U lock, and the two transactions then proceed to take the row's U and X locks in sequence.
If a clustered index is added (ALTER TABLE JobsToProcess ADD PRIMARY KEY CLUSTERED (priority)), then deadlock occurs almost immediately instead, as in this case the row S lock doesn't get released: one transaction acquires a U lock on the row and waits to convert it to an X lock, and the other transaction is still waiting to convert its S lock to a U lock.
If the query above is changed to use MIN rather than TOP
WHERE priority = (SELECT MIN(priority)
FROM JobsToProcess
WHERE isprocessed=0
)
Then SQL Server manages to completely eliminate the sub query from the plan and takes U locks all the way.
Every individual T-SQL statement is, according to all my experience and all the documentation I've ever read, supposed to be atomic. What you have there is a single T-SQL statement, ergo it should be atomic and will not require explicit transaction statements. I've used this precise kind of logic many times and never had a problem with it. I look forward to seeing if anyone has a supportable alternate opinion.
Incidentally, look into the ranking functions, specifically ROW_NUMBER(), for retrieving a set number of items. The syntax is perhaps a tad awkward, but overall they are flexible and powerful tools. (There are about a bazillion Stack Overflow questions and answers that discuss them.)
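For instance, a sketch of the same claim-N-rows pattern using ROW_NUMBER(), reusing the question's table, variable, and temp table:

-- Sketch: an updatable CTE numbers the unclaimed rows and flips the
-- first @numberToProcess of them in one statement.
;WITH Ranked AS
(
    SELECT Id, IsBeingProcessed,
           ROW_NUMBER() OVER (ORDER BY [Priority] DESC, Id) AS rn
    FROM ToBeProcessed
    WHERE IsBeingProcessed = 0
)
UPDATE Ranked
SET IsBeingProcessed = 1
OUTPUT INSERTED.Id INTO #IdsToProcess
WHERE rn <= @numberToProcess;

The same locking hints discussed in the other answer (UPDLOCK, READPAST) can be added to the CTE's FROM clause if multiple consumers will run this concurrently.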
Writing my first SQL query to run specifically as a SQL Job and I'm a little out of my depth. I have a table within a SQL Server 2005 Database which is populated each day with data from various buildings. To monitor the system better, I am attempting to write a SQL Job that will run a query (or stored procedure) to verify the following:
- At least one row of data appears each day per building
My question has two main parts;
How can I verify that data exists for each building? While there is a "building" column, I'm not sure how to verify each one. I need the query/sp to fail unless all locations have reported in. Do I need to create a control table for the query/sp to compare against? (as the number of buildings reporting in can change)
How do I make this query fail so that the SQL Job fails? Do I need to wrap it in some sort of error handling code?
Table:
Employee  RawDate                  Building
Bob       2010-07-22 06:04:00.000  2
Sally     2010-07-22 01:00:00.000  9
Jane      2010-07-22 06:04:00.000  12
Alex      2010-07-22 05:54:00.000  EA
Vince     2010-07-22 07:59:00.000  30
Note that the building column has at least one non-numeric value. The range of buildings that report in changes over time, so I would prefer to avoid hard-coding of building values (a table that I can update would be fine).
Should I use a cursor or dynamic SQL to run a looping SELECT statement that checks for each building based on a control table listing each currently active building?
Any help would be appreciated.
Edit: spelling
You could create a stored procedure that checks for missing entries. The procedure could call raiserror to make the job fail. For example:
if OBJECT_ID('CheckBuildingEntries') is null
exec ('create procedure CheckBuildingEntries as select 1')
go
alter procedure CheckBuildingEntries(
@check_date datetime)
as
declare @missing_buildings int
select @missing_buildings = COUNT(*)
from Buildings as b
left join
YourTable as yt
on yt.Building = b.name
and dateadd(dd,0, datediff(dd,0,yt.RawDate)) =
dateadd(dd,0, datediff(dd,0,@check_date))
where yt.Building is null
if @missing_buildings > 0
begin
raiserror('OMG!', 16, 0)
end
go
An example scheduled job running at 4AM to check yesterday's entries:
declare @yesterday datetime
set @yesterday = dateadd(day, -1, GETDATE())
exec CheckBuildingEntries @yesterday
If an entry was missing, the job would fail. You could set it up to send you an email.
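For example, the failure path could also notify you directly via Database Mail. A sketch, assuming Database Mail is already configured; the profile name and address are placeholders:

-- Sketch: notify on missing entries via Database Mail.
EXEC msdb.dbo.sp_send_dbmail
    @profile_name = 'DBA',                -- hypothetical mail profile
    @recipients   = 'you@example.com',    -- hypothetical recipient
    @subject      = 'Building data check failed',
    @body         = 'At least one building did not report in yesterday.';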
Test tables:
create table Buildings (id int identity, name varchar(50))
create table YourTable (Employee varchar(50), RawDate datetime,
Building varchar(50))
insert into Buildings (name)
select '2'
union all select '9'
union all select '12'
union all select 'EA'
union all select '30'
insert into YourTable (Employee, RawDate, Building)
select 'Bob', '2010-07-22 06:04:00.000', '2'
union all select 'Sally', '2010-07-22 01:00:00.000', '9'
union all select 'Jane', '2010-07-22 06:04:00.000', '12'
union all select 'Alex', '2010-07-22 05:54:00.000', 'EA'
union all select 'Vince', '2010-07-22 07:59:00.000', '30'
Recommendations:
Do use a control table for the buildings - you may find that one already exists, if you use the Object Explorer in SQL Server Management Studio.
Don't use a cursor or dynamic SQL to run a loop - use set-based commands instead, possibly something like the following:
SELECT BCT.Building, COUNT(YDT.Building) Build
FROM dbo.BuildingControlTable BCT
LEFT JOIN dbo.YourDataTable YDT
ON BCT.Building = YDT.Building AND
CAST(FLOOR( CAST( GETDATE() AS FLOAT ) - 1 ) AS DATETIME ) =
CAST(FLOOR( CAST( YDT.RawDate AS FLOAT ) ) AS DATETIME )
GROUP BY BCT.Building