Downsides to MERGE with dummy USING table? - sql-server

I'm creating some stored procedures specifically for use from C# to update various tables in our database. A large number of items require a predictable function that will:
1) Check if a matching row already exists
2) If it doesn't exist, insert data
3) Gather ID of row and return to user
I know this can be done in a number of ways, but the most elegant way I can imagine seems to be using a MERGE with a dummy table and using the procedure params for the ON clause, such as:
CREATE PROCEDURE dbo.UpdatePerson(@PersonID INT, @FirstName VARCHAR(50)) AS
MERGE dbo.Person p
USING (SELECT 1 One) One
ON p.Person_ID = @PersonID
WHEN MATCHED THEN
    UPDATE SET First_Name = @FirstName
WHEN NOT MATCHED THEN
    INSERT (Person_ID, First_Name) VALUES (@PersonID, @FirstName);
This wraps it all together in one nice bundle, even though I'm not working with an actual table to merge in. I know the same basic idea could be accomplished with:
...
USING (SELECT @PersonID Person_ID, @FirstName First_Name) NewPerson
ON p.Person_ID = NewPerson.Person_ID
...
and maybe this would offer some kind of performance benefit?
Can anyone offer any solid reasons for/against this kind of usage of MERGE?

Instead of using MERGE you can use an IF condition. Suppose the incoming rows are staged in a temp table:
CREATE TABLE #Table(PersonID INT, First_Name VARCHAR(100))
-- insert into the temp table before this point
IF EXISTS(SELECT 1 FROM dbo.Person WHERE Person_ID IN (SELECT PersonID FROM #Table))
BEGIN
    -- your UPDATE query, e.g.:
    UPDATE p
    SET p.First_Name = t.First_Name
    FROM dbo.Person p
    JOIN #Table t ON t.PersonID = p.Person_ID
END
ELSE
BEGIN
    -- your INSERT query, e.g.:
    INSERT INTO dbo.Person (Person_ID, First_Name)
    SELECT PersonID, First_Name FROM #Table
END
DROP TABLE #Table
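Note that neither snippet above returns the row's ID (step 3 in the question). As a hedged sketch, an OUTPUT clause on the question's own MERGE would hand it back to the C# caller as a result set (column names taken from the question):
MERGE dbo.Person p
USING (SELECT 1 One) One
ON p.Person_ID = @PersonID
WHEN MATCHED THEN
    UPDATE SET First_Name = @FirstName
WHEN NOT MATCHED THEN
    INSERT (Person_ID, First_Name) VALUES (@PersonID, @FirstName)
OUTPUT inserted.Person_ID; -- populated for both the UPDATE and INSERT branches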


Use output of procedure in a new procedure SQL Server [duplicate]

I'm not sure if this is something I should do in T-SQL or not, and I'm pretty sure using the word 'iterate' was wrong in this context, since you should never iterate anything in SQL. It should be a set-based operation, correct? Anyway, here's the scenario:
I have a stored proc that returns many uniqueidentifiers (single-column results). These IDs are the primary keys of records in another table. I need to set a flag on all the corresponding records in that table.
How do I do this without the use of cursors? Should be an easy one for you sql gurus!
This may not be the most efficient, but I would create a temp table to hold the results of the stored proc and then use that in a join against the target table. For example:
CREATE TABLE #t (uniqueid uniqueidentifier)
INSERT INTO #t EXEC p_YourStoredProc
UPDATE a
SET a.FlagColumn = 1
FROM TargetTable a JOIN #t b
ON a.uniqueid = b.uniqueid
DROP TABLE #t
You could also change your stored proc to a user-defined function that returns a table of your uniqueidentifiers. You can join directly to the UDF and treat it like a table, which avoids having to create the extra temp table explicitly. Also, you can pass parameters into the function as you're calling it, making this a very flexible solution.
CREATE FUNCTION dbo.udfGetUniqueIDs
()
RETURNS TABLE
AS
RETURN
(
SELECT uniqueid FROM dbo.SomeWhere
)
GO
UPDATE a
SET a.FlagColumn = 1
FROM dbo.TargetTable a INNER JOIN dbo.udfGetUniqueIDs() b
ON a.uniqueid = b.uniqueid
Edit:
This will work on SQL Server 2000 and up...
Insert the results of the stored proc into a temporary table and join this to the table you want to update:
CREATE TABLE #WorkTable (...) -- define columns to match the proc's result set
INSERT INTO #WorkTable
EXEC usp_WorkResults
UPDATE DataTable
SET Flag = Whatever
FROM DataTable
INNER JOIN #WorkTable
ON DataTable.[Key] = #WorkTable.[Key]
If you upgrade to SQL 2008 then you can pass table parameters I believe. Otherwise, you're stuck with a global temporary table or creating a permanent table that includes a column for some sort of process ID to identify which call to the stored procedure is relevant.
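For reference, a minimal sketch of that table-valued parameter approach (requires SQL Server 2008+; all names here are hypothetical):
CREATE TYPE dbo.UniqueIdList AS TABLE (uniqueid uniqueidentifier PRIMARY KEY);
GO
CREATE PROCEDURE dbo.FlagRecords @Ids dbo.UniqueIdList READONLY
AS
UPDATE a
SET a.FlagColumn = 1
FROM dbo.TargetTable a
INNER JOIN @Ids b ON a.uniqueid = b.uniqueid;
GO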
How much room do you have in changing the stored procedure that generates the IDs? You could add code in there to handle it or have a parameter that lets you optionally flag the rows when it is called.
Use temporary tables or a table variable (you are using SS2005).
Although, that's not nest-able: if a stored proc uses that method itself, then you can't dump its output into a temp table.
An ugly solution would be to have your procedure return the "next" id each time it is called by using the other table (or some flag on the existing table) to filter out the rows that it has already returned
You can use a temp table or table variable with an additional column:
DECLARE @MyTable TABLE (
    Column1 uniqueidentifier,
    ...,
    Checked bit
)
INSERT INTO @MyTable
SELECT [...], 0 FROM MyTable WHERE [...]
DECLARE @var1 uniqueidentifier -- plus @var2 etc. as needed
DECLARE @Continue bit
SET @Continue = 1
WHILE (@Continue = 1)
BEGIN
    SET @var1 = NULL
    SELECT TOP 1 @var1 = Column1,
           @var2 = Column2,
           ...
    FROM @MyTable
    WHERE Checked = 0
    IF @var1 IS NULL
        SET @Continue = 0
    ELSE
    BEGIN
        ...
        UPDATE @MyTable SET Checked = 1 WHERE Column1 = @var1
    END
END
Edit: Actually, in your situation a join will be better; the code above is a cursorless iteration, which is overkill for your situation.

Is there a way to reserve generated ID on a temporary table

I posted this question
INSERT Statement Expensive Queries on Activity Monitor
As you will see the XML structure has different levels.
I have created different tables
Organisation = organisation_id (PRIMARY_KEY)
Contacts = organisation_id (FOREIGN_KEY)
Roles = organisation_id (FOREIGN_KEY)
Rels = organisation_id (FOREIGN_KEY)
Succs = organisation_id (FOREIGN_KEY)
What I want is to generate the organisation_id and do the insert on each table in a cascading manner. At the moment the process takes almost 2 hours for 300k records. I have 3 approaches:
Convert the XML to list objects and send them by batch (1000) as JSON text to a stored procedure that uses OPENJSON
Convert the XML to list objects, save each batch (1000) as a JSON file that SQL Server can read, and pass the file path to a stored procedure which then opens the JSON file using OPENROWSET and OPENJSON
Send the path to the XML to a stored procedure, then use OPENROWSET and OPENXML.
All processes (1-3) insert the data into a FLAT temp table, then iterate each row to call a different INSERT stored procedure for each table. Approach #3 seems to fail with errors on 300k records but works on 4 records.
The other question is: will it be much faster if I use a physical table rather than a temp table?
-------UPDATE-------
As explained in the link, I was doing a while loop. Someone suggested / commented to do a batch insert into each of the tables. The problem is that, for Contacts for example, I can only do this if I know the organisation_id:
select
    organisation_id = IDENTITY( bigint ) -- IF I CAN GENERATE THE ACTUAL ORGANISATION ID
    ,name = Col.value('.','nvarchar(20)')
    ,contact_type = c.value('(./@type)[1]','nvarchar(50)')
    ,contact_value = c.value('(./@value)[1]','nvarchar(50)')
into
    #temporganisations
from
    @xml.nodes('ns1:OrgRefData/Organisations/Organisation') as Orgs(Col)
    outer apply Orgs.Col.nodes('Contacts/Contact') as Cs(c)
Then when I do the batch insert
insert into contacts
(
    organisation_id, type, value
)
select
    torg.organisation_id -- if this is the actual id then perfect
    ,torg.contact_type
    ,torg.contact_value
from #temporganisations torg
I would suggest that you shred the XML client-side and switch over to doing some kind of bulk copy; this will generally perform much better.
At the moment, you cannot do a normal bcp or SqlBulkCopy, because you also need the foreign key. You need a way to uniquely identify Organisation within the batch, and you say that is difficult owing to the number of columns needed for that.
Instead, you need to generate some kind of unique ID client-side, an incrementing integer will do. You then assign this ID to the child objects as you are shredding the XML into Datatables / IEnumerables / CSV files.
You have two options:
The easiest in many respects is to not use IDENTITY on OrganisationId and just directly insert your generated ID. This means you can leverage standard SqlBulkCopy procedures.
The downside is that you lose the benefit of automatic IDENTITY assignment, but you could instead just use the SqlBulkCopyOptions.KeepIdentity option, which only applies to this insert, and carry on with IDENTITY for other inserts. You would need to estimate a correct batch of IDs that won't clash.
A variation on this is to use GUIDs; these are always unique. I don't really recommend this option.
If you don't want to do this, then it gets quite a bit more complex.
You need to define equivalent Table Types for each of the tables. Each has a column for the temporary primary key of the Organisation
CREATE TYPE OrganisationType AS TABLE
(TempOrganisationID int PRIMARY KEY,
SomeData varchar...
Pass through the shredded XML as Table-Valued Parameters. You would have @Organisations, @Contacts etc.
Then you would have SQL along the following lines:
-- This stores the real IDs
DECLARE @OrganisationIDs TABLE
    (TempOrganisationID int PRIMARY KEY, OrganisationId int NOT NULL);
-- We need a hack to get OUTPUT to work with non-inserted columns, so we use a weird MERGE
MERGE INTO Organisation t
USING @Organisations s
ON 1 = 0 -- never match
WHEN NOT MATCHED THEN
    INSERT (SomeData, ...)
    VALUES (s.SomeData, ...)
OUTPUT
    s.TempOrganisationID, inserted.OrganisationID
INTO @OrganisationIDs
    (TempOrganisationID, OrganisationID);
-- We now have each TempOrganisationID matched up with a real OrganisationID
-- Now we can insert the child tables
INSERT Contact
    (OrganisationID, [Type], [Value]...)
SELECT o.OrganisationID, c.[Type], c.[Value]
FROM @Contacts c
JOIN @OrganisationIDs o ON o.TempOrganisationID = c.TempOrganisationID;
-- and so on for all the child tables
Instead of saving the IDs to a table variable, you could stream the OUTPUT back to the client, have the client join the IDs to the child tables, then BulkCopy them back again as part of the child tables.
This makes the SQL simpler, however you still need the MERGE, and you risk complicating the client code significantly.
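A hedged sketch of that variant, reusing the assumed names above: simply omit INTO, and the OUTPUT rows stream back to the client as a result set:
MERGE INTO Organisation t
USING @Organisations s
ON 1 = 0 -- never match
WHEN NOT MATCHED THEN
    INSERT (SomeData)
    VALUES (s.SomeData)
OUTPUT s.TempOrganisationID, inserted.OrganisationID; -- consumed client-side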
You can try to use the following conceptual example.
-- DDL and sample data population, start
USE tempdb;
GO
DROP TABLE IF EXISTS #city;
DROP TABLE IF EXISTS #state;
-- parent table
CREATE TABLE #state (
stateID INT IDENTITY PRIMARY KEY,
stateName VARCHAR(30),
abbr CHAR(2),
capital VARCHAR(30)
);
-- child table (1-to-many)
CREATE TABLE #city (
cityID INT IDENTITY,
stateID INT NOT NULL FOREIGN KEY REFERENCES #state(stateID),
city VARCHAR(30),
[population] INT,
PRIMARY KEY (cityID, stateID, city)
);
-- mapping table to preserve IDENTITY ids
DECLARE @idmapping TABLE (GeneratedID INT PRIMARY KEY,
                          NaturalID VARCHAR(20) NOT NULL UNIQUE);
DECLARE @xml XML =
N'<root>
<state>
<StateName>Florida</StateName>
<Abbr>FL</Abbr>
<Capital>Tallahassee</Capital>
<cities>
<city>
<city>Miami</city>
<population>470194</population>
</city>
<city>
<city>Orlando</city>
<population>285713</population>
</city>
</cities>
</state>
<state>
<StateName>Texas</StateName>
<Abbr>TX</Abbr>
<Capital>Austin</Capital>
<cities>
<city>
<city>Houston</city>
<population>2100263</population>
</city>
<city>
<city>Dallas</city>
<population>5560892</population>
</city>
</cities>
</state>
</root>';
-- DDL and sample data population, end
;WITH rs AS
(
SELECT stateName = p.value('(StateName/text())[1]', 'VARCHAR(30)'),
abbr = p.value('(Abbr/text())[1]', 'CHAR(2)'),
capital = p.value('(Capital/text())[1]', 'VARCHAR(30)')
FROM @xml.nodes('/root/state') AS t(p)
)
MERGE #state AS o
USING rs ON 1 = 0
WHEN NOT MATCHED THEN
INSERT(stateName, abbr, capital)
VALUES(rs.stateName, rs.Abbr, rs.Capital)
OUTPUT inserted.stateID, rs.stateName
INTO #idmapping (GeneratedID, NaturalID);
;WITH Details AS
(
SELECT NaturalID = p.value('(StateName/text())[1]', 'VARCHAR(30)'),
city = c.value('(city/text())[1]', 'VARCHAR(30)'),
[population] = c.value('(population/text())[1]', 'INT')
FROM @xml.nodes('/root/state') AS A(p) -- parent
CROSS APPLY A.p.nodes('cities/city') AS B(c) -- child
)
INSERT #city (stateID, city, [Population])
SELECT m.GeneratedID, d.city, d.[Population]
FROM Details AS d
INNER JOIN #idmapping AS m ON d.NaturalID = m.NaturalID;
-- test
SELECT * FROM #state;
SELECT * FROM #idmapping;
SELECT * FROM #city;

Splitting multiple fields by delimiter

I have to write an SP that can perform partial updates on our databases; the changes are stored in records of the PU table. A Values field contains all values, delimited by a fixed delimiter. A Table field refers to a Schemes table containing the column names for each table, stored in a similar fashion in a Columns field.
Now for my SP I need to split the Values field and Columns field into a temp table with Column/Value pairs; this happens for each record in the PU table.
An example:
Our PU table looks something like this:
CREATE TABLE [dbo].[PU](
[Table] [nvarchar](50) NOT NULL,
[Values] [nvarchar](max) NOT NULL
)
Insert SQL for this example:
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Person','John Doe;26');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Person','Jane Doe;22');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Person','Mike Johnson;20');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Person','Mary Jane;24');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Course','Mathematics');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Course','English');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Course','Geography');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Campus','Campus A;Schools Road 1;Educationville');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Campus','Campus B;Schools Road 31;Educationville');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Campus','Campus C;Schools Road 22;Educationville');
And we have a Schemes table similar to this:
CREATE TABLE [dbo].[Schemes](
[Table] [nvarchar](50) NOT NULL,
[Columns] [nvarchar](max) NOT NULL
)
Insert SQL for this example:
INSERT INTO [dbo].[Schemes]([Table],[Columns]) VALUES ('Person','[Name];[Age]');
INSERT INTO [dbo].[Schemes]([Table],[Columns]) VALUES ('Course','[Name]');
INSERT INTO [dbo].[Schemes]([Table],[Columns]) VALUES ('Campus','[Name];[Address];[City]');
As a result the first record of the PU table should result in a temp table like:
Column     Value
---------  --------------
[Name]     John Doe
[Age]      26
The 5th will have:
Column     Value
---------  --------------
[Name]     Mathematics
Finally, the 8th PU record should result in:
Column     Value
---------  --------------
[Name]     Campus A
[Address]  Schools Road 1
[City]     Educationville
You get the idea.
I tried to use the following query to create the temp tables, but alas it fails when there's more than one value in the PU record:
DECLARE @Fields TABLE
(
    [Column] INT,
    [Value] VARCHAR(MAX)
)
INSERT INTO @Fields
SELECT TOP 1
    (SELECT Value FROM STRING_SPLIT([dbo].[Schemes].[Columns], ';')),
    (SELECT Value FROM STRING_SPLIT([dbo].[PU].[Values], ';'))
FROM [dbo].[PU] INNER JOIN [dbo].[Schemes] ON [dbo].[PU].[Table] = [dbo].[Schemes].[Table]
TOP 1 correctly gets the first PU record as each PU record is removed once processed.
The error is:
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
In the case of a Person record, the splits are indeed returning 2 values/columns at a time; I just want to store the values in 2 records instead of getting an error.
Any help on rewriting the above query?
Also do note that the data is just generic nonsense. Being able to have 2 fields that both have delimited values, always equal in amount (e.g. a 'person' in the PU table will always have 2 delimited values in the field), and breaking them up into several column/value rows is the point of the question.
UPDATE: Working implementation
Based on the (accepted) answer of Sean Lange, I was able to work out the following implementation to overcome the issue:
As I need to reuse it, the combine column/value functionality is performed by a new function, declared as such:
CREATE FUNCTION [dbo].[JoinDelimitedColumnValue]
(@splitValues VARCHAR(8000), @splitColumns VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH MyValues AS
(
    SELECT ColumnPosition = x.ItemNumber,
           ColumnValue = x.Item
    FROM dbo.DelimitedSplit8K(@splitValues, @pDelimiter) x
)
, ColumnData AS
(
    SELECT ColumnPosition = x.ItemNumber,
           ColumnName = x.Item
    FROM dbo.DelimitedSplit8K(@splitColumns, @pDelimiter) x
)
SELECT cd.ColumnName,
       v.ColumnValue
FROM MyValues v
JOIN ColumnData cd ON cd.ColumnPosition = v.ColumnPosition
;
In case of the above sample data, I'd call this function with the following SQL:
DECLARE @FieldValues VARCHAR(8000), @FieldColumns VARCHAR(8000)
SELECT TOP 1 @FieldValues = [dbo].[PU].[Values], @FieldColumns = [dbo].[Schemes].[Columns] FROM [dbo].[PU] INNER JOIN [dbo].[Schemes] ON [dbo].[PU].[Table] = [dbo].[Schemes].[Table]
INSERT INTO @Fields
SELECT [Column] = x.[ColumnName], [Value] = x.[ColumnValue] FROM [dbo].[JoinDelimitedColumnValue](@FieldValues, @FieldColumns, @Delimiter) x
This data structure makes this way more complicated than it should be. You can leverage the splitter from Jeff Moden here: http://www.sqlservercentral.com/articles/Tally+Table/72993/. The main difference between that splitter and all the others is that his returns the ordinal position of each element. Why all the other splitters don't do this is beyond me. For things like this it is needed: you have two sets of delimited data, and you must ensure that they are both reassembled in the correct order.
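As an aside, if you are on SQL Server 2022 or later, the built-in STRING_SPLIT can also return the ordinal position via its optional third argument, which removes the need for a custom splitter:
SELECT value, ordinal
FROM STRING_SPLIT('[Name];[Age]', ';', 1); -- enable_ordinal = 1, SQL Server 2022+ only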
The biggest issue I see is that you don't have anything in your main table to function as an anchor for ordering the results correctly. You need something, even an identity, to ensure the output rows stay "together". To accomplish this, I just added an identity to the PU table.
alter table PU add RowOrder int identity not null
Now that we have an anchor, this is still a little cumbersome for what should be a simple query, but it is achievable.
Something like this will now work.
with MyValues as
(
select p.[Table]
, ColumnPosition = x.ItemNumber
, ColumnValue = x.Item
, RowOrder
from PU p
cross apply dbo.DelimitedSplit8K(p.[Values], ';') x
)
, ColumnData as
(
select ColumnName = replace(replace(x.Item, ']', ''), '[', '')
, ColumnPosition = x.ItemNumber
, s.[Table]
from Schemes s
cross apply dbo.DelimitedSplit8K(s.Columns, ';') x
)
select cd.[Table]
, v.ColumnValue
, cd.ColumnName
from MyValues v
join ColumnData cd on cd.[Table] = v.[Table]
and cd.ColumnPosition = v.ColumnPosition
order by v.RowOrder
, v.ColumnPosition
I recommend not storing values like this in the first place. I recommend having a key value in the tables, and preferably not using Table and Columns as a composite key. I also recommend avoiding reserved words as object names. I don't know what version of SQL you are using, so I am going to assume a fairly recent version of Microsoft SQL Server that will support my provided stored procedure.
Here is an overview of the solution:
1) You need to convert both the PU and the Schemes table into a table where each "column" value in the list of columns is isolated in its own row. If you can store the data in this format rather than the provided format, you will be a little better off.
What I mean is
Table|Columns
Person|Jane Doe;22
needs to be converted to
Table|Column|OrderInList
Person|Jane Doe|1
Person|22|2
There are multiple ways to do this, but I prefer an XML trick that I picked up. You can find multiple split-string examples online, so I will not focus on that. Use whatever gives you the best performance. Unfortunately, you might not be able to get away from needing a table-valued function here.
Update:
Thanks to Shnugo's performance enhancement comment, I have updated my XML splitter to give you the row number, which reduces some of my code. I do the exact same thing to the Schemes list.
2) Since the new Schema table and the new PU table now have the order each column appears, the PU table and the schema table can be joined on the "Table" and the OrderInList
CREATE FUNCTION [dbo].[fnSplitStrings_XML]
(
    @List NVARCHAR(MAX),
    @Delimiter VARCHAR(255)
)
RETURNS TABLE
AS
RETURN
(
    SELECT y.i.value('(./text())[1]', 'nvarchar(4000)') AS Item, ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) as RowNumber
    FROM
    (
        SELECT CONVERT(XML, '<i>'
            + REPLACE(@List, @Delimiter, '</i><i>')
            + '</i>').query('.') AS x
    ) AS a CROSS APPLY x.nodes('i') AS y(i)
);
GO
CREATE Procedure uspGetColumnValues
as
Begin
--Split each value in PU
select p.[Table],p.[Values],a.[Item],CHARINDEX(a.Item,p.[Values]) as LocationInStringForSorting,a.RowNumber
into #PuWithOrder
from PU p
cross apply [fnSplitStrings_XML](p.[Values],';') a --use whatever string split function is working best for you (performance wise)
--Split each value in Schema
select s.[Table],s.[Columns],a.[Item],CHARINDEX(a.Item,s.[Columns]) as LocationInStringForSorting,a.RowNumber
into #SchemaWithOrder
from Schemes s
cross apply [fnSplitStrings_XML](s.[Columns],';') a --use whatever string split function is working best for you (performance wise)
DECLARE @Fields TABLE --If this is an ETL process, maybe make this a permanent table with an auto-incrementing Id and reference this table in all steps after this.
(
[Table] NVARCHAR(50),
[Columns] NVARCHAR(MAX),
[Column] VARCHAR(MAX),
[Value] VARCHAR(MAX),
OrderInList int
)
INSERT INTO @Fields([Table],[Columns],[Column],[Value],OrderInList)
Select pu.[Table],pu.[Values] as [Columns],s.Item as [Column],pu.Item as [Value],pu.RowNumber
from #PuWithOrder pu
join #SchemaWithOrder s on pu.[Table]=s.[Table] and pu.RowNumber=s.RowNumber
Select [Table],[Columns],[Column],[Value],OrderInList
from @Fields
order by [Table],[Columns],OrderInList
END
GO
EXEC uspGetColumnValues
GO
Update:
Since your working implementation is a table-valued function, I have another recommendation. The problem I see is that you're using a table-valued function, which ultimately handles one record at a time. You are going to get better performance with set-based operations and batching as needed. With a table-valued function, you are likely going to be looping through each row. If this is some sort of ETL process, your team will be better off if you have a stored procedure that processes the rows in bulk. It might make sense to stage the results into a better table that your team can work with downstream, rather than have them use a potentially slow table-valued function.

Which query will execute faster, a query which uses table object or a query which uses temporary table in sql server

I have created a stored procedure in which I have used a table object and inserted some column in it. Below is the procedure:
CREATE Procedure [dbo].[usp_Security] (@CredentialsList dbo.Type_UserCredentialsList ReadOnly) As
Begin
Declare
@Result Table
(
IdentityColumn Int NOT NULL Identity (1, 1) PRIMARY KEY,
UserCredentials nVarChar(4000),
UserName nVarChar(100),
UserRole nVarChar(200),
RoleID Int,
Supervisor Char(3),
AcctMaintDecn Char(3),
EditPendInfo Char(3),
ReqInstID Char(3)
)
Insert Into @Result (UserCredentials, UserName, UserRole, RoleID, Supervisor, AcctMaintDecn, EditPendInfo, ReqInstID)
Select Distinct A.UserCredentials, 'No', D.RoleName, D.RoleID, 'No', 'No', 'No', 'No' From @CredentialsList A
Join SecurityRepository.dbo.SecurityUsers B On CharIndex(B.DomainAccount, A.UserCredentials) > 0
Join SecurityRepository.dbo.SecurityUserRoles C On C.UserID = B.UserID
Join SecurityRepository.dbo.SecurityRoles D On D.RoleID = C.RoleID
Where D.RoleName Like 'AOT.%' And B.IsActive = 1 And D.IsActive = 1
Update A
Set A.UserName = B.UserName
From @Result A
Join @CredentialsList B On A.UserCredentials = B.UserCredentials
-- 'Supervisor' Column
Update A
Set A.Supervisor = 'Yes'
From @Result A
Join SecurityRepository.dbo.SecurityUsers B On CharIndex(B.DomainAccount, A.UserCredentials) > 0
Join SecurityRepository.dbo.SecurityUserRoles C On C.UserID = B.UserID
Join SecurityRepository.dbo.SecurityRoles D On D.RoleID = C.RoleID
Where D.RoleName In ('AOT.Manager', 'AOT.Deps Ops Admin', 'AOT.Fraud Manager', 'AOT.Fulfillment Manager')
And B.IsActive = 1 And D.IsActive = 1
-- Return Result
Select * From @Result Order By UserName, UserRole
End
End
In the above procedure, I have made use of a table variable and have a clustered index on it (via the PRIMARY KEY). However, if I create a temporary table and then process the above info in the SP, will it be faster than using the table variable? I tried creating a separate clustered index on a column of the table variable, but it does not allow me to create one, as we cannot create an index on a table variable after declaring it.
I wanted to make use of a temporary table in the above stored procedure, but will it reduce the cost compared to the table variable?
It depends! - as always there are a lot of factors that come into play here.
A table variable tends to work best for small numbers of rows - e.g. 10, 20 rows - since it never has statistics, only supports indexes declared as inline constraints (like the PRIMARY KEY in your example), and the SQL Server query optimizer will always assume it has just a single row of data. If you have too many rows in a table variable, this will badly skew the execution plan being determined.
Furthermore, the table variable doesn't participate in transaction handling, which can be a good or a bad thing - so if you insert 10 rows into a table variable inside a transaction and then roll back that transaction, those rows are still in your table variable. Just be aware of that!
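You can see that transaction behavior for yourself with a tiny self-contained demo:
DECLARE @t TABLE (i int);
BEGIN TRAN;
INSERT @t VALUES (1);
ROLLBACK TRAN;
SELECT * FROM @t; -- still returns the row: the rollback did not touch the table variable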
The temporary table works best if you intend to have rather many rows, if you might even need to index something.
Temporary tables also behave just like regular tables in transactional processing, e.g. a transaction will affect those temporary tables.
But again: the real way to find out is to try it and measure it - and try again and measure again.
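For example, a simple way to compare the two variants (an illustrative sketch; the numbers appear in the Messages tab):
SET STATISTICS TIME, IO ON;
-- run the table-variable version of the procedure here, then the temp-table
-- version, and compare elapsed times and logical reads
SET STATISTICS TIME, IO OFF;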

SQL - Inserting and Updating Multiple Records at Once

I have a stored procedure that is responsible for inserting or updating multiple records at once. I want to perform this in my stored procedure for the sake of performance.
This stored procedure takes in a comma-delimited list of permit IDs and a status. The permit IDs are stored in a variable called @PermitIDs. The status is stored in a variable called @Status. I have a user-defined function that converts this comma-delimited list of permit IDs into a table. I need to go through each of these IDs and do either an insert or update into a table called PermitStatus.
If a record with the permit ID does not exist, I want to add a record. If it does exist, I want to update the record with the given @Status value. I know how to do this for a single ID, but I do not know how to do it for multiple IDs. For single IDs, I do the following:
-- Determine whether to add or edit the PermitStatus
DECLARE @count int
SET @count = (SELECT Count(ID) FROM PermitStatus WHERE [PermitID] = @PermitID)
-- If no records were found, insert the record, otherwise update
IF @count = 0
BEGIN
    INSERT INTO
        PermitStatus
        (
            [PermitID],
            [UpdatedOn],
            [Status]
        )
    VALUES
        (
            @PermitID,
            GETUTCDATE(),
            1
        )
END
ELSE
    UPDATE
        PermitStatus
    SET
        [UpdatedOn] = GETUTCDATE(),
        [Status] = @Status
    WHERE
        [PermitID] = @PermitID
How do I loop through the records in the Table returned by my user-defined function to dynamically insert or update the records as needed?
create a split function, and use it like:
SELECT
*
FROM YourTable y
INNER JOIN dbo.splitFunction(@Parameter) s ON y.ID = s.Value
I prefer the number table approach.
For this method to work, you need to do this one-time table setup:
SELECT TOP 10000 IDENTITY(int,1,1) AS Number
INTO Numbers
FROM sys.objects s1
CROSS JOIN sys.objects s2
ALTER TABLE Numbers ADD CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (Number)
Once the Numbers table is set up, create this function:
CREATE FUNCTION [dbo].[FN_ListToTableAll]
(
    @SplitOn char(1)     --REQUIRED, the character to split the @List string on
    ,@List varchar(8000) --REQUIRED, the list to split apart
)
RETURNS TABLE
AS
RETURN
(
    ----------------
    --SINGLE QUERY-- --this WILL return empty rows
    ----------------
    SELECT
        ROW_NUMBER() OVER(ORDER BY number) AS RowNumber
        ,LTRIM(RTRIM(SUBSTRING(ListValue, number+1, CHARINDEX(@SplitOn, ListValue, number+1)-number - 1))) AS ListValue
    FROM (
        SELECT @SplitOn + @List + @SplitOn AS ListValue
    ) AS InnerQuery
    INNER JOIN Numbers n ON n.Number < LEN(InnerQuery.ListValue)
    WHERE SUBSTRING(ListValue, number, 1) = @SplitOn
);
GO
You can now easily split a CSV string into a table and join on it:
select * from dbo.FN_ListToTableAll(',','1,2,3,,,4,5,6777,,,')
OUTPUT:
RowNumber ListValue
----------- ----------
1 1
2 2
3 3
4
5
6 4
7 5
8 6777
9
10
11
(11 row(s) affected)
To make what you need work, do the following:
--this would be the existing table
DECLARE @OldData table (RowID int, RowStatus char(1))
INSERT INTO @OldData VALUES (10,'z')
INSERT INTO @OldData VALUES (20,'z')
INSERT INTO @OldData VALUES (30,'z')
INSERT INTO @OldData VALUES (70,'z')
INSERT INTO @OldData VALUES (80,'z')
INSERT INTO @OldData VALUES (90,'z')
--these would be the stored procedure input parameters
DECLARE @IDList varchar(500)
       ,@StatusList varchar(500)
SELECT @IDList='10,20,30,40,50,60'
      ,@StatusList='A,B,C,D,E,F'
--stored procedure local variable
DECLARE @InputList table (RowID int, RowStatus char(1))
--convert input parameters into a table
INSERT INTO @InputList
        (RowID,RowStatus)
SELECT
    i.ListValue,s.ListValue
FROM dbo.FN_ListToTableAll(',',@IDList) i
    INNER JOIN dbo.FN_ListToTableAll(',',@StatusList) s ON i.RowNumber=s.RowNumber
--update all old existing rows
UPDATE o
SET RowStatus=i.RowStatus
FROM @OldData o WITH (UPDLOCK, HOLDLOCK) --to avoid race condition when there is high concurrency, as per @Emtucifor
    INNER JOIN @InputList i ON o.RowID=i.RowID
--insert only the new rows
INSERT INTO @OldData
        (RowID, RowStatus)
SELECT
    i.RowID, i.RowStatus
FROM @InputList i
    LEFT OUTER JOIN @OldData o ON i.RowID=o.RowID
WHERE o.RowID IS NULL
--display the old table
SELECT * FROM @OldData ORDER BY RowID
OUTPUT:
RowID RowStatus
----------- ---------
10 A
20 B
30 C
40 D
50 E
60 F
70 z
80 z
90 z
(9 row(s) affected)
EDIT: thanks to @Emtucifor for the tip about the race condition, I have included the locking hints in my answer to prevent race condition problems when there is high concurrency.
There are various methods to accomplish the parts you are asking about.
Passing Values
There are dozens of ways to do this. Here are a few ideas to get you started:
Pass in a string of identifiers and parse it into a table, then join.
SQL 2008: Join to a table-valued parameter
Expect data to exist in a predefined temp table and join to it
Use a session-keyed permanent table
Put the code in a trigger and join to the INSERTED and DELETED tables in it.
Erland Sommarskog provides a wonderful, comprehensive discussion of lists in SQL Server. In my opinion, the table-valued parameter in SQL 2008 is the most elegant solution for this.
Upsert/Merge
Perform a separate UPDATE and INSERT (two queries, one for each set, not row-by-row).
SQL 2008: MERGE.
An Important Gotcha
However, one thing that no one else has mentioned is that almost all upsert code, including SQL 2008 MERGE, suffers from race condition problems when there is high concurrency. Unless you use HOLDLOCK and other locking hints depending on what's being done, you will eventually run into conflicts. So you either need to lock, or respond to errors appropriately (some systems with huge transactions per second have used the error-response method successfully, instead of using locks).
One thing to realize is that different combinations of lock hints implicitly change the transaction isolation level, which affects what type of locks are acquired. This changes everything: which other locks are granted (such as a simple read), the timing of when a lock is escalated to update from update intent, and so on.
I strongly encourage you to read more detail on these race condition problems. You need to get this right.
Conditional Insert/Update Race Condition
“UPSERT” Race Condition With MERGE
Example Code
CREATE PROCEDURE dbo.PermitStatusUpdate
    @PermitIDs varchar(8000), -- or (max)
    @Status int
AS
SET NOCOUNT, XACT_ABORT ON -- see note below
BEGIN TRAN
DECLARE @Permits TABLE (
    PermitID int NOT NULL PRIMARY KEY CLUSTERED
)
INSERT @Permits
SELECT Value FROM dbo.Split(@PermitIDs) -- split function of your choice
UPDATE S
SET
    UpdatedOn = GETUTCDATE(),
    Status = @Status
FROM
    PermitStatus S WITH (UPDLOCK, HOLDLOCK)
    INNER JOIN @Permits P ON S.PermitID = P.PermitID
INSERT PermitStatus (
    PermitID,
    UpdatedOn,
    Status
)
SELECT
    P.PermitID,
    GetUTCDate(),
    @Status
FROM @Permits P
WHERE NOT EXISTS (
    SELECT 1
    FROM PermitStatus S
    WHERE P.PermitID = S.PermitID
)
COMMIT TRAN
RETURN @@ERROR;
Note: XACT_ABORT helps guarantee the explicit transaction is closed following a timeout or unexpected error.
To confirm that this handles the locking problem, open several query windows and execute an identical batch like so:
WAITFOR TIME '11:00:00' -- use a time in the near future
EXEC dbo.PermitStatusUpdate @PermitIDs = '123,124,125,126', @Status = 1
All of these different sessions will execute the stored procedure in nearly the same instant. Check each session for errors. If none exist, try the same test a few times more (since it's possible to not always have the race condition occur, especially with MERGE).
The writeups at the links I gave above give even more detail than I did here, and also describe what to do for the SQL 2008 MERGE statement as well. Please read those thoroughly to truly understand the issue.
Briefly, with MERGE, no explicit transaction is needed, but you do need to use SET XACT_ABORT ON and use a locking hint:
SET NOCOUNT, XACT_ABORT ON;
MERGE dbo.Table WITH (HOLDLOCK) AS TableAlias
...
This will prevent concurrency race conditions causing errors.
I also recommend that you do error handling after each data modification statement.
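For instance, a minimal sketch of one way to do that, wrapping the call in TRY/CATCH (THROW requires SQL 2012+; use RAISERROR on older versions):
BEGIN TRY
    EXEC dbo.PermitStatusUpdate @PermitIDs = '123,124,125,126', @Status = 1;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRAN; -- clean up if an open transaction was left behind
    THROW; -- re-raise the original error to the caller
END CATCH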
If you're using SQL Server 2008, you can use table valued parameters - you pass in a table of records into a stored procedure and then you can do a MERGE.
Passing in a table valued parameter would remove the need to parse CSV strings.
Edit:
ErikE has raised the point about race conditions, please refer to his answer and linked articles.
If you have SQL Server 2008, you can use MERGE. Here's an article describing this.
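As a hedged sketch of what that could look like here (table and column names taken from the question; dbo.Split stands in for whatever split function you already use):
MERGE dbo.PermitStatus WITH (HOLDLOCK) AS S
USING (SELECT Value AS PermitID FROM dbo.Split(@PermitIDs)) AS P
    ON S.PermitID = P.PermitID
WHEN MATCHED THEN
    UPDATE SET UpdatedOn = GETUTCDATE(), [Status] = @Status
WHEN NOT MATCHED THEN
    INSERT (PermitID, UpdatedOn, [Status])
    VALUES (P.PermitID, GETUTCDATE(), @Status);
The WITH (HOLDLOCK) hint follows the race-condition advice in ErikE's answer above.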
You should be able to do your insert and your update as two set based queries.
The code below was based on a data load procedure that I wrote a while ago that took data from a staging table and inserted or updated it into the main table.
I've tried to make it match your example, but you may need to tweak this (and create a table valued UDF to parse your CSV into a table of ids).
-- Update where the join on permitstatus matches
Update
    status
Set
    [UpdatedOn] = GETUTCDATE(),
    [Status] = staging.Status
From
    PermitStatus status
Join
    StagingTable staging
On
    staging.PermitId = status.PermitId
-- Insert the new records, based on the Where Not Exists
Insert
    PermitStatus (UpdatedOn, Status, PermitId)
Select
    GETUTCDATE(), staging.Status, staging.PermitId
From
    StagingTable staging
Where Not Exists
(
    Select 1 from PermitStatus status
    Where status.PermitId = staging.PermitId
)
Essentially you have an upsert stored procedure (e.g. UpsertSinglePermit, like the code you have given above) for dealing with one row.
So the steps I see are to create a new stored procedure (UpsertNPermits) which does:
a) Parse the input string into n record entries (each record contains a permit id and status)
b) For each entry from the above, invoke UpsertSinglePermit, as sketched below.
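A minimal hedged sketch of that wrapper (UpsertNPermits, dbo.Split, and UpsertSinglePermit are hypothetical names; note the set-based answers above avoid this row-by-row loop):
CREATE PROCEDURE dbo.UpsertNPermits @PermitIDs varchar(8000), @Status int
AS
BEGIN
    DECLARE @id int;
    DECLARE c CURSOR LOCAL FAST_FORWARD FOR
        SELECT Value FROM dbo.Split(@PermitIDs); -- split function of your choice
    OPEN c;
    FETCH NEXT FROM c INTO @id;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        EXEC dbo.UpsertSinglePermit @PermitID = @id, @Status = @Status; -- single-row upsert from above
        FETCH NEXT FROM c INTO @id;
    END
    CLOSE c;
    DEALLOCATE c;
END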
