Do Inserted Records Always Receive Contiguous Identity Values? (SQL Server)

Consider the following SQL:
CREATE TABLE Foo
(
ID int IDENTITY(1,1),
Data nvarchar(max)
)
INSERT INTO Foo (Data)
SELECT TOP 1000 Data
FROM SomeOtherTable
WHERE SomeColumn = @SomeParameter

DECLARE @LastID int
SET @LastID = SCOPE_IDENTITY()
I would like to know if I can depend on the 1000 rows that I inserted into table Foo having contiguous identity values. In other words, if this SQL block produces a @LastID of 2000, can I know for certain that the ID of the first record I inserted was 1001? I am mainly curious about multiple statements inserting records into table Foo concurrently.
I know that I could add a serializable transaction around my insert statement to ensure the behavior that I want, but do I really need to? I'm worried that introducing a serializable transaction will degrade performance, but if SQL Server won't allow other statements to insert into table Foo while this statement is running, then I don't have to worry about it.

I disagree with the accepted answer. This can easily be tested and disproved by running the following. Under concurrency, identity values are handed out row by row as the inserts proceed, so two sessions inserting at the same time can interleave their allocations and leave gaps inside each statement's range.
Setup
USE tempdb
CREATE TABLE Foo
(
ID int IDENTITY(1,1),
Data nvarchar(max)
)
Connection 1
USE tempdb
SET NOCOUNT ON
WHILE NOT EXISTS(SELECT * FROM master..sysprocesses WHERE context_info = CAST('stop' AS VARBINARY(128)))
BEGIN
    INSERT INTO Foo (Data)
    VALUES ('blah')
END
Connection 2
USE tempdb
SET NOCOUNT ON
SET CONTEXT_INFO 0x
DECLARE @Output TABLE(ID INT)
WHILE 1 = 1
BEGIN
    /*Clear out table variable from previous loop*/
    DELETE FROM @Output
    /*Insert 1000 records*/
    INSERT INTO Foo (Data)
    OUTPUT inserted.ID INTO @Output
    SELECT TOP 1000 NEWID()
    FROM sys.all_columns
    IF EXISTS(SELECT * FROM @Output HAVING MAX(ID) - MIN(ID) <> 999)
    BEGIN
        /*Set Context Info so the other connection inserting
          a single record in a loop terminates itself*/
        DECLARE @stop VARBINARY(128)
        SET @stop = CAST('stop' AS VARBINARY(128))
        SET CONTEXT_INFO @stop
        /*Return results for inspection*/
        SELECT ID, DENSE_RANK() OVER (ORDER BY Grp) AS ContigSection
        FROM
            (SELECT ID, ID - ROW_NUMBER() OVER (ORDER BY [ID]) AS Grp
             FROM @Output) O
        ORDER BY ID
        RETURN
    END
END
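If you genuinely need the block of 1000 identity values to be contiguous, one option (a sketch, not the only approach) is to take an exclusive table lock for the duration of the insert, which blocks the competing writers that cause the gaps:

BEGIN TRANSACTION;

INSERT INTO Foo WITH (TABLOCKX) (Data)
SELECT TOP 1000 Data
FROM SomeOtherTable
WHERE SomeColumn = @SomeParameter;

DECLARE @LastID int = SCOPE_IDENTITY();

COMMIT TRANSACTION;

Be aware this serializes all writes to Foo while the transaction is open, so weigh the throughput cost before relying on it.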

Yes, they will be contiguous, because the INSERT is atomic: complete success or full rollback. It is also performed as a single unit of work: you won't get any "interleaving" with other processes.
However (or to put your mind at rest!), consider the OUTPUT clause:
DECLARE @KeyStore TABLE (ID int NOT NULL)
INSERT INTO Foo (Data)
OUTPUT INSERTED.ID INTO @KeyStore (ID) --this line
SELECT TOP 1000 Data
FROM SomeOtherTable
WHERE SomeColumn = @SomeParameter
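The table variable then holds exactly the keys this statement created, regardless of what other sessions did, so you can derive the range (or the full list) from it:

SELECT MIN(ID) AS FirstID, MAX(ID) AS LastID, COUNT(*) AS RowsInserted
FROM @KeyStore;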

If you want the identity values for multiple rows, use OUTPUT:
DECLARE @NewIDs table (PKColumn int)
INSERT INTO Foo (Data)
OUTPUT INSERTED.ID INTO @NewIDs (PKColumn)
SELECT TOP 1000 Data
FROM SomeOtherTable
WHERE SomeColumn = @SomeParameter
you now have the entire set of values in the @NewIDs table. You can add any columns from the Foo table into the @NewIDs table and insert those columns as well.
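For example, a sketch (using the Foo and SomeOtherTable names from the question) that captures the Data column alongside each new ID:

DECLARE @NewIDs table (PKColumn int, Data nvarchar(max));

INSERT INTO Foo (Data)
OUTPUT INSERTED.ID, INSERTED.Data
INTO @NewIDs (PKColumn, Data)
SELECT TOP 1000 Data
FROM SomeOtherTable
WHERE SomeColumn = @SomeParameter;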

It is not good practice to attach any sort of meaning whatsoever to identity values. You should assume that they are nothing more than integers guaranteed to be unique within the scope of your table.

Try adding the following:
option(maxdop 1)
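For context, the hint goes at the end of the statement. It forces a serial plan (parallel plans can assign identity values in an order that doesn't match your ORDER BY within a single statement), but note it does not by itself stop other sessions from interleaving their inserts:

INSERT INTO Foo (Data)
SELECT TOP 1000 Data
FROM SomeOtherTable
WHERE SomeColumn = @SomeParameter
OPTION (MAXDOP 1);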

Related

Count # of Rows in Stored Procedure Result, then Insert result into table

I have an SSIS package which will first run my sp_doSomething. This stored procedure selects data from several different tables and joins them for possible storage in dbo.someTable, but I only want that to happen if the SELECT returns more than one row of data.
I want to then have a precedence constraint that looks at the number of rows my stored procedure returned.
If my row count > 1, then I want to take the results of the stored procedure and insert them into one of my tables.
Otherwise, I will record an error/send an email, or whatever.
I really don't want to run this stored procedure more than once, but that is the only way I could think to do it (run it and count the rows, then run it again and insert the results).
I'm a complete TSQL/SSIS newb, so I'm sorry if this question is trivial.
I can't find a good answer anywhere.
Create a variable with Package Scope of type Int32 and name it rowcount, then wire up the Data Flow and the Control Flow around it. (The original answer showed both as screenshots; typically a Row Count transformation populates the variable in the Data Flow, and a precedence constraint expression evaluates it in the Control Flow.)
You can try this:

declare @tableVar table(col1 varchar(100))
declare @Counter int

insert into @tableVar(col1) exec CompanyNames
set @Counter = (select count(*) from @tableVar)

insert into Anytable(col) values (@Counter)
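From there, a sketch of the branching the question asks about (dbo.someTable and the column names are placeholders for your real targets):

IF @Counter > 1
BEGIN
    -- more than one row came back: persist the captured result set
    INSERT INTO dbo.someTable (col)
    SELECT col1 FROM @tableVar;
END
ELSE
BEGIN
    -- otherwise surface the condition for the error/email path
    RAISERROR('sp_doSomething returned %d row(s); nothing inserted.', 10, 1, @Counter);
END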
Within the stored proc, write the results to a #Temp table, then select the count into a variable:

Select @intRows = Count(*) from #Temp

Then evaluate the value of @intRows:

If @intRows > 1
BEGIN
    Insert Into dbo.SomeTable
    Select * from #Temp
END
Will a #temp table work for you?
IF OBJECT_ID('tempdb..#Holder') IS NOT NULL
BEGIN
    DROP TABLE #Holder
END

CREATE TABLE #Holder
(ID INT)

DECLARE @MyRowCount int
DECLARE @MyTotalCount int = 0

/* simulate your insert; you would read from your real table(s) here */
INSERT INTO #Holder (ID)
SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4

SELECT @MyRowCount = @@ROWCOUNT, @MyTotalCount = @MyTotalCount + @MyRowCount
SELECT 'TheMagicValue1' = @MyRowCount, 'TheMagicTotal' = @MyTotalCount

INSERT INTO #Holder (ID)
SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8

/* you will note that I am NOT doing a count(*) here... which would be another strain on the procedure */
SELECT @MyRowCount = @@ROWCOUNT, @MyTotalCount = @MyTotalCount + @MyRowCount
SELECT 'TheMagicValue1' = @MyRowCount, 'TheMagicTotal' = @MyTotalCount

/* optional index if needed */
CREATE INDEX IDX_TempHolder_ID ON #Holder (ID)
/* CREATE CLUSTERED INDEX IDX_TempHolder_ID ON #Holder (ID) */

IF @MyTotalCount > 0
BEGIN
    SELECT 'Put your INSERT statement here'
END

/* this will return the data to the report */
SELECT ID FROM #Holder

IF OBJECT_ID('tempdb..#Holder') IS NOT NULL
BEGIN
    DROP TABLE #Holder
END

SQL Server procedure optimization testing pattern improvement

I've been doing some SQL Server procedure optimization lately and was looking for a testing pattern (time- and result-wise). I've come up with this solution so far:
SET NOCOUNT ON;
----------------------------------------------------------------------------------------------------------------
-- Procedures data and performance testing pattern
----------------------------------------------------------------------------------------------------------------
-- Prepare test queries (most likely taken from Logs.ProcedureTraceData on the DATAUK/DATAUS servers)
-- Procedures should insert records into Temporary table, so we can compare their results using EXCEPT
-- If result set columns are fixed (i.e. no Dynamic SQL is used), we can create Temporary tables inside script
-- and insert records in them to do comparison and just TRUNCATE them at the end of the loop.
-- example here: http://stackoverflow.com/a/654418/3680098
-- If there're any data discrepancies or record counts are different, it will be displayed in TraceLog table
----------------------------------------------------------------------------------------------------------------
-- Create your own TraceLog table to keep records
----------------------------------------------------------------------------------------------------------------
/*
CREATE TABLE Temporary._EB_TraceLog
(
ID INT NOT NULL IDENTITY(1, 1) CONSTRAINT PK_Temporary_EB_TraceLog_ID PRIMARY KEY
, CurrentExecutionTime INT
, TempExecutionTime INT
, CurrentExecutionResultsCount INT
, TempExecutionResultsCount INT
, IsDifferent BIT CONSTRAINT DF_Temporary_EB_TraceLog_IsDifferent DEFAULT 0 NOT NULL
, TimeDiff AS CurrentExecutionTime - TempExecutionTime
, PercentageDiff AS CAST(((CAST(CurrentExecutionTime AS DECIMAL)/ CAST(TempExecutionTime AS DECIMAL)) * 100 - 100) AS DECIMAL(10, 2))
, TextData NVARCHAR(MAX)
);
SELECT *
FROM Temporary._EB_TraceLog;
TRUNCATE TABLE Temporary._EB_TraceLog;
*/
INSERT INTO Temporary._EB_TraceLog (TextData)
SELECT TextData
FROM Temporary._EB_GetData_Timeouts
EXCEPT
SELECT TextData
FROM Temporary._EB_TraceLog;
DECLARE @Counter INT;

SELECT @Counter = MIN(ID)
FROM Temporary._EB_TraceLog
WHERE CurrentExecutionTime IS NULL
   OR TempExecutionTime IS NULL
   OR CurrentExecutionResultsCount IS NULL
   OR TempExecutionResultsCount IS NULL;

WHILE @Counter <= (SELECT MAX(ID) FROM Temporary._EB_TraceLog)
BEGIN
    DECLARE @SQLStringCurr NVARCHAR(MAX);
    DECLARE @SQLStringTemp NVARCHAR(MAX);
    DECLARE @StartTime DATETIME2;

    SELECT @SQLStringCurr = REPLACE(TextData, 'dbo.GetData', 'Temporary._EB_GetData_Orig')
         , @SQLStringTemp = REPLACE(TextData, 'dbo.GetData', 'Temporary._EB_GetData_Mod')
    FROM Temporary._EB_TraceLog
    WHERE ID = @Counter;

    ----------------------------------------------------------------------------------------------------------------
    -- Drop temporary tables in the script, so these numbers don't figure in SP execution time
    ----------------------------------------------------------------------------------------------------------------
    IF OBJECT_ID(N'Temporary._EB_Test_Orig') IS NOT NULL
        DROP TABLE Temporary._EB_Test_Orig;

    IF OBJECT_ID(N'Temporary._EB_Test_Mod') IS NOT NULL
        DROP TABLE Temporary._EB_Test_Mod;

    ----------------------------------------------------------------------------------------------------------------
    -- Actual testing
    ----------------------------------------------------------------------------------------------------------------
    -- Take a time snapshot and execute the original procedure, which inserts records into a Temporary table
    -- When done, measurements are updated on the TraceLog table
    ----------------------------------------------------------------------------------------------------------------
    SELECT @StartTime = CURRENT_TIMESTAMP;

    EXECUTE sp_executesql @SQLStringCurr;

    UPDATE T
    SET T.CurrentExecutionTime = DATEDIFF(MILLISECOND, @StartTime, CURRENT_TIMESTAMP)
    FROM Temporary._EB_TraceLog AS T
    WHERE T.ID = @Counter;

    ----------------------------------------------------------------------------------------------------------------
    -- Take a time snapshot and execute the optimized procedure, which inserts records into a Temporary table
    -- When done, measurements are updated on the TraceLog table
    ----------------------------------------------------------------------------------------------------------------
    SELECT @StartTime = CURRENT_TIMESTAMP;

    EXECUTE sp_executesql @SQLStringTemp;

    UPDATE T
    SET T.TempExecutionTime = DATEDIFF(MILLISECOND, @StartTime, CURRENT_TIMESTAMP)
    FROM Temporary._EB_TraceLog AS T
    WHERE T.ID = @Counter;

    ----------------------------------------------------------------------------------------------------------------
    -- Check if there are any data discrepancies
    -- If there are, set IsDifferent to 1, so we can find the root cause
    ----------------------------------------------------------------------------------------------------------------
    IF EXISTS (SELECT * FROM Temporary._EB_Test_Mod EXCEPT SELECT * FROM Temporary._EB_Test_Orig)
        OR EXISTS (SELECT * FROM Temporary._EB_Test_Orig EXCEPT SELECT * FROM Temporary._EB_Test_Mod)
    BEGIN
        UPDATE T
        SET T.IsDifferent = 1
        FROM Temporary._EB_TraceLog AS T
        WHERE T.ID = @Counter;
    END

    ----------------------------------------------------------------------------------------------------------------
    -- Update record counts for each execution
    -- We can check whether the record counts differ even though the results are the same
    -- (the EXCEPT clause removes duplicates when doing the checks)
    ----------------------------------------------------------------------------------------------------------------
    UPDATE T
    SET T.CurrentExecutionResultsCount = (SELECT COUNT(*) FROM Temporary._EB_Test_Orig)
      , T.TempExecutionResultsCount = (SELECT COUNT(*) FROM Temporary._EB_Test_Mod)
    FROM Temporary._EB_TraceLog AS T
    WHERE T.ID = @Counter;

    ----------------------------------------------------------------------------------------------------------------
    -- Print the iteration number and proceed to the next one
    ----------------------------------------------------------------------------------------------------------------
    PRINT @Counter;

    SET @Counter += 1;
END
SELECT *
FROM Temporary._EB_TraceLog;
This works quite well so far, but I would like to include IO and TIME statistics in each iteration. Is that possible?
I know I can do it using:
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
But is there a way to grab summed up values and put them in my TraceLog table?
And on top of that, is there anything that doesn't make sense in this piece of code?
Thanks
You can use this query:

SELECT total_elapsed_time
FROM sys.dm_exec_query_stats
WHERE sql_handle IN (SELECT most_recent_sql_handle
                     FROM sys.dm_exec_connections
                     CROSS APPLY sys.dm_exec_sql_text(most_recent_sql_handle)
                     WHERE session_id = @@SPID)
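Alternatively, here is a sketch (assuming everything runs in one session, and that you add the extra columns to the TraceLog table yourself) that snapshots sys.dm_exec_sessions before each sp_executesql call and stores the deltas; cpu_time is reported in milliseconds:

DECLARE @cpu_before INT, @reads_before BIGINT;

SELECT @cpu_before = cpu_time, @reads_before = logical_reads
FROM sys.dm_exec_sessions
WHERE session_id = @@SPID;

EXECUTE sp_executesql @SQLStringCurr;

-- CurrentExecutionCPU / CurrentExecutionReads are hypothetical extra columns on Temporary._EB_TraceLog
UPDATE T
SET T.CurrentExecutionCPU   = S.cpu_time - @cpu_before,
    T.CurrentExecutionReads = S.logical_reads - @reads_before
FROM Temporary._EB_TraceLog AS T
CROSS JOIN sys.dm_exec_sessions AS S
WHERE T.ID = @Counter
  AND S.session_id = @@SPID;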

Insert random text into columns from a reference table variable

I have a table ABSENCE that has 40 employee ids and need to add two columns from a table variable, which acts as a reference table. For each emp id, I need to randomly assign the values from the table variable. Here's the code I tried without randomizing:
USE TSQL2012;
GO
DECLARE @MAX SMALLINT;
DECLARE @MIN SMALLINT;
DECLARE @RECODE SMALLINT;
DECLARE @RE CHAR(100);

DECLARE @rearray table (recode smallint, re char(100));

insert into @rearray values (100,'HIT BY BEER TRUCK')
,(200,'BAD HAIR DAY')
,(300,'ASPIRIN OVERDOSE')
,(400,'MAKEUP DISASTER')
,(500,'GOT LOCKED IN THE SALOON')

DECLARE @REFCURSOR AS CURSOR;
SET @REFCURSOR = CURSOR FOR
SELECT RECODE, RE FROM @REARRAY;
OPEN @REFCURSOR;

SET @MAX = (SELECT DISTINCT @@ROWCOUNT FROM ABSENCE);
SET @MIN = 0;

ALTER TABLE ABSENCE ADD CODE SMALLINT, REASONING CHAR(100);

WHILE (@MIN <= @MAX)
BEGIN
    FETCH NEXT FROM @REFCURSOR INTO @RECODE, @RE;
    INSERT INTO ABSENCE (CODE, REASONING) VALUES (@RECODE, @RE);
    SET @MIN += 1;
END

CLOSE @REFCURSOR
DEALLOCATE @REFCURSOR

SELECT EMPID, CODE, REASONING FROM ABSENCE
Though I am inserting into only two columns, each INSERT creates a brand-new row, so it attempts to put NULL into empid (which is already filled for the existing rows and cannot be NULL), and the insertion fails.
Also, how do I randomize the values from the @rearray table variable when assigning them to the ABSENCE table?
Since this is a small dataset, one approach might be to use CROSS APPLY with SELECT TOP(1) ... FROM @rearray ORDER BY NEWID(). This essentially joins your ABSENCE table to your reference table in an UPDATE statement, picking a random reference row for each employee. One caveat: the subquery must reference the outer row, otherwise SQL Server may evaluate it once and give every employee the same pair. In full, it would look like:

UPDATE a
SET CODE = x.recode,
    REASONING = x.re
FROM ABSENCE a
CROSS APPLY (SELECT TOP(1) recode, re
             FROM @rearray
             WHERE a.EMPID = a.EMPID -- correlation forces a fresh random pick per row
             ORDER BY NEWID()) x
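Afterwards you can sanity-check that the codes were actually spread around, e.g.:

SELECT CODE, REASONING, COUNT(*) AS Assigned
FROM ABSENCE
GROUP BY CODE, REASONING
ORDER BY CODE;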

Bulk insert from a table to another

We have two tables in SQL Server 2008 R2. Periodically, we have to insert a batch of records from Table A into Table B. While the insert runs, Table B must remain available for SELECT and UPDATE. Currently we use INSERT..SELECT to copy from Table A to Table B, but the problem is that while it runs, it sometimes causes UPDATE statements against Table B to time out.
Is there a better way to bulk insert from one table to another that won't cause blocking?
The most obvious solution is to use smaller batches, as Stanley suggested. If that is really not an option, you could explore (transaction-level) snapshot isolation.
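For reference, a minimal sketch of enabling it (YourDb is a placeholder); readers then opt in per session:

ALTER DATABASE YourDb SET ALLOW_SNAPSHOT_ISOLATION ON;

-- in the sessions that should read a consistent snapshot while the bulk insert runs:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;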
1. Set the transaction timeout to a large enough value that the statement no longer times out.
2. Use a CURSOR and do it row by row.
3. Try the following way of doing things. It requires a row identifier (an IDENTITY, for instance), and it is best to have a PK or index on that field:
SET NOCOUNT ON;
CREATE TABLE #A(
row_id INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
data INT NOT NULL
);
CREATE TABLE #B(
row_id INT NOT NULL PRIMARY KEY,
data INT NOT NULL
);
-- TRUNCATE TABLE #B; -- no truncate needed since you just want to add rows, not copy the whole table
DECLARE @batch_size INT;
SET @batch_size = 10000;

DECLARE @from_row_id INT;
DECLARE @to_row_id INT;

-- You would use this to establish the first @from_row_id if you wanted to copy the whole table
-- SELECT @from_row_id = ISNULL(MIN(row_id), -1)
-- FROM #A AS a;

SELECT @from_row_id = ISNULL(MAX(row_id), -1)
FROM #B AS b;

IF @from_row_id = -1
    SELECT @from_row_id = ISNULL(MIN(row_id), -1)
    FROM #A AS a;
ELSE
    SELECT @from_row_id = ISNULL(MIN(row_id), -1)
    FROM #A AS a
    WHERE row_id > @from_row_id;

WHILE @from_row_id >= 0
BEGIN
    SELECT @to_row_id = ISNULL(MAX(row_id), -1)
    FROM
    (
        SELECT TOP(@batch_size) row_id
        FROM #A AS a
        WHERE row_id >= @from_row_id
    ) AS row_ids;

    IF @to_row_id = -1
    BEGIN
        INSERT #B
        SELECT *
        FROM #A AS a
        WHERE row_id >= @from_row_id;

        BREAK;
    END
    ELSE
        INSERT #B
        SELECT *
        FROM #A AS a
        WHERE row_id BETWEEN @from_row_id AND @to_row_id;

    SELECT @from_row_id = ISNULL(MIN(row_id), -1)
    FROM #A AS a
    WHERE row_id > @to_row_id;
END
DROP TABLE #B;
DROP TABLE #A;

coalesce two records into one

I have a table that stores two values, 'total' and 'owing', for each customer. Data is uploaded to the table using two files: one brings in 'total' and the other brings in 'owing'. This means I have two records for each customerID:
customerID    Total    Owing
1234           1000     NULL
1234           NULL      200
I want to write a stored procedure that merges the two records together:
customerID    Total    Owing
1234           1000      200
I have seen examples using COALESCE, so I put together something like this:
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;

    --Variable declarations
    DECLARE @customer_id varchar(20)
    DECLARE @total decimal(15,8)
    DECLARE @owing decimal(15,8)
    DECLARE @customer_name_date varchar(255)
    DECLARE @organisation varchar(4)
    DECLARE @country_code varchar(2)
    DECLARE @created_date datetime

    --Other variables
    DECLARE @totals_staging_id int

    --Get the id of the first row in the staging table
    SELECT @totals_staging_id = MIN(totals_staging_id)
    FROM TOTALS_STAGING

    --Iterate through the staging table
    WHILE @totals_staging_id IS NOT NULL
    BEGIN
        UPDATE TOTALS_STAGING
        SET total = COALESCE(@total, total),
            owing = COALESCE(@owing, owing)
        WHERE totals_staging_id = @totals_staging_id
    END
END
Any Ideas?
SELECT t1.customerId, t1.total, t2.owing
FROM test t1
JOIN test t2 ON t1.customerId = t2.customerId
WHERE t1.total IS NOT NULL
  AND t2.owing IS NOT NULL
I'm wondering why you aren't just using an UPDATE when the second file is loaded?
Except for COUNT, aggregate functions ignore null values. Aggregate functions are frequently used with the GROUP BY clause of the SELECT statement. (MSDN)
So you don't need to worry about null values when summing. The following will merge your records together:
select customerId,
sum(Total) Total,
sum(Owing) Owing
from T
Group by customerId
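A quick demonstration with the sample rows from the question (T stands in for your table):

DECLARE @T TABLE (customerID int, Total int, Owing int);
INSERT INTO @T VALUES (1234, 1000, NULL), (1234, NULL, 200);

SELECT customerID, SUM(Total) AS Total, SUM(Owing) AS Owing
FROM @T
GROUP BY customerID;
-- returns: 1234, 1000, 200 (a "null value eliminated by aggregate" warning is expected)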
Try this:
CREATE TABLE #Temp
(
CustomerId int,
Total int,
Owing int
)
insert into #Temp
values (1024,100,null),(1024,null,200),(1025,10,null)
Create Table #Final
(
CustomerId int,
Total int,
Owing int
)
insert into #Final
values (1025,100,50)
MERGE #Final AS F
USING
(SELECT customerid,sum(Total) Total,sum(owing) owing FROM #Temp
group by #Temp.customerid
) AS a
ON (F.customerid = a.customerid)
WHEN MATCHED THEN UPDATE SET F.Total = F.Total + isnull(a.Total,0)
,F.Owing = F.Owing + isnull(a.Owing,0)
WHEN NOT MATCHED THEN
INSERT (CustomerId,Total,Owing)
VALUES (a.customerid,a.Total,a.owing);
select * from #Final
drop table #Temp
drop table #Final
This should work:
SELECT CustomerID,
COALESCE(total1, total2) AS Total,
COALESCE(owing1, owing2) AS Owing
FROM
(SELECT row1.CustomerID AS CustomerID,
row1.Total AS total1,
row2.Total AS total2,
row1.Owing AS owing1,
row2.Owing AS owing2
FROM YourTable row1 INNER JOIN YourTable row2 ON row1.CustomerID = row2.CustomerID
WHERE row1.Total IS NULL AND row2.Total IS NOT NULL) temp
--Note: Alter the WHERE clause as necessary to ensure row1 and row2 are unique.
...but note that you'll need some mechanism to ensure row1 and row2 are unique. My WHERE clause is an example based on the data you provided. You'll need to tweak this to add something more specific to your business rules.
