Insert SCOPE_IDENTITY at time of insert - sql-server

I am using SQL Server 2008. Here is my sample table
SEQ NAME GROUP
1 abc 1
2 bcd 1
3 cde 3
In the above table, SEQ is my identity column (auto number). If I want to insert a new row with name = 'def' and group = 3, I can do it as follows
INSERT INTO SampleTable(NAME, [GROUP]) VALUES ('def',3)
Now, if I want to insert a new row and then set GROUP = SEQ of the newly inserted row, I am doing it in two steps as shown below
INSERT INTO SampleTable(NAME, [GROUP]) VALUES ('def',999)
UPDATE SampleTable SET [GROUP] = SEQ WHERE NAME = 'def'
Is there any way to do it in a single step? For example
INSERT INTO SampleTable(NAME, [GROUP]) VALUES ('def',SCOPE_IDENTITY())
The above statement obviously doesn't work since SCOPE_IDENTITY() is set only after the insert has completed. But is there any way to set GROUP = SEQ using a single insert statement?

A minor improvement would be changing the UPDATE to filter on SEQ:
INSERT INTO SampleTable(NAME, [GROUP]) VALUES ('def',999)
UPDATE SampleTable SET [GROUP] = SEQ WHERE SEQ = SCOPE_IDENTITY()
I like this better because we are likely to get a clustered index seek by filtering on SEQ. Also, I think it is semantically cleaner.
I don't know of any way to do this in one statement with an IDENTITY column (even thinking about OUTPUT and MERGE nothing comes to mind). You can do it with a SQL Server 2012 SEQUENCE though by using the INSERT...SELECT syntax. In the select part, you pop one number from the sequence and copy it into two columns.
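A minimal sketch of the SEQUENCE approach, assuming SQL Server 2012 or later and assuming SEQ is a plain INT column rather than an IDENTITY column. The names dbo.SampleSeq and dbo.SampleTableSeq are hypothetical. It relies on the documented behaviour that several NEXT VALUE FOR references to the same sequence within one statement return the same value for a given row, so both columns receive the same number:

```sql
-- Hypothetical objects for illustration; requires SQL Server 2012+.
CREATE SEQUENCE dbo.SampleSeq AS INT START WITH 1 INCREMENT BY 1;

CREATE TABLE dbo.SampleTableSeq (
    SEQ     INT         NOT NULL PRIMARY KEY,  -- filled from the sequence, not IDENTITY
    NAME    VARCHAR(50) NOT NULL,
    [GROUP] INT         NOT NULL               -- GROUP is a reserved word, so it is bracketed
);
GO  -- batch separator understood by SSMS and sqlcmd

-- Both NEXT VALUE FOR references yield the same number for this row,
-- so SEQ and [GROUP] end up equal in a single statement.
INSERT INTO dbo.SampleTableSeq (SEQ, NAME, [GROUP])
SELECT NEXT VALUE FOR dbo.SampleSeq, 'def', NEXT VALUE FOR dbo.SampleSeq;
```

Rows that should join an existing group can still be inserted with an explicit [GROUP] value and a single NEXT VALUE FOR reference for SEQ.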

You can make GROUP a computed column defined as SEQ when creating the table. Then you only need to insert NAME, and both SEQ and GROUP are set automatically.
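As a rough sketch of that idea (the table name dbo.SampleTableComputed is hypothetical), the computed column could be declared like this:

```sql
-- Hypothetical table for illustration.
CREATE TABLE dbo.SampleTableComputed (
    SEQ     INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    NAME    VARCHAR(50) NOT NULL,
    [GROUP] AS (SEQ)   -- computed column: always equal to SEQ
);

-- Only NAME is supplied; SEQ and [GROUP] are filled in by the engine.
INSERT INTO dbo.SampleTableComputed (NAME) VALUES ('def');

SELECT SEQ, NAME, [GROUP] FROM dbo.SampleTableComputed;
```

Note the trade-off: a computed column can never differ from SEQ, so this only fits if every row's GROUP equals its own SEQ. The sample data in the question, where rows 1 and 2 share GROUP 1, could not be stored this way.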

Related

SEQUENCE in SQL Server 2008 R2

I need to know if there is any way to have a SEQUENCE or something like that, as we have in Oracle. The idea is to get one number and then use it as a key to save some records in a table. Each time we need to save data in that table, we first get the next number from the sequence and then use that same number to save several records. It is not an IDENTITY column.
For example:
[ID] [SEQUENCE ID] [Code] [Value]
1 1 A 232
2 1 B 454
3 1 C 565
Next time someone needs to add records, the next SEQUENCE ID should be 2. Is there any way to do it? The sequence could be a GUID for me as well.
As Guillelon points out, the best way to do this in SQL Server is with an identity column.
You can simply define a column as being identity. When a new row is inserted, the identity is automatically incremented.
The difference is that the identity is incremented on every row, not just some rows. To be honest, I think this is a much better approach. Your example suggests that you are storing both an entity and its details in the same table.
The SequenceId should be the primary identity key in another table. This value can then be used for insertion into this table.
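One way this could look, as a sketch only: the table names dbo.Batch and dbo.BatchDetail and the CreatedAt column are hypothetical, while the other column names follow the example in the question. The parent row is inserted first, its identity is captured with SCOPE_IDENTITY(), and that value is reused for every detail row:

```sql
-- Hypothetical parent table: one row per group of records.
CREATE TABLE dbo.Batch (
    SequenceId INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    CreatedAt  DATETIME NOT NULL DEFAULT GETDATE()
);

-- Hypothetical detail table holding the records themselves.
CREATE TABLE dbo.BatchDetail (
    ID         INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    SequenceId INT     NOT NULL REFERENCES dbo.Batch (SequenceId),
    Code       CHAR(1) NOT NULL,
    [Value]    INT     NOT NULL
);

DECLARE @SequenceId INT;

BEGIN TRANSACTION;

-- Reserve the next SequenceId by inserting a parent row.
INSERT INTO dbo.Batch DEFAULT VALUES;
SET @SequenceId = SCOPE_IDENTITY();

-- Reuse the same SequenceId for all records saved together.
INSERT INTO dbo.BatchDetail (SequenceId, Code, [Value])
VALUES (@SequenceId, 'A', 232),
       (@SequenceId, 'B', 454),
       (@SequenceId, 'C', 565);

COMMIT TRANSACTION;
```

Because the number comes from an IDENTITY column, two sessions saving at the same time receive different SequenceId values without any MAX(ID) + 1 lookup.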
This can be done in multiple ways. The following are what I can think of:
Creating a trigger and thereby computing the next value
Adding a computed column along with a function that retrieves the next value of the sequence
Here is an article that presents various solutions
One possible way is to do something like this:
-- Example 1
DECLARE @Var INT
SET @Var = (SELECT MAX(ID) + 1 FROM tbl);
INSERT INTO tbl VALUES (@Var,'Record 1')
INSERT INTO tbl VALUES (@Var,'Record 2')
INSERT INTO tbl VALUES (@Var,'Record 3')
-- Example 2
INSERT INTO #temp VALUES (1,2)
INSERT INTO #temp VALUES (1,2)
INSERT INTO ActualTable (col1, col2, sequence)
SELECT temp.*, (SELECT MAX(ID) + 1 FROM ActualTable)
FROM #temp temp
-- Example 3
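-- OUTPUT cannot assign to a scalar variable, so capture the value in a table variable
DECLARE @out TABLE (sequence int)
INSERT INTO ActualTable (col1, col2, sequence) OUTPUT inserted.sequence INTO @out (sequence)
SELECT 1, 2, MAX(ID) + 1 FROM ActualTable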
The first two examples rely on batch updating. But based on your comment, I have added example 3 which is a single input initially. You can then use the sequence that was inserted to insert the rest of the records. If you have never used an output, please reply in comments and I will expand further.
I would isolate all of the above inside a transaction.
If you were using SQL Server 2012, you could use a SEQUENCE object as shown here.
Forgive any syntax errors; I don't have SSMS installed.
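For the SQL Server 2012 option mentioned above, a minimal sketch might look like the following. The sequence name dbo.RecordGroupSeq and the table dbo.Records are hypothetical, and the columns mirror the example in the question. One number is taken from the sequence into a variable and then reused for every record in the group:

```sql
-- Requires SQL Server 2012+. All object names here are hypothetical.
CREATE SEQUENCE dbo.RecordGroupSeq AS INT START WITH 1 INCREMENT BY 1;

CREATE TABLE dbo.Records (
    ID         INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    SequenceId INT     NOT NULL,
    Code       CHAR(1) NOT NULL,
    [Value]    INT     NOT NULL
);
GO  -- batch separator understood by SSMS and sqlcmd

-- Take one number from the sequence...
DECLARE @SequenceId INT = NEXT VALUE FOR dbo.RecordGroupSeq;

-- ...and use it for all records saved together.
INSERT INTO dbo.Records (SequenceId, Code, [Value])
VALUES (@SequenceId, 'A', 232),
       (@SequenceId, 'B', 454),
       (@SequenceId, 'C', 565);
```

Unlike MAX(ID) + 1, the sequence hands out each number only once, even when several sessions ask for one at the same time.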

TSQL: getting next available ID

I am using SQL Server 2008 and have three tables: table a, table b and table c.
All have an ID column, but for tables a and b the ID column is an identity integer, while for table c the ID column is a varchar type.
Currently a stored procedure takes a name parameter and, following certain logic, inserts into table a or table b, gets the identity, prefixes it with 'A' or 'B', then inserts into table c.
The problem is that the table C ID column may already contain the value. For example, if the identity from table A is 2, table C might already have 'A2', 'A3' and 'A5' in its ID column. How can I write a T-SQL query to identify the next available value in table C and then make sure table A/B is updated accordingly?
[Update]
These are the current steps:
1. Depending on the input parameter, insert into table A or table B.
2. Initialize seed value = @@IDENTITY.
3. Calculate the ID value to insert into table C by prefixing the seed value with 'A' or 'B'.
4. Look for a matching record in table C by the ID value from step 3. If no record is found, insert it; otherwise increase the seed value by 1 and repeat step 3.
The issue is that in certain value ranges a huge block of values may already exist in the table C ID column, e.g. A3000 to A500000, and the query is extremely slow with the existing logic. I need logic that efficiently finds the minimum available number (without the prefix).
It is hard to describe; I hope this makes more sense. I truly appreciate any help on this. Thanks in advance!
This should do the trick. It is a simple self-contained example that will run in SSMS. I even put the values out of order just in case. You would just put your table where @Data is and replace 'Id' with your identifier column.
declare @Data Table ( Id varchar(3) );
insert into @Data values ('A5'),('A2'),('B1'),('A3'),('B2'),('A4'),('A1'),('A6');
With a as
(
Select
Id
, cast(right(Id, len(Id)-1) as int) as Pos
, left(Id, 1) as TableFrom
from @Data
)
select
TableFrom
, max(Pos) + 1 as NextNumberUp
from a
group by TableFrom
EDIT: If you want to not worry about production data you could add this last part amending what I wrote:
Select
TableFrom
, max(Pos) as LastPos
into #Temp
from a
group by TableFrom
select TableFrom, LastPos + 1
from #Temp
Regardless, in a production environment you will have to read part of the table at some point to get the data. If the dataset is not too large (say a varchar(256) or smaller column and 5 million rows or fewer), you could dump that entire column from table C to a temp table. Honestly, query performance versus import cost varies widely from system to system.
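The query above returns max + 1 per prefix. If the lowest free number at or above a given seed is wanted instead, as the question asks, a set-based gap search avoids the row-by-row loop. This is only a sketch: @TableC, @Prefix and @Seed are hypothetical stand-ins, and it assumes every ID is a single letter followed only by digits.

```sql
-- Stand-in for table C; replace with the real table.
DECLARE @TableC TABLE (Id VARCHAR(10) PRIMARY KEY);
INSERT INTO @TableC VALUES ('A2'),('A3'),('A5'),('B1');

DECLARE @Prefix CHAR(1) = 'A',  -- which source table the row came from
        @Seed   INT     = 2;    -- identity value returned by table A or B

WITH Nums AS
(
    -- Numeric part of every ID that carries the requested prefix.
    SELECT CAST(SUBSTRING(Id, 2, 9) AS INT) AS Pos
    FROM @TableC
    WHERE Id LIKE @Prefix + '%'
)
SELECT CASE
           -- The seed itself is free: use it.
           WHEN NOT EXISTS (SELECT 1 FROM Nums WHERE Pos = @Seed) THEN @Seed
           -- Otherwise take the first used number >= seed whose successor is unused.
           ELSE (SELECT MIN(n.Pos) + 1
                 FROM Nums AS n
                 WHERE n.Pos >= @Seed
                   AND NOT EXISTS (SELECT 1 FROM Nums AS m WHERE m.Pos = n.Pos + 1))
       END AS NextAvailable;
```

With the sample rows, 'A2' and 'A3' are taken and 'A4' is not, so the query returns 4. In a real procedure the lookup and the following insert would need to run in one transaction with suitable locking, otherwise two callers could pick the same number.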
Following your design there shouldn't be any duplicates in Table C considering that A and B are unique.
A | B | C
1 | 1 | A1
2 | 2 | A2
  |   | B1
  |   | B2

Preserving ORDER BY in SELECT INTO

I have a T-SQL query that takes data from one table and copies it into a new table but only rows meeting a certain condition:
SELECT VibeFGEvents.*
INTO VibeFGEventsAfterStudyStart
FROM VibeFGEvents
LEFT OUTER JOIN VibeFGEventsStudyStart
ON
CHARINDEX(REPLACE(REPLACE(REPLACE(logName, 'MyVibe ', ''), ' new laptop', ''), ' old laptop', ''), excelFilename) > 0
AND VibeFGEventsStudyStart.MIN_TitleInstID <= VibeFGEvents.TitleInstID
AND VibeFGEventsStudyStart.MIN_WinInstId <= VibeFGEvents.WndInstID
WHERE VibeFGEventsStudyStart.excelFilename IS NOT NULL
ORDER BY VibeFGEvents.id
The code using the table relies on its order, and the copy above does not preserve the order I expected. I.e. the rows in the new table VibeFGEventsAfterStudyStart are not monotonically increasing in the VibeFGEventsAfterStudyStart.id column copied from VibeFGEvents.id.
In T-SQL how might I preserve the ordering of the rows from VibeFGEvents in VibeFGEventsStudyStart?
I know this is a bit old, but I needed to do something similar. I wanted to insert the contents of one table into another, but in a random order. I found that I could do this by using select top n and order by newid(). Without the 'top n', order was not preserved and the second table had rows in the same order as the first. However, with 'top n', the order (random in my case) was preserved. I used a value of 'n' that was greater than the number of rows. So my query was along the lines of:
insert Table2 (T2Col1, T2Col2)
select top 10000 T1Col1, T1Col2
from Table1
order by newid()
What for?
Point is – data in a table is not ordered. In SQL Server the intrinsic storage order of a table is that of the (if defined) clustered index.
The order in which data is inserted is basically "irrelevant". It is forgotten the moment the data is written into the table.
As such, nothing is gained, even if you get this to work. If you need an order when dealing with data, you HAVE to put an ORDER BY clause on the SELECT that reads it. Anything else is random, i.e. the order in which you get data is not determined and may change.
So it makes no sense to have a specific order on the insert as you try to achieve.
SQL 101: sets have no order.
Just add top to your sql with a number that is greater than the actual number of rows:
SELECT top 25000 *
into spx_copy
from SPX
order by date
I've found a specific scenario where we want the new table to be created with a specific order in the columns' content:
The number of rows is very large (from 200 million to 2,000 million), so we are using SELECT INTO instead of CREATE TABLE + INSERT because the table needs to be loaded as fast as possible (minimal logging). We have tested using trace flag 610 for loading an already created empty table with a clustered index, but that still takes longer than the following approach.
We need the data to be ordered by specific columns for query performances, so we are creating a CLUSTERED INDEX just after the table is loaded. We discarded creating a non-clustered index because it would need another read for the data that's not included in the ordered columns from the index, and we discarded creating a full-covering non-clustered index because it would practically double the amount of space needed to hold the table.
It happens that if you manage to somehow create the table with columns already "ordered", creating the clustered index (with the same order) takes a lot less time than when the data isn't ordered. And sometimes (you will have to test your case), ordering the rows in the SELECT INTO is faster than loading without order and creating the clustered index later.
The problem is that SQL Server 2012+ will ignore the ORDER BY column list when doing INSERT INTO or when doing SELECT INTO. It will consider the ORDER BY columns if you specify an IDENTITY column on the SELECT INTO or if the inserted table has an IDENTITY column, but just to determine the identity values and not the actual storage order in the underlying table. In this case, it's likely that the sort will happen but not guaranteed as it's highly dependent on the execution plan.
A trick we have found is that doing a SELECT INTO with the result of a UNION ALL makes the engine perform a SORT (not always an explicit SORT operator, sometimes a MERGE JOIN CONCATENATION, etc.) if you have an ORDER BY list. This way the select into already creates the new table in the order we are going to create the clustered index later and thus the index takes less time to create.
So you can rewrite this query:
SELECT
FirstColumn = T.FirstColumn,
SecondColumn = T.SecondColumn
INTO
#NewTable
FROM
VeryBigTable AS T
ORDER BY -- ORDER BY is ignored!
FirstColumn,
SecondColumn
to
SELECT
FirstColumn = T.FirstColumn,
SecondColumn = T.SecondColumn
INTO
#NewTable
FROM
VeryBigTable AS T
UNION ALL
-- A "fake" row to be deleted
SELECT
FirstColumn = 0,
SecondColumn = 0
ORDER BY
FirstColumn,
SecondColumn
We have used this trick a few times, but I can't guarantee it will always sort. I'm just posting this as a possible workaround in case someone has a similar scenario.
You cannot do this with ORDER BY but if you create a Clustered Index on VibeFGEvents.id after your SELECT INTO the table will be sorted on disk by VibeFGEvents.id.
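A short sketch of that suggestion, using the table from the question. The index name is hypothetical, and the statement is meant to run after the SELECT ... INTO has created the table:

```sql
-- Hypothetical index name; physically organizes the copied rows by id.
CREATE CLUSTERED INDEX IX_VibeFGEventsAfterStudyStart_id
    ON VibeFGEventsAfterStudyStart (id);

-- Code that depends on the order should still request it explicitly.
SELECT *
FROM VibeFGEventsAfterStudyStart
ORDER BY id;
```

The clustered index makes the ORDER BY cheap, since the rows can be read in index order, but only the ORDER BY on the reading query guarantees the order of the result.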
I've run a test on MS SQL 2012, and it clearly shows that insert into ... select ... order by makes sense. Here is what I did:
create table tmp1 (id int not null identity, name sysname);
create table tmp2 (id int not null identity, name sysname);
insert into tmp1 (name) values ('Apple');
insert into tmp1 (name) values ('Carrot');
insert into tmp1 (name) values ('Pineapple');
insert into tmp1 (name) values ('Orange');
insert into tmp1 (name) values ('Kiwi');
insert into tmp1 (name) values ('Ananas');
insert into tmp1 (name) values ('Banana');
insert into tmp1 (name) values ('Blackberry');
select * from tmp1 order by id;
And I got this list:
1 Apple
2 Carrot
3 Pineapple
4 Orange
5 Kiwi
6 Ananas
7 Banana
8 Blackberry
No surprises here. Then I made a copy from tmp1 to tmp2 this way:
insert into tmp2 (name)
select name
from tmp1
order by id;
select * from tmp2 order by id;
I got the exact response like before. Apple to Blackberry.
Now reverse the order to test it:
delete from tmp2;
insert into tmp2 (name)
select name
from tmp1
order by id desc;
select * from tmp2 order by id;
9 Blackberry
10 Banana
11 Ananas
12 Kiwi
13 Orange
14 Pineapple
15 Carrot
16 Apple
So the order in tmp2 is reversed too, which means order by makes sense when there is an identity column in the target table!
One reason to want a specific order is that you cannot define the order in a subquery. The idea is that if you fill a table variable in order and THEN query that table variable, you would expect to retain the order (say, to concatenate rows that must be in order for XML or JSON), but you can't.
So, what do you do?
The answer is to force SQL to order it by using TOP in your select (just pick a number high enough to cover all your rows).
I have run into the same issue, and one reason I have needed to preserve the order is when I use ROLLUP to get a weighted average based on the raw data rather than an average of what is in that column. For instance, say I want to see the average profit per unit sold across four store locations. I can do this very easily with the equation Profit / #Units = Avg. Now I include a ROLLUP in my GROUP BY so that I can also see the average across all locations. Then I think to myself, "This is good info, but I want to see it in order from best average to worst and keep the overall figure at the bottom (or top) of the list." ROLLUP will fail you here, so you take a different approach.
Why not create row numbers based on the sequence (order) you need to preserve?
SELECT OrderBy = ROW_NUMBER() OVER (ORDER BY VibeFGEvents.id) -- add PARTITION BY <column> to restart the numbering per group
, VibeFGEvents.*
INTO VibeFGEventsAfterStudyStart
FROM VibeFGEvents
LEFT OUTER JOIN VibeFGEventsStudyStart
ON
CHARINDEX(REPLACE(REPLACE(REPLACE(logName, 'MyVibe ', ''), ' new laptop', ''), ' old laptop', ''), excelFilename) > 0
AND VibeFGEventsStudyStart.MIN_TitleInstID <= VibeFGEvents.TitleInstID
AND VibeFGEventsStudyStart.MIN_WinInstId <= VibeFGEvents.WndInstID
WHERE VibeFGEventsStudyStart.excelFilename IS NOT NULL
Now you can use the OrderBy field from your table to set the order of values. I removed the ORDER BY statement from the query above since it does not affect how the data is loaded to the table.
I found this approach helpful to solve this problem:
WITH ordered as
(
SELECT TOP 1000
[Month]
FROM SourceTable
GROUP BY [Month]
ORDER BY [Month]
)
INSERT INTO DestinationTable (MonthStart)
(
SELECT * from ordered
)
Try using INSERT INTO instead of SELECT INTO
INSERT INTO VibeFGEventsAfterStudyStart
SELECT VibeFGEvents.*
FROM VibeFGEvents
LEFT OUTER JOIN VibeFGEventsStudyStart
ON
CHARINDEX(REPLACE(REPLACE(REPLACE(logName, 'MyVibe ', ''), ' new laptop', ''), ' old laptop', ''), excelFilename) > 0
AND VibeFGEventsStudyStart.MIN_TitleInstID <= VibeFGEvents.TitleInstID
AND VibeFGEventsStudyStart.MIN_WinInstId <= VibeFGEvents.WndInstID
WHERE VibeFGEventsStudyStart.excelFilename IS NOT NULL
ORDER BY VibeFGEvents.id

Why the OUTPUT statement while inserting into the table inside the IF clause returns null on second column?

Here is my "upsert" code:
UPDATE LastTicket SET LastTicketNumber=LastTicketNumber+1
OUTPUT INSERTED.LastTicketNumber WHERE CategoryId='1';
IF @@ROWCOUNT=0 INSERT INTO LastTicket (CategoryId,LastTicketNumber)
OUTPUT INSERTED.LastTicketNumber VALUES ('1','2')
So, when the row exists, it successfully updates and the OUTPUT returns the new, incremented LastTicketNumber.
On the other hand, when the row does not exist, SQL Server successfully creates it and populates it with the data I am passing to SqlCommand (1,2). So it creates the row, but returns null. Meaning nothing! Why is that? And why, when I replace "INSERTED.LastTicketNumber" with "INSERTED.CategoryId", does it BEGIN to return a non-null value, the category id? And how do I return what I need?
The table has only these two columns and nonclustered primary composite key on both of them.
(MSSQL 2008)
If no row exists in the table, the first time the batch runs it will return two result sets - the first being empty (because there is no row to update) and the second containing the inserted Id.
Perhaps you are seeing the first result set and not the second.
Try the following:
DECLARE @t table (LastTicketNumber int)
UPDATE LastTicket SET LastTicketNumber=LastTicketNumber+1
OUTPUT INSERTED.LastTicketNumber INTO @t (LastTicketNumber) WHERE CategoryId='1';
IF @@ROWCOUNT=0 INSERT INTO LastTicket (CategoryId,LastTicketNumber)
OUTPUT INSERTED.LastTicketNumber INTO @t (LastTicketNumber) VALUES ('1','2')
select LastTicketNumber from @t

How does sql server choose values in an update statement where there are multiple options?

I have an update statement in SQL server where there are four possible values that can be assigned based on the join. It appears that SQL has an algorithm for choosing one value over another, and I'm not sure how that algorithm works.
As an example, say there is a table called Source with two columns (Match and Data) structured as below:
(The match column contains only 1's, the Data column increments by 1 for every row)
Match Data
--------------------------
1 1
1 2
1 3
1 4
That table will update another table called Destination with the same two columns structured as below:
Match Data
--------------------------
1 NULL
If you want to update the Data field in Destination in the following way:
UPDATE
Destination
SET
Data = Source.Data
FROM
Destination
INNER JOIN
Source
ON
Destination.Match = Source.Match
there are four possible values that Destination.Data could be set to after this query is run. I've found that changing the indexes on Source has an impact on what Destination is set to, and it appears that SQL Server just updates the Destination table with the first matching value it finds.
Is that accurate? Is it possible that SQL Server is updating the Destination with every possible value sequentially and I end up with the same kind of result as if it were updating with the first value it finds? It seems to be possibly problematic that it will seemingly randomly choose one row to update, as opposed to throwing an error when presented with this situation.
Thank you.
P.S. I apologize for the poor formatting. Hopefully, the intent is clear.
It sets all of the results to the Data. Which one you end up with after the query depends on the order of the results returned (which one it sets last).
Since there's no ORDER BY clause, you're left with whatever order SQL Server comes up with. That will normally follow the physical order of the records on disk, and that in turn typically follows the clustered index for a table. But this order isn't set in stone, particularly when joins are involved. If a join matches on a column with an index other than the clustered index, it may well order the results based on that index instead. In the end, unless you give it an ORDER BY clause, SQL Server will return the results in whatever order it thinks it can produce fastest.
You can play with this by turning your update query into a select query, so you can see the results. Notice which record comes first and which record comes last in the source table for each record of the destination table. Compare that with the results of your update query. Then play with your indexes again and check the results once more to see what you get.
Of course, it can be tricky here because UPDATE statements are not allowed to use an ORDER BY clause, so regardless of what you find, you should really write the join so it matches the destination table 1:1. You may find the APPLY operator useful in achieving this goal, and you can use it to effectively JOIN to another table and guarantee the join only matches one record.
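A sketch of the APPLY idea, using table variables shaped like the tables in the question. Picking the highest Data value is an arbitrary choice made for illustration; any ORDER BY that identifies exactly one source row would do:

```sql
DECLARE @Source TABLE (Match INT, Data INT);
INSERT INTO @Source VALUES (1, 1), (1, 2), (1, 3), (1, 4);

DECLARE @Destination TABLE (Match INT, Data INT);
INSERT INTO @Destination VALUES (1, NULL);

UPDATE d
SET Data = s.Data
FROM @Destination AS d
CROSS APPLY
(
    -- Exactly one source row per destination row, chosen explicitly.
    SELECT TOP (1) src.Data
    FROM @Source AS src
    WHERE src.Match = d.Match
    ORDER BY src.Data DESC
) AS s;

SELECT Match, Data FROM @Destination;  -- Data is now 4
```

Because the TOP (1) ... ORDER BY inside the APPLY selects a single row for each destination row, the result no longer depends on indexes or on the plan the optimizer happens to pick.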
The choice is not deterministic and it can be any of the source rows.
You can try
DECLARE @Source TABLE(Match INT, Data INT);
INSERT INTO @Source
VALUES
(1, 1),
(1, 2),
(1, 3),
(1, 4);
DECLARE @Destination TABLE(Match INT, Data INT);
INSERT INTO @Destination
VALUES
(1, NULL);
UPDATE Destination
SET Data = Source.Data
FROM @Destination Destination
INNER JOIN @Source Source
ON Destination.Match = Source.Match;
SELECT *
FROM @Destination;
And look at the actual execution plan. I see the following.
The output columns from @Destination are Bmk1000, Match. Bmk1000 is an internal row identifier (used here due to lack of clustered index in this example) and would be different for each row emitted from @Destination (if there was more than one).
The single row is then joined onto the four matching rows in @Source and the resultant four rows are passed into a stream aggregate.
The stream aggregate groups by Bmk1000 and collapses the multiple matching rows down to one. The operation performed by this aggregate is ANY(@Source.[Data]).
The ANY aggregate is an internal aggregate function not available in TSQL itself. No guarantees are made about which of the four source rows will be chosen.
Finally the single row per group feeds into the UPDATE operator to update the row with whatever value the ANY aggregate returned.
If you want deterministic results then you can use an aggregate function yourself...
WITH GroupedSource AS
(
SELECT Match,
MAX(Data) AS Data
FROM @Source
GROUP BY Match
)
UPDATE Destination
SET Data = Source.Data
FROM @Destination Destination
INNER JOIN GroupedSource Source
ON Destination.Match = Source.Match;
Or use ROW_NUMBER...
WITH RankedSource AS
(
SELECT Match,
Data,
ROW_NUMBER() OVER (PARTITION BY Match ORDER BY Data DESC) AS RN
FROM @Source
)
UPDATE Destination
SET Data = Source.Data
FROM @Destination Destination
INNER JOIN RankedSource Source
ON Destination.Match = Source.Match
WHERE RN = 1;
The latter form is generally more useful as in the event you need to set multiple columns this will ensure that all values used are from the same source row. In order to be deterministic the combination of partition by and order by columns should be unique.
