SQL Server dependent Identity - is there such a thing? - sql-server

I use SQL Server 2008 R2.
I'm looking for a feature that I describe as dependent identity.
I'll explain with an example.
Consider a table like this one:
Script:
CREATE TABLE [dbo].[Rooms](
[RoomID] [int] NOT NULL,
[ItemID] [int] NOT NULL,
[ItemDescription] [nvarchar] (250))
GO
Data:
RoomID ItemID ItemDescription
------ ------ ---------------
7      1      Door
7      2      Window (West)
7      3      Window (North)
8      1      Door
8      2      Table #1
8      3      Table #2
7      4      Table #1
8      4      Chair #1
7      5      Table #2
7      6      Table #3
8      5      Chair #2
(Can anyone tell me the secret of how to format an example table here?)
I would have loved to be able to declare a dependent identity column like this:
ItemID [int] IDENTITY(RoomID,1,1) NOT NULL
A new row in [Rooms] should trigger a lookup of the current max value of ItemID where RoomID = @roomID and add 1.
Instead of updating a row to change its RoomID, you would delete it and re-insert the required data under the new room (a sketch of this move follows the code below).
Nowadays I do that programmatically like this:
DECLARE @roomID INT
SET @roomID = 7
INSERT INTO [Allocation].[dbo].[Rooms] ([RoomID], [ItemID], [ItemDescription])
VALUES (@roomID,
        (SELECT MAX([ItemID]) + 1 FROM [Allocation].[dbo].[Rooms] WHERE [RoomID] = @roomID),
        'Chair #1')
GO
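For the "change of room" case mentioned above, here is a minimal sketch of the delete-and-insert move (the room numbers and description are placeholders, not from the original post):

-- Move "Table #1" from room 7 to room 8: delete the row, then re-insert it
-- so it picks up room 8's next ItemID. Values are placeholders.
DECLARE @oldRoomID INT = 7, @newRoomID INT = 8, @desc NVARCHAR(250) = N'Table #1'

DELETE FROM [Allocation].[dbo].[Rooms]
WHERE [RoomID] = @oldRoomID AND [ItemDescription] = @desc

INSERT INTO [Allocation].[dbo].[Rooms] ([RoomID], [ItemID], [ItemDescription])
VALUES (@newRoomID,
        (SELECT MAX([ItemID]) + 1 FROM [Allocation].[dbo].[Rooms] WHERE [RoomID] = @newRoomID),
        @desc)
GO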
So, is there such a feature?
In the (probable) case there is none, can I program the server to set the next dependent identity for me automatically, given a specific table, a parent column and a dependent identity column?

You can use a trigger, plus an index to improve performance and ensure there are no duplicates.
Change your table to have a primary key, and allow NULL for ItemID:
CREATE TABLE [dbo].[Rooms](
[RoomID] [int] NOT NULL,
[ItemID] [int] NULL,
[ItemDescription] [nvarchar](250) NULL,
[Id] [int] IDENTITY(1,1) NOT NULL,
CONSTRAINT [PK_Rooms] PRIMARY KEY CLUSTERED
(
[Id] ASC
)
)
and then add a trigger
CREATE TRIGGER RoomTrigger
ON Rooms
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Give each newly inserted row the next ItemID within its room
    UPDATE Rooms
    SET ItemID = (SELECT COALESCE(MAX(r.ItemID), 0) + 1
                  FROM Rooms r
                  WHERE r.RoomID = inserted.RoomID)
    FROM inserted
    WHERE Rooms.Id = inserted.Id
END
Then you can do this
insert into Rooms (RoomID, ItemDescription) values (1, 'Test')
insert into Rooms (RoomID, ItemDescription) values (1, 'Test')
which results in
RoomID ItemID ItemDescription Id
------ ------ --------------- --
1      1      Test            1
1      2      Test            2
As suggested by marc_s I've used SQL Query Stress with 10 threads to see what happens with this trigger under load. I didn't get any duplicates at all (using the default isolation level), but I did get loads of deadlocks as I would have expected.
Using the original query from the question I get a lot of duplicates.
Using the trigger approach I get deadlocks and results like this:
RoomID ItemID ItemDescription Id
------ ------ --------------- ---
1      6      Test            6
1      7      Test            9
1      8      Test            902
1      9      Test            903
Here ItemID is contiguous, but about 900 out of 1000 rows failed to be inserted leaving large gaps in Id.
If we add the following index:
CREATE UNIQUE NONCLUSTERED INDEX [IX_Rooms] ON [dbo].[Rooms]
(
[RoomID] ASC,
[ItemID] ASC
)
in order to guarantee there are no duplicates, and to improve the performance of calculating MAX(ItemID) for a particular RoomID, then:
the original query from the question still tries to insert duplicates, so it only manages to insert 500 of the rows (the rest fail against the unique index).
the trigger version using the default isolation level succeeds without any deadlocks or errors and runs very fast.
Using the trigger with isolation level = serializable brings back deadlocks, so only about 40% of the inserts succeed (but there are no exceptions due to duplicates).
As a final test, I tried the trigger with 50 threads and the default isolation level: no errors.
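For reference, the serializable runs were presumably driven by prefixing the stressed statement with an isolation-level switch, along these lines (an assumption; the answer does not show the exact harness statement):

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
insert into Rooms (RoomID, ItemDescription) values (1, 'Test')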

Related

Another way to insert with an Id, but without updating the identity seed

I have two tables, an old one and a new one (the old will be replaced by the new); both tables are, and will continue to be, used.
We will migrate items over, but some processes still use the old table and cannot be swapped over right away, so we want to create a "dummy item" in the old table using the new table's data, so we can keep utilizing some old processes.
We want to insert the dummy items at ids of 500000 and above, but keep the identity seed of the OLD table's id below 500000.
Test table
CREATE TABLE [dbo].[OLD]
(
[id] [int] IDENTITY(1,1) NOT NULL,
[RowNumber] [int] NOT NULL
)
INSERT INTO [dbo].[OLD] ([RowNumber])
VALUES (1), (2)
SET IDENTITY_INSERT OLD ON
INSERT INTO OLD (id, [RowNumber])
VALUES (500000, 500000)
SET IDENTITY_INSERT OLD OFF
-- Uncomment for the reseeding
-- DECLARE @Reseed AS INT
-- SET @Reseed = (SELECT MAX(ID) FROM OLD WHERE ID < 500000)
-- DBCC CHECKIDENT('OLD', RESEED, @Reseed)
INSERT INTO [dbo].[OLD] ([RowNumber])
VALUES (3)
SELECT *
FROM old
DROP TABLE [dbo].[OLD]
Current data:
Id      RowNumber
------  ---------
1       1
2       2
So I want to insert RowNumber 500003 (without IDENTITY_INSERT) after the 500000 IDENTITY_INSERT, and have the Id automagically continue from the old seed.
Looking for:
Id      RowNumber
------  ---------
1       1
2       2
500000  500000
3       500003
I looked into reseeding, but it feels like it could be quite dangerous, and NOT FOR REPLICATION seems to apply only to SQL Server replication. Is there another way that doesn't feel as dangerous?

How to improve the performance of a query on a SQL Server table with 150 million records?

How can I improve my select query on a table with more than 150 million records in SQL Server? I need to run a simple select and retrieve the result in as little time as possible. Should I create an index? Partition the table? What do you recommend?
Here is my current scenario:
Table:
CREATE TABLE [dbo].[table_name]
(
[id] [BIGINT] IDENTITY NOT NULL,
[key] [VARCHAR](20) NOT NULL,
[text_value] [TEXT] NOT NULL,
CONSTRAINT [PK_table_name]
PRIMARY KEY CLUSTERED ([id] ASC)
)
GO
Select:
SELECT TOP 1 text_value
FROM table_name (NOLOCK)
WHERE [key] = @Key
Additional info:
That table won't have updates or deletes
The column text_value holds JSON that will be retrieved by the select; an application will handle this info
No other queries will run on that table, just the query above to retrieve text_value based on the key column
Every 2 or 3 months, about 15 million records are added to the table
For that query:
SELECT TOP 1 text_value FROM table_name (NOLOCK) WHERE [key] = @Key
I would add the following index:
CREATE INDEX idx ON table_name ([key])
INCLUDE (text_value);
The lookup will always be on the key column, so that forms the index structure, and you want to include the text_value column without having it in the non-leaf pages. This should always result in an index seek without a key lookup (a covering index).
Also, do not use the TEXT data type as it will be removed in a future version, use VARCHAR(MAX) instead. Ref: https://learn.microsoft.com/en-us/sql/t-sql/data-types/ntext-text-and-image-transact-sql?view=sql-server-2017
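A note on the type change: text columns cannot be used as INCLUDE columns in a nonclustered index, so converting to VARCHAR(MAX) is also what makes the covering index above possible. A minimal sketch of the in-place conversion (my addition, not part of the original answer; rewriting 150 million rows is a sizeable operation, so test on a copy first):

-- Convert the deprecated TEXT column to VARCHAR(MAX) in place
ALTER TABLE dbo.table_name ALTER COLUMN text_value VARCHAR(MAX) NOT NULL;
GO
-- Then create the covering index on the converted column
CREATE INDEX idx ON dbo.table_name ([key]) INCLUDE (text_value);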

SQL Server - Order Identity Fields in Table

I have a table with this structure:
CREATE TABLE [dbo].[cl](
[ID] [int] IDENTITY(1,1) NOT NULL,
[NIF] [numeric](9, 0) NOT NULL,
[Name] [varchar](80) NOT NULL,
[Address] [varchar](100) NULL,
[City] [varchar](40) NULL,
[State] [varchar](30) NULL,
[Country] [varchar](25) NULL,
Primary Key([ID],[NIF])
);
Imagine that this table has 3 records: record 1, 2, 3...
Whenever I delete record number 2, the IDENTITY field leaves a gap. The table then has record 1 and record 3. It's not correct!
Even if I use:
DBCC CHECKIDENT('cl', RESEED, 0)
it does not solve my problem, because it will set the ID of the next inserted record to 1. And that's not correct either, because the table will then have a duplicate ID.
Does anyone have a clue about this?
No database is going to reseed or recalculate an auto-incremented field/identity to use values in between ids as in your example. This is impractical on many levels, but some examples may be:
Integrity - since a re-used id could mean records in other systems are referring to an old value when the new value is saved
Performance - trying to find the lowest gap for each value inserted
In MySQL, this is not really happening either (at least in InnoDB or MyISAM - are you using something different?). In InnoDB, the behavior is identical to SQL Server: the counter is managed outside of the table, so deleted values or rolled-back transactions leave gaps between the last value and the next insert. In MyISAM, the value is calculated at the time of insertion instead of being managed through an external counter. This calculation is what gives the perception of it being recalculated - it's just never calculated until actually needed (MAX(Id) + 1). Even this won't insert inside gaps (like the id = 2 in your example).
Many people will argue that if you need to use these gaps, there is something in your data model that could be improved. You shouldn't ever need to worry about these gaps.
If you insist on using those gaps, your fastest method would be to log deletes in a separate table, then use an INSTEAD OF INSERT trigger to perform the inserts with your intended keys: first look for records in the deletions table to re-use (deleting them to prevent re-use), and then fall back to MAX(Id) + 1 for any additional rows to insert.
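A minimal sketch of that deletions-table idea (my own illustration, not from the answer; it assumes ID is a plain int rather than an IDENTITY so the trigger can supply re-used values, and it only handles single-row inserts):

-- Table that records IDs freed up by deletes
CREATE TABLE dbo.cl_deleted_ids (ID int PRIMARY KEY);
GO

-- Log deleted IDs so they can be handed out again
CREATE TRIGGER tr_cl_log_deletes ON dbo.cl
AFTER DELETE
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.cl_deleted_ids (ID)
    SELECT ID FROM deleted;
END
GO

-- Re-use the lowest logged gap if one exists, otherwise MAX(ID) + 1
-- (single-row inserts only, to keep the sketch short)
CREATE TRIGGER tr_cl_fill_gaps ON dbo.cl
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @NewID int;

    SELECT TOP (1) @NewID = ID
    FROM dbo.cl_deleted_ids
    ORDER BY ID;

    IF @NewID IS NOT NULL
        DELETE FROM dbo.cl_deleted_ids WHERE ID = @NewID;
    ELSE
        SELECT @NewID = COALESCE(MAX(ID), 0) + 1 FROM dbo.cl;

    INSERT INTO dbo.cl (ID, NIF, Name, Address, City, State, Country)
    SELECT @NewID, NIF, Name, Address, City, State, Country
    FROM inserted;
END
GO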
I guess what you want is something like this:
create table dbo.cl
(
SurrogateKey int identity(1, 1)
primary key
not null,
ID int not null,
NIF numeric(9, 0) not null,
Name varchar(80) not null,
Address varchar(100) null,
City varchar(40) null,
State varchar(30) null,
Country varchar(25) null,
unique (ID, NIF)
)
go
I added a surrogate key so you'll have the best of both worlds. Now you just need a trigger on the table to "adjust" the ID whenever some prior ID gets deleted:
create trigger tr_on_cl_for_auto_increment on dbo.cl
after delete, update
as
begin
update dbo.cl
set ID = d.New_ID
from dbo.cl as c
inner join (
select c2.SurrogateKey,
row_number() over (order by c2.SurrogateKey asc) as New_ID
from dbo.cl as c2
) as d
on c.SurrogateKey = d.SurrogateKey
end
go
Of course this solution also implies that you'll have to ensure (whenever you insert a new record) that you check for yourself which ID to insert next.
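For the "check for yourself which ID to insert next" part, an insert might look something like this (a sketch only; the NIF and Name values are placeholders):

-- Derive the next ID by hand at insert time (placeholder values)
INSERT INTO dbo.cl (ID, NIF, Name)
SELECT COALESCE(MAX(ID), 0) + 1, 123456789, 'New client'
FROM dbo.cl;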

Composite increment column

I have a situation where I need a secondary column to be incremented by 1 whenever the value of another column is the same.
Table schema:
CREATE TABLE [APP].[World]
(
[UID] [uniqueidentifier] ROWGUIDCOL NOT NULL,
[App_ID] [bigint] NOT NULL,
[id] [bigint] NOT NULL,
[name] [varchar](255) NOT NULL,
[descript] [varchar](max) NULL,
[default_tile] [uniqueidentifier] NOT NULL,
[active] [bit] NOT NULL,
[inactive_date] [datetime] NULL
)
First off, I have UID, which is wholly unique no matter what App_ID is.
In my situation, I would like id to behave like IDENTITY(1,1), but only within the same App_ID.
Assumptions:
There are 3 App_Id: 1, 2, 3
Scenario:
App_ID 1 has 3 worlds
App_ID 2 has 5 worlds
App_ID 3 has 1 world
Ideal outcome:
App_ID  id
------  --
1       1
2       1
3       1
1       2
2       2
1       3
2       3
2       4
2       5
I was thinking of placing the increment logic in the insert stored procedure, but wanted to see if there is an easier or different way of producing the same result without a stored procedure.
I figure the available options are a trigger or a stored procedure implementation, but wanted to make sure there isn't some edge-case pattern I am missing.
Update #1
Let's rethink this a little.
This is about having a PK UID and, ultimately, a partitioned column id over App_ID that is incremented by 1 with each new entry for the associated App_ID.
This would be similar to how you would do ROW_NUMBER(), but without all the overhead of recalculating the value each time a new entry is inserted (see the sketch below).
Also, App_ID and id both have the space and potential of being BIGINT; therefore the number of possible combinations would be BIGINT x BIGINT.
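For comparison, the ROW_NUMBER() calculation referred to above would look roughly like this (purely illustrative; ordering by [UID] is an arbitrary assumption, since the table has no obvious insertion-order column):

-- Per-App_ID sequence derived at query time
SELECT [UID],
       [App_ID],
       ROW_NUMBER() OVER (PARTITION BY [App_ID] ORDER BY [UID]) AS [id]
FROM [APP].[World];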
This is not possible to implement the way you are asking for. As others have pointed out in comments to your original post, your database design would be a lot better off split up into multiple tables, which all have their own identities and use foreign key constraints where necessary.
However, if you are dead set on proceeding with this approach, I would make app_id an identity column and then derive the id column by first querying for
MAX(id)
for the given app_id and then incrementing the result by 1. This kind of logic is well suited to a stored procedure, which you should implement for inserts anyway to protect against direct SQL injection and the like. The query part of such a procedure could look like this:
INSERT INTO [db].dbo.[yourtable]
(
    app_id
    , id
)
VALUES
(
    @app_id
    , (
        SELECT MAX(id) + 1
        FROM [db].dbo.[yourtable]
        WHERE app_id = @app_id
      )
)
The performance impact of doing so, however, is up to you to assess.
Also, you need to consider how to properly handle the case when there are no previous rows for that app_id.
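One common way to handle the no-previous-rows case (my suggestion, not part of the answer above) is to fall back to 0 with ISNULL before adding 1:

-- Returns 1 when the given app_id has no rows yet
SELECT ISNULL(MAX(id), 0) + 1
FROM [db].dbo.[yourtable]
WHERE app_id = @app_id;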
The simplest solution would be something like this:
/* Adding a leading 0 to [App_ID] */
SELECT RIGHT(CONCAT('0000', (([App_ID] - (([App_ID] - 1) % 3)) / 3) + 1), 4) AS [App_ID]
I did a similar thing in my own code recently; I hope the example below helps you.
Explanation: in the code below, I take the MAX() of the primary-key identity column and handle the first-entry case with ISNULL(..., 1). In all other cases it adds 1 and gives a unique value. Based on your requirements and needs you can adjust the example code. The WHILE loop is only there for the demo (it isn't actually needed).
IF OBJECT_ID('dbo.Sample','U') IS NOT NULL
DROP TABLE dbo.Sample
CREATE TABLE [dbo].[Sample](
[Sample_key] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
[Student_Key] [int] UNIQUE NOT NULL,
[Notes] [varchar](100) NULL,
[Inserted_dte] [datetime] NOT NULL
)
DECLARE @A INT, @N INT
SET @A = 1
SET @N = 10

WHILE (@A <= @N)
BEGIN
    INSERT INTO [dbo].[Sample] ([Student_Key], [Notes], [Inserted_dte])
    SELECT ISNULL((MAX([Student_Key]) + 1), 1), 'NOTES', GETDATE() FROM [dbo].[Sample]

    SET @A += 1
END
SELECT * FROM [dbo].[Sample]

How can I make this query run faster?

I have a table that can be simplified to the below:
Create Table Data (
DATAID bigint identity(1,1) NOT NULL,
VALUE1 varchar(200) NOT NULL,
VALUE2 varchar(200) NOT NULL,
CONSTRAINT PK_DATA PRIMARY KEY CLUSTERED (DATAID ASC)
)
Among others, this index exists:
CREATE NONCLUSTERED INDEX VALUEIDX ON dbo.DATA
(VALUE1 ASC) INCLUDE (VALUE2)
The table has about 9 million rows with mostly sparse data in VALUE1 and VALUE2.
The query Select Count(*) from DATA takes about 30 seconds. And the following query takes 1 minute and 30 seconds:
Select Count(*) from DATA Where VALUE1<>VALUE2
Is there any way I can make this faster? I basically need to find (and update) all rows where VALUE1 is different from VALUE2. I considered adding a bit field called ISDIFF and updating it via a trigger whenever the value fields are updated, but then I would need to create an index on the bit field and select WHERE ISDIFF = 1.
Any help will be appreciated.
PS: Using MS SQL Server 2008
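A sketch of the ISDIFF idea described in the question, using a persisted computed column instead of a trigger (my own variation, unverified against the real table; works on SQL Server 2008):

-- Persisted computed flag, indexable and kept up to date by the engine
ALTER TABLE dbo.DATA
    ADD ISDIFF AS CASE WHEN VALUE1 <> VALUE2 THEN 1 ELSE 0 END PERSISTED

CREATE NONCLUSTERED INDEX IX_DATA_ISDIFF ON dbo.DATA (ISDIFF)

-- The count (and later the update) can then use the narrow index
SELECT COUNT(*) FROM dbo.DATA WHERE ISDIFF = 1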
