SQL Server - Order Identity Fields in Table

I have a table with this structure:
CREATE TABLE [dbo].[cl](
[ID] [int] IDENTITY(1,1) NOT NULL,
[NIF] [numeric](9, 0) NOT NULL,
[Name] [varchar](80) NOT NULL,
[Address] [varchar](100) NULL,
[City] [varchar](40) NULL,
[State] [varchar](30) NULL,
[Country] [varchar](25) NULL,
Primary Key([ID],[NIF])
);
Imagine that this table has 3 records. Record 1, 2, 3...
Whenever I delete record number 2, the IDENTITY field leaves a gap: the table then has record 1 and record 3. That's not correct!
Even if I use:
DBCC CHECKIDENT('cl', RESEED, 0)
It does not solve my problem because it will set the ID of the next inserted record to 1. That's not correct either, because the table will then have duplicate IDs.
Does anyone have a clue about this?

No database is going to reseed or recalculate an auto-incremented field/identity to use values in between ids as in your example. This is impractical on many levels, but some examples may be:
Integrity - since a re-used id could mean records in other systems are referring to an old value when the new value is saved
Performance - trying to find the lowest gap for each value inserted
In MySQL, this is not really happening either (at least in InnoDB or MyISAM - are you using something different?). In InnoDB, the behavior is identical to SQL Server: the counter is managed outside of the table, so deleted values or rolled-back transactions leave gaps between the last value and the next insert. In MyISAM, the value is calculated at the time of insertion instead of being managed through an external counter. This calculation is what gives the impression of the value being recalculated - it's just never calculated until actually needed (MAX(Id) + 1). Even this won't insert inside gaps (like the id = 2 in your example).
Many people will argue if you need to use these gaps, then there is something that could be improved in your data model. You shouldn't ever need to worry about these gaps.
If you insist on using those gaps, your fastest method would be to log deletes in a separate table and then use an INSTEAD OF INSERT trigger to perform the inserts with your intended keys. The trigger would first look in the deletions table for keys to re-use (deleting them as it goes, so they are not re-used twice) and then use MAX(Id) + 1 for any additional rows to insert.
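If the goal is only a gap-free sequence for display, one alternative to re-using keys is to number the rows at query time and leave the stored IDs alone. A minimal sketch against the dbo.cl table from the question (the RowNo alias is just an illustrative name):

```sql
-- RowNo is contiguous (1, 2, 3, ...) regardless of gaps in ID.
-- The stored ID values are not modified.
SELECT ROW_NUMBER() OVER (ORDER BY ID) AS RowNo,
       ID,
       NIF,
       Name
FROM dbo.cl
ORDER BY ID;
```

After deleting record 2, this would list the remaining rows as RowNo 1 and 2 while their IDs stay 1 and 3.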

I guess what you want is something like this:
create table dbo.cl
(
SurrogateKey int identity(1, 1)
primary key
not null,
ID int not null,
NIF numeric(9, 0) not null,
Name varchar(80) not null,
Address varchar(100) null,
City varchar(40) null,
State varchar(30) null,
Country varchar(25) null,
unique (ID, NIF)
)
go
I added a surrogate key so you'll have the best of both worlds. Now you just need a trigger on the table to "adjust" the ID whenever some prior ID gets deleted:
create trigger tr_on_cl_for_auto_increment on dbo.cl
after delete, update
as
begin
    set nocount on;
    -- renumber ID to 1..N in SurrogateKey order so no gaps remain
    update c
    set ID = d.New_ID
    from dbo.cl as c
    inner join (
        select c2.SurrogateKey,
               row_number() over (order by c2.SurrogateKey asc) as New_ID
        from dbo.cl as c2
    ) as d
        on c.SurrogateKey = d.SurrogateKey
    where c.ID <> d.New_ID; -- touch only rows whose number changes
end
go
Of course this solution also implies that you'll have to ensure (whenever you insert a new record) that you check for yourself which ID to insert next.
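Since ID is no longer an identity column in this design, each insert has to supply it. A minimal sketch of one way to do that, assuming placeholder sample values and assuming the UPDLOCK/HOLDLOCK hints are an acceptable way to serialize concurrent inserts:

```sql
-- Sample values are placeholders, not data from the question.
INSERT INTO dbo.cl (ID, NIF, Name)
SELECT ISNULL(MAX(ID), 0) + 1,   -- 1 for an empty table, otherwise last ID + 1
       123456789,
       'Example name'
FROM dbo.cl WITH (UPDLOCK, HOLDLOCK);  -- keeps two sessions from reading the same MAX(ID)
```

Because the aggregate has no GROUP BY, the SELECT returns exactly one row even when the table is empty, so the first insert gets ID 1.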

Related

Dynamic SQL to execute large number of rows from a table

I have a table with a very large number of rows which I wish to execute via dynamic SQL. They are basically existence checks and insert statements and I want to migrate data from one production database to another - we are merging transactional data. I am trying to find the optimal way to execute the rows.
I've found that the COALESCE method of appending all the rows to one another is not efficient for this, particularly when more than ~100 rows are executed at a time.
Assume the structure of the source table is something arbitrary like this:
CREATE TABLE [dbo].[MyTable]
(
[ID] [int] IDENTITY(1,1) NOT NULL,
[DataField1] [int] NOT NULL,
[FK_ID1] [int] NOT NULL,
[LotsMoreFields] [NVARCHAR] (MAX),
CONSTRAINT [PK_MyTable] PRIMARY KEY CLUSTERED ([ID] ASC)
)
CREATE TABLE [dbo].[FK1]
(
[ID] [int] IDENTITY(1,1) NOT NULL,
[Name] [int] NOT NULL, -- Unique constrained value
CONSTRAINT [PK_FK1] PRIMARY KEY CLUSTERED ([ID] ASC)
)
The other requirement is I am tracking the source table PK vs the target PK and whether an insert occurred or whether I have already migrated that row to the target. To do this, I'm tracking migrated rows in another table like so:
CREATE TABLE [dbo].[ChangeTracking]
(
[ReferenceID] BIGINT IDENTITY(1,1),
[Src_ID] BIGINT,
[Dest_ID] BIGINT,
[TableName] NVARCHAR(255),
CONSTRAINT [PK_ChangeTracking] PRIMARY KEY CLUSTERED ([ReferenceID] ASC)
)
My existing method is executing some dynamic sql generated by a stored procedure. The stored proc does PK lookups as the source system has different PK values for table [dbo].[FK1].
E.g.
IF NOT EXISTS (<ignore this existence check for now>)
BEGIN
INSERT INTO [Dest].[dbo].[MyTable] ([DataField1],[FK_ID1],[LotsMoreFields]) VALUES (333,(SELECT [ID] FROM [Dest].[dbo].[FK1] WHERE [Name]=N'ValueFoundInSource'),N'LotsMoreValues');
INSERT INTO [Dest].[dbo].[ChangeTracking] ([Src_ID],[Dest_ID],[TableName]) VALUES (666,SCOPE_IDENTITY(),N'MyTable'); --666 is the PK in [Src].[dbo].[MyTable] for this inserted row
END
So when you have a million of these, it isn't quick.
Is there a recommended performant way of doing this?
As mentioned, the MERGE statement works well when you're looking at a complex JOIN condition (if any of these fields are different, update the record to match). You can also look into creating a HASHBYTES hash of the entire record to quickly find differences between source and target tables, though that can also be time-consuming on very large data sets.
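A rough sketch of the HASHBYTES idea, using the table names from the question. It assumes that already-migrated rows can be paired through ChangeTracking and that only DataField1 and LotsMoreFields need comparing; the '|' separator is an arbitrary choice to keep adjacent values from running together:

```sql
-- Find migrated rows whose content differs between source and target.
-- Note: in SQL Server 2014 and earlier, HASHBYTES input is limited to 8000 bytes,
-- which an NVARCHAR(MAX) column can exceed.
SELECT s.[ID] AS [Src_ID],
       d.[ID] AS [Dest_ID]
FROM [Src].[dbo].[MyTable] AS s
INNER JOIN [Dest].[dbo].[ChangeTracking] AS ct
        ON ct.[Src_ID] = s.[ID]
       AND ct.[TableName] = N'MyTable'
INNER JOIN [Dest].[dbo].[MyTable] AS d
        ON d.[ID] = ct.[Dest_ID]
WHERE HASHBYTES('SHA2_256', CONCAT(s.[DataField1], N'|', s.[LotsMoreFields]))
   <> HASHBYTES('SHA2_256', CONCAT(d.[DataField1], N'|', d.[LotsMoreFields]));
```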
It sounds like you're making these updates like a front-end developer, by checking each row for a match and then doing the insert. It will be far more efficient to do the inserts with a single query. Below is an example that looks for names that are in the tblNewClient table, but not in the tblClient table:
INSERT INTO tblClient
( [Name] ,
TypeID ,
ParentID
)
SELECT nc.[Name] ,
nc.TypeID ,
nc.ParentID
FROM tblNewClient nc
LEFT JOIN tblClient cl
ON nc.[Name] = cl.[Name]
WHERE cl.ID IS NULL;
This will be far more efficient than doing it RBAR (row by agonizing row).
Taking the two answers from @RusselFox and putting them together, I reached this tentative solution (which already looks a LOT more efficient):
MERGE INTO [Dest].[dbo].[MyTable] AS [MT_D]
USING (
    SELECT [MT_S].[ID] AS [SrcID], [MT_S].[DataField1], [FK1_D].[ID] AS [FK_ID1], [MT_S].[LotsMoreFields]
    FROM [Src].[dbo].[MyTable] AS [MT_S]
    JOIN [Src].[dbo].[FK1] AS [FK1_S] ON [MT_S].[FK_ID1] = [FK1_S].[ID]
    JOIN [Dest].[dbo].[FK1] AS [FK1_D] ON [FK1_S].[Name] = [FK1_D].[Name]
) AS [SRC] ON 1 = 0 -- never matches, so every source row is inserted
WHEN NOT MATCHED THEN
    INSERT ([DataField1], [FK_ID1], [LotsMoreFields])
    VALUES ([SRC].[DataField1], [SRC].[FK_ID1], [SRC].[LotsMoreFields])
-- [AlreadyExists] is assumed to be an extra column; it is not in the ChangeTracking definition shown above
OUTPUT [SRC].[SrcID], INSERTED.[ID], 0, N'MyTable'
    INTO [Dest].[dbo].[ChangeTracking] ([Src_ID], [Dest_ID], [AlreadyExists], [TableName]);

Composite increment column

I have a situation where I need to have a secondary column be incremented by 1, assuming the value of another is the same.
Table schema:
CREATE TABLE [APP].[World]
(
[UID] [uniqueidentifier] ROWGUIDCOL NOT NULL,
[App_ID] [bigint] NOT NULL,
[id] [bigint] NOT NULL,
[name] [varchar](255) NOT NULL,
[descript] [varchar](max) NULL,
[default_tile] [uniqueidentifier] NOT NULL,
[active] [bit] NOT NULL,
[inactive_date] [datetime] NULL
)
First off, I have UID which is wholly unique, no matter what App_ID is.
In my situation, I would like to have id behave like IDENTITY(1,1), but only within the same App_ID.
Assumptions:
There are 3 App_Id: 1, 2, 3
Scenario:
App_ID 1 has 3 worlds
App_ID 2 has 5 worlds
App_ID 3 has 1 world
Ideal outcome:
App_ID id
1 1
2 1
3 1
1 2
2 2
1 3
2 3
2 4
2 5
Was thinking of placing the increment logic in the Insert stored procedure but wanted to see if there would be an easier or different way of producing the same result without a stored procedure.
Figure the available option(s) are triggers or stored procedure implementation but wanted to make sure there wasn't some edge-case pattern I am missing.
Update #1
Let's rethink this a little.
This is about there being a PK UID and ultimately a Partitioned Column id, over App_ID, that is incremented by 1 with each new entry for the associated App_id.
This would be similar to how you would do Row_Number() but without all the overhead of recalculating the value each time a new entry is inserted.
As well, App_ID and id both have the space and potential for being BIGINT; therefore the number of possible combinations would be: BIGINT x BIGINT
This is not possible to implement the way you are asking for. As others have pointed out in comments to your original post, your database design would be a lot better off split up into multiple tables, which all have their own identities and use foreign key constraints where necessary.
However, if you are dead set on proceeding with this approach, I would make app_id an identity column and then compute the id column by first querying
MAX(id)
for that app_id and then adding 1 to the result. This kind of logic is suitable for a stored procedure, which you should use for inserts anyway to guard against SQL injection and the like. The query part of such a procedure could look like this:
INSERT INTO [db].dbo.[yourtable]
(
    app_id
    , id
)
VALUES
(
    @app_id
    , (
        -- NULL when this app_id has no rows yet; see the note below
        SELECT MAX(id) + 1
        FROM [db].dbo.[yourtable]
        WHERE app_id = @app_id
    )
)
The performance impact of doing so, however, is up to you to assess.
Also, you need to consider how to properly handle the case where there are no previous rows for that app_id.
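One way to cover the empty case and concurrent inserts is sketched below against the [APP].[World] table from the question. The procedure name and parameter list are hypothetical, and the sketch assumes that new worlds start active and that serializing inserts per App_ID with lock hints is acceptable:

```sql
CREATE PROCEDURE [APP].[World_Insert]   -- hypothetical name
    @App_ID       bigint,
    @name         varchar(255),
    @default_tile uniqueidentifier
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO [APP].[World] ([UID], [App_ID], [id], [name], [default_tile], [active])
    SELECT NEWID(),
           @App_ID,
           ISNULL(MAX(w.[id]), 0) + 1,   -- 1 when this App_ID has no rows yet
           @name,
           @default_tile,
           1                             -- assumption: new rows start active
    FROM [APP].[World] AS w WITH (UPDLOCK, HOLDLOCK)  -- blocks a second session from reading the same MAX
    WHERE w.[App_ID] = @App_ID;
END
```

A unique constraint on (App_ID, id) would be a sensible safety net, so that any path that bypasses the procedure fails loudly instead of creating duplicates.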
The simplest solution would be as below:
/* Adding leading 0s to [App_ID] */
SELECT RIGHT(CONCAT('0000', (([App_ID] - (([App_ID] - 1) % 3)) / 3) + 1), 4) AS [App_ID]
I did a similar thing in my recent code.
Hope the example below helps.
Explanation: the code below takes MAX([Student_Key]) and handles the first-entry case with ISNULL(..., 1). In all other cases it adds 1, which gives a unique value. Adjust the example to your requirements as needed. The WHILE loop is only there for the demo (it is not actually needed).
IF OBJECT_ID('dbo.Sample','U') IS NOT NULL
DROP TABLE dbo.Sample
CREATE TABLE [dbo].[Sample](
[Sample_key] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
[Student_Key] [int] UNIQUE NOT NULL,
[Notes] [varchar](100) NULL,
[Inserted_dte] [datetime] NOT NULL
)
DECLARE @A INT, @N INT
SET @A = 1
SET @N = 10
WHILE (@A <= @N)
BEGIN
    INSERT INTO [dbo].[Sample]([Student_Key],[Notes],[Inserted_dte])
    SELECT ISNULL((MAX([Student_Key])+1),1),'NOTES',GETDATE() FROM [dbo].[Sample]
    SET @A += 1
END
SELECT * FROM [dbo].[Sample]

Computed Column that doesn't auto update

I have a computed column that is automatically creating a confirmation number by adding the current max ID to some Prefix. It works, but not exactly how I need it to work.
This is the function
ALTER FUNCTION [dbo].[SetEPNum](@IdNum INT)
RETURNS VARCHAR(255)
AS
BEGIN
    return (select 'SomePrefix' + RIGHT('00000' + CAST(MAX(IdNum) AS VARCHAR(255)), 5)
            FROM dbo.someTable
            /*WHERE IdNum = @IdNum*/)
END
If I add WHERE IdNum = @IdNum to the select in the function, that gives the illusion of working, but in reality it is picking the max IdNum from the one row where IdNum = @IdNum rather than the current max IdNum across all rows. If I remove the WHERE clause, the computed column simply sets every row to the max ID every time it changes.
This is the table
CREATE TABLE [dbo].[someTable](
[IdNum] [int] IDENTITY(1,1) NOT NULL,
[First_Name] [varchar](50) NOT NULL,
[Last_Name] [varchar](50) NOT NULL,
[EPNum] AS ([dbo].[SetEPNum]([IdNum]))
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
This is the computed column
ALTER TABLE dbo.someTable
ADD EPNum AS dbo.SetEPnum(IdNum)
Is there any way to accomplish this? If not, is there an alternative solution?
If my understanding is correct, you are trying to get the max id of some table to appear next to each record as of the time it was updated?
Right now you get the same max id next to all records.
That is because there is one and only one max id. You have provided no context.
It seems to me this is the job of a trigger or even your update statement. Why employ computed columns? The computed column gets recomputed every time you display the data.
If you absolutely need to go this way, you should employ some other field (e.g. modification date) and get the max id from those records that were updated before the current one. It all depends though on the business logic of your application and what you are trying to achieve.
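A minimal sketch of the trigger route. It assumes EPNum is changed from a computed column to an ordinary nullable VARCHAR(255) column, and that the number should be derived from each row's own IdNum at insert time; the trigger name is hypothetical:

```sql
CREATE TRIGGER dbo.tr_someTable_SetEPNum   -- hypothetical name
ON dbo.someTable
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Stamp each newly inserted row once; later inserts do not change it.
    UPDATE t
    SET EPNum = 'SomePrefix' + RIGHT('00000' + CAST(i.IdNum AS VARCHAR(10)), 5)
    FROM dbo.someTable AS t
    INNER JOIN inserted AS i
            ON i.IdNum = t.IdNum;
END
```

Because the value is stored rather than recomputed on every read, existing rows keep the confirmation number they were given.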

Sql server query using function and view is slower

I have a table with a xml column named Data:
CREATE TABLE [dbo].[Users](
[UserId] [int] IDENTITY(1,1) NOT NULL,
[FirstName] [nvarchar](max) NOT NULL,
[LastName] [nvarchar](max) NOT NULL,
[Email] [nvarchar](250) NOT NULL,
[Password] [nvarchar](max) NULL,
[UserName] [nvarchar](250) NOT NULL,
[LanguageId] [int] NOT NULL,
[Data] [xml] NULL,
[IsDeleted] [bit] NOT NULL,...
In the Data column there's this xml
<data>
<RRN>...</RRN>
<DateOfBirth>...</DateOfBirth>
<Gender>...</Gender>
</data>
Now, executing this query:
SELECT UserId FROM Users
WHERE data.value('(/data/RRN)[1]', 'nvarchar(max)') = @RRN
after clearing the cache takes (if I execute it a couple of times after each other) 910, 739, 630, 635, ... ms.
Now, a db specialist told me that adding a function and a view and changing the query would make searching for a user with a given RRN much faster. Instead, these are the results when I execute with the changes from the db specialist: 2584, 2342, 2322, 2383, ...
This is the added function:
CREATE FUNCTION dbo.fn_Users_RRN(@data xml)
RETURNS nvarchar(100)
WITH SCHEMABINDING
AS
BEGIN
    RETURN @data.value('(/data/RRN)[1]', 'varchar(max)');
END;
The added view:
CREATE VIEW vwi_Users
WITH SCHEMABINDING
AS
SELECT UserId, dbo.fn_Users_RRN(Data) AS RRN from dbo.Users
Indexes:
CREATE UNIQUE CLUSTERED INDEX cx_vwi_Users ON vwi_Users(UserId)
CREATE NONCLUSTERED INDEX cx_vwi_Users__RRN ON vwi_Users(RRN)
And then the changed query:
SELECT UserId FROM Users
WHERE dbo.fn_Users_RRN(Data) = @RRN
Why is the solution with a function and a view going slower?
The point of the view was to pre-compute the XML value into a regular column. To then use that precomputed value in the index on the view, shouldn't you actually query the view?
SELECT
UserId
FROM vwi_Users
WHERE RRN= '59021626919-61861855-S_FA1E11'
Also, make the index this:
CREATE NONCLUSTERED INDEX cx_vwi_Users__RRN ON vwi_Users(RRN) INCLUDE (UserId)
It is called a covering index, since all columns needed in the query are in the index.
Have you tried to add that function result to your table (not a view) as a persisted, computed column?
ALTER TABLE dbo.Users
    ADD RRN AS dbo.fn_Users_RRN(Data) PERSISTED -- RRN is an example column name
Doing so will extract that piece of information from the XML, store it in a computed, always up-to-date column, and the persisted flag makes it physically stored along side the other columns in your table.
If this works (the PERSISTED flag is a bit iffy in terms of all the limitations it has), then you should see nearly the same performance as querying any other string field on your table... and if the computed column is PERSISTED, you can even put an index on it if you feel the need for that.
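A sketch of the follow-up steps, assuming the persisted computed column was added to dbo.Users under the name RRN. That column name, the index name and the sample value are all assumptions, not names from the question:

```sql
-- Hypothetical index name; RRN is the assumed name of the persisted computed column.
CREATE NONCLUSTERED INDEX ix_Users_RRN ON dbo.Users (RRN);
GO

-- Filter on the stored column instead of calling the function for every row.
DECLARE @RRN nvarchar(100) = N'example-value';  -- placeholder, matches the function's return type
SELECT UserId
FROM dbo.Users
WHERE RRN = @RRN;
```

Declaring the parameter as nvarchar(100), the same type the function returns, avoids an implicit conversion in the comparison.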
Check the query execution plan and confirm whether or not the new query is even using the view. If the query doesn't use the view, that's the problem.
How does this query fare?
SELECT UserId FROM vwi_Users
WHERE RRN = '59021626919-61861855-S_FA1E11'
I see you're freely mixing nvarchar and varchar. Don't do that! It can cause full index conversions (eeeeevil).
Scalar functions tend to perform very poorly in SQL Server. I'm not sure why a persisted, indexed computed column wouldn't perform identically to a normal indexed column, but it may be because the UDF is still being called even though you'd expect it to be unnecessary once the data is computed.
I think you know this from another answer, but your final query is wrongly calling the scalar UDF on every row (defeating the point of persisting the computation):
SELECT UserId FROM Users
WHERE dbo.fn_Users_RRN(Data) = @RRN
It should be
SELECT UserId FROM vwi_Users
WHERE RRN = @RRN

INSTEAD OF UPDATE Trigger and Updating the Primary Key

I am making changes to an existing database while developing new software. There is also quite a lot of legacy software that uses the database that needs to continue working, i.e. I would like to maintain the existing database tables, procs, etc.
Currently I have the table
CREATE TABLE dbo.t_station (
tx_station_id VARCHAR(4) NOT NULL,
tx_description NVARCHAR(max) NOT NULL,
tx_station_type CHAR(1) NOT NULL,
tx_current_order_num VARCHAR(20) NOT NULL,
PRIMARY KEY (tx_station_id)
)
I need to include a new field in this table that refers to a Plant (production facility) and move the tx_current_order_num to another table because it is not required for all rows. So I've created new tables:-
CREATE TABLE Private.Plant (
PlantCode INT NOT NULL,
Description NVARCHAR(max) NOT NULL,
PRIMARY KEY (PlantCode)
)
CREATE TABLE Private.Station (
StationId VARCHAR(4) NOT NULL,
Description NVARCHAR(max) NOT NULL,
StationType CHAR(1) NOT NULL,
PlantCode INT NOT NULL,
PRIMARY KEY (StationId),
FOREIGN KEY (PlantCode) REFERENCES Private.Plant (PlantCode)
)
CREATE TABLE Private.StationOrder (
StationId VARCHAR(4) NOT NULL,
OrderNumber VARCHAR(20) NOT NULL,
PRIMARY KEY (StationId)
)
Now, I don't want to have the same data in two places, so I decided to change the dbo.t_station table into a view and provide INSTEAD OF triggers to handle DELETE, INSERT and UPDATE. No problem, I have [most of] them working.
My question regards the INSTEAD OF UPDATE trigger, updating the Primary Key column (tx_station_id) and updates to multiple rows.
Inside the trigger block, is there any way to join the inserted and deleted [pseudo] tables so that I know the 'before update primary key' and the 'after update primary key'? Something like this...
UPDATE sta
SET sta.StationId = ins.tx_station_id
FROM Private.Station AS sta
INNER JOIN deleted AS del
INNER JOIN inserted AS ins
ON ROW_IDENTITY_OF(del) = ROW_IDENTITY_OF(ins)
ON del.tx_station_id = sta.StationId
At this stage I've put a check in the trigger block that rolls back the update if the primary key column is updated and there is more than one row in the inserted, or deleted, table.
The short answer is no.
You could put a surrogate key on Private.Station, and expose that through the view, and use that to identify before and after values. You wouldn't need to change the primary key or foreign key relationship, but you would have to expose some non-updateable cruft through the view, so that it showed up in the pseudo-tables. eg:
alter table Private.Station add StationSk int identity(1,1) not null
Note, this may break the legacy application if it uses SELECT *. INSERT statements without explicit insert column lists should be ok, though.
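With the surrogate key in place, the trigger could pair old and new rows along these lines. This is only a sketch: it assumes the dbo.t_station view has been redefined to expose StationSk, that callers never assign StationSk themselves, and the trigger name is hypothetical:

```sql
CREATE TRIGGER dbo.tr_t_station_update   -- hypothetical name
ON dbo.t_station                         -- the view, assumed to expose StationSk
INSTEAD OF UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- StationSk is unchanged by the update, so it identifies the same row
    -- in deleted (old values) and inserted (new values), even for multi-row updates.
    UPDATE sta
    SET sta.StationId   = ins.tx_station_id,
        sta.Description = ins.tx_description,
        sta.StationType = ins.tx_station_type
    FROM Private.Station AS sta
    INNER JOIN deleted  AS del ON del.StationSk = sta.StationSk
    INNER JOIN inserted AS ins ON ins.StationSk = del.StationSk;
END
```

Private.StationOrder is keyed by StationId as well, so a changed key would also have to be carried over there (using del.tx_station_id as the old value); that part is left out of the sketch.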
Short of that, there may be some undocumented & consistent ordering between INSERTED and DELETED, such that ROW_NUMBER() OVER (ORDER BY NULLIF(StationId,StationId)) would let you join the two, but I'd be very hesitant to take that route. Very, very hesitant.
Have you intentionally not enabled cascade updates? They're useful when primary key values can be updated. eg:
CREATE TABLE Private.Station (
StationId VARCHAR(4) NOT NULL,
Description NVARCHAR(max) NOT NULL,
StationType CHAR(1) NOT NULL,
PlantCode INT NOT NULL,
PRIMARY KEY (StationId),
FOREIGN KEY (PlantCode) REFERENCES Private.Plant (PlantCode)
ON UPDATE CASCADE
-- maybe this too:
-- ON DELETE CASCADE
)
Someone might have a better trick. Wait and watch!
