I have a table with 42 million records. 32 million have a null value and I would like to generate a new guid for each one. Should I do this in batches?
Also, going forward, I would like a new guid added to the field on the insert of a new record. What is the best way to do this?
The field is not the primary key, which is an auto-incrementing integer.
Have you tried an insert trigger?
There is a newid() function to generate a guid. Your column should be varchar(32) (or varbinary(16)) or bigger.
You should run the update in batches to avoid filling up the transaction log, e.g.:
set rowcount 10000
while exists (select 1 from mytab where myid is null)
update mytab set myid = newid() where myid is null
set rowcount 0
You don't need a trigger - just bind the function as a column default. There is an example in the Sybase docs.
I'm trying to update a table. I have the same data but with different IDs, so I would like to set the ID in both columns to the lowest ID registered for the results.
UPDATE TABLENAME
SET EXAMPLEID = LOWER(EXAMPLEID)
WHERE
TID = TID
AND
KID = KID
AND
STREET = STREET
I'm getting the following error:
Msg 8102, Level 16, State 1, Line 1 Cannot update identity column
'EXAMPLEID'
An identity column is generally used together with the primary key. If ExampleID is both your primary key and an identity column, you cannot have the same ExampleID on two different rows.
A primary key value is unique for every row.
If the column is an identity column but not the primary key, SQL Server still does not allow you to update its value.
There is a dirty workaround for this, but it is not recommended.
You can't update an identity column. You may insert new records with an explicit value using IDENTITY_INSERT, but SQL Server won't let you do an update.
If you really need to do this, the only option you have is to copy the full table temporarily and recreate your final table again with the updated values. This is strongly NOT recommended:
Create a copy of your table, with all related objects (indexes, constraints, etc.), but with no rows (only schema objects).
CREATE TABLE TABLENAME_Mirror (
ExampleID INT IDENTITY,
TID VARCHAR(100),
KID VARCHAR(100),
STREET VARCHAR(100))
Set IDENTITY_INSERT ON on this new table and insert the records with the updated values.
SET IDENTITY_INSERT TABLENAME_Mirror ON
INSERT INTO TABLENAME_Mirror (
ExampleID,
TID,
KID,
STREET)
SELECT
/*Updated values*/
FROM
--....
SET IDENTITY_INSERT TABLENAME_Mirror OFF
Drop the original table and rename the copied one to the original name:
BEGIN TRANSACTION
IF OBJECT_ID('dbo.TABLENAME') is not null
DROP TABLE dbo.TABLENAME
EXEC sys.sp_rename
'dbo.TABLENAME_Mirror',
'TABLENAME'
COMMIT
You might need to reseed the identity with a proper value once the rows are inserted, if you want to keep the same seed as before.
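The reseed step mentioned above could look something like the following. This is only a sketch, assuming the rebuilt table is dbo.TABLENAME with the identity column ExampleID from the example; the variable name @maxId is made up for illustration.

```sql
-- Report the current identity value without changing anything.
DBCC CHECKIDENT ('dbo.TABLENAME', NORESEED);

-- Let SQL Server correct the current identity value
-- if it is lower than the maximum value stored in ExampleID.
DBCC CHECKIDENT ('dbo.TABLENAME', RESEED);

-- Or set it explicitly (assumption: you want numbering to continue after the highest ID).
-- With rows already in the table, the next insert gets @maxId plus the increment.
DECLARE @maxId INT = (SELECT ISNULL(MAX(ExampleID), 0) FROM dbo.TABLENAME);
DBCC CHECKIDENT ('dbo.TABLENAME', RESEED, @maxId);
```

Run the NORESEED form first if you only want to see where the counter currently stands.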
In SQL Server, I have created a Table with an ID column that I have made an IDENTITY COLUMN,
EmployeeID int NOT NULL IDENTITY(100,10) PRIMARY KEY
It is my understanding, when I use the IDENTITY feature, it auto increments the EmployeeID. What I don't know/not sure is:
Is that IDENTITY number created, unique?
Does SQL search the entire column in the table to confirm the number created does not already exist?
Can I override that auto increment number manually?
If I did manually override that number, would the number I enter be checked to make sure it is not a duplicate/existing ID number?
Thanks for any help provided.
Is that IDENTITY number created, unique?
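Yes, as long as you let SQL Server generate the values. The IDENTITY property itself does not enforce uniqueness; the PRIMARY KEY in your definition does.
Does SQL search the entire column in the table to confirm the number created does not already exist?
It does not need to; the property simply increments the last value.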
Can I override that auto increment number manually?
Yes, you can. You have to use SET IDENTITY_INSERT TABLENAME ON
If I did manually override that number, would the number I enter be checked to make sure it is not a duplicate/existing ID number?
No, the IDENTITY property won't check that; you have to make sure a constraint (such as your PRIMARY KEY or a unique index) takes care of it.
Below is a simple demo to prove that
create table #temp
(
id int identity(1,1)
)
insert into #temp
default values
go 3
select * from #temp--now id column has 3
set identity_insert #temp on
insert into #temp (id)
values(4)
set identity_insert #temp off
select * from #temp--now id column has 4
insert into #temp
default values
go
select * from #temp--now id column has 5,next value from the last highest
Updating info from comments:
An identity column can end up with gaps once you reseed it; also, you can't update it.
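To show the difference a constraint makes, here is a small sketch using the IDENTITY(100,10) PRIMARY KEY definition from the question. The temp table #emp and its Name column are made up for illustration. With the primary key in place, a manually supplied duplicate is rejected.

```sql
CREATE TABLE #emp
(
    EmployeeID INT NOT NULL IDENTITY(100,10) PRIMARY KEY,
    Name VARCHAR(50) NOT NULL
);

-- Generated value: the first row gets the seed, 100.
INSERT INTO #emp (Name) VALUES ('first');

SET IDENTITY_INSERT #emp ON;
BEGIN TRY
    -- Explicit value that already exists: the PRIMARY KEY rejects it.
    INSERT INTO #emp (EmployeeID, Name) VALUES (100, 'duplicate');
END TRY
BEGIN CATCH
    -- A primary key violation is error 2627.
    SELECT ERROR_NUMBER() AS ErrorNumber, ERROR_MESSAGE() AS ErrorMessage;
END CATCH;
SET IDENTITY_INSERT #emp OFF;

SELECT * FROM #emp; -- still only the row with EmployeeID 100

DROP TABLE #emp;
```

Without the PRIMARY KEY (as in the #temp demo above), the same explicit insert would succeed and leave two rows with the same id.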
I have a large amount of rows in a single table, in our Microsoft SQL Server.
I need to add a new column (DateTime) to the table. Fine. The column needs to be NOT NULL. Ok, fine. Now, this is the problem:
It takes too long to set all column values for all the rows
I tried to create the NULLABLE column. That took a nanosecond. Then UPDATE all the rows by setting the value to GETUTCDATE().
eg.
DECLARE @foo DATETIME = GETUTCDATE();
UPDATE PewPew
SET NewField = @foo
After 2 hours, I had to cancel the query. Yeah, 2 hours. I also did NOT do this in a TRANSACTION. I then dropped the new field and we were back to where we started.
I was thinking could we
Create column NULLABLE
In batches of 1000 or something, UPDATE the top 1000 rows where the NewField is NULL.
Once all is done, then alter the column to make it NOT NULLABLE
The SQL Server is on Azure - it's a Standard D13 (8 Cores, 56 GB memory) VM and I think we put it on SSD's, so I'd like to think the hardware isn't too bad.
Footnote: large amount of rows = I think it's about ~20 million. That's sorta large to us, but not large to some. We get that.
-- Dummy select so that @@ROWCOUNT starts at 1 and the loop is entered
select 1;
while(@@ROWCOUNT<>0)
begin
    update top(1000) PewPew set NewField = getutcdate() WHERE NewField IS NULL;
end
This will update 1000 rows at a time.
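Putting the three steps from the question together might look like the sketch below. It assumes the table is dbo.PewPew and the nullable NewField column has already been added as DATETIME; the variable names and the batch size of 1000 are illustrative and worth tuning.

```sql
-- Capture the timestamp once so every row gets the same value.
DECLARE @now DATETIME = GETUTCDATE();
DECLARE @rows INT = 1;

-- Step 2: fill the column in small batches; each UPDATE commits on its own,
-- which keeps individual transactions short.
WHILE @rows > 0
BEGIN
    UPDATE TOP (1000) dbo.PewPew
    SET NewField = @now
    WHERE NewField IS NULL;

    SET @rows = @@ROWCOUNT;
END

-- Step 3: once no NULLs remain, tighten the column definition.
ALTER TABLE dbo.PewPew ALTER COLUMN NewField DATETIME NOT NULL;
```

Storing @@ROWCOUNT in a variable right after the UPDATE avoids depending on whatever statement happens to run last in the loop body.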
What if you add the new column as NOT NULL and with a default constraint?
ALTER TABLE dbo.YourTable
ADD NewColumn DATETIME2(3) NOT NULL
CONSTRAINT DF_YourTable_NewColumn DEFAULT (SYSUTCDATETIME())
That should add the column, as NOT NULL, and immediately fill in the default value. Not sure if that'll run any faster than your UPDATE, though - worth a shot.
I've already read the following answers about the impossibility to alter a column into identity once has been created as a regular int.
Adding an identity to an existing column
How to Alter a table for Identity Specification is identity SQL Server
How to alter column to identity(1,1)
But the thing is I have a table which has been migrated to a new one where the ID was not declared as identity from the beginning, because the old table which was created with an ID identity a long time ago has missing rows due to a purge of historical data. So as far as I know, if I add a new column as identity on my new table, it will automatically create the column sequentially and I need to preserve the IDs from the old table as-is because there is already data linked to these previous IDs.
How can I turn the ID column of the new table into an identity column that keeps the IDs from the old table instead of renumbering them sequentially?
You could try this approach:
Insert rows with old ID with SET IDENTITY_INSERT <new table> ON. This allows you to insert your own ID.
Reseed the Identity, setting it to the highest ID value +1 with DBCC CHECKIDENT ('<new table>', RESEED, <max ID + 1>). This will allow your Identity to increase from the highest ID and forward.
Something like this in code:
-- Disable auto increment
SET IDENTITY_INSERT <new table> ON
-- <INSERT STUFF HERE>
SET IDENTITY_INSERT <new table> OFF
-- Reseed Identity from max ID
DECLARE @maxval Int
SET @maxval = ISNULL(
(
SELECT
MAX(<identity column>) + 1
FROM <new table>
), 0)
DBCC CHECKIDENT ('<new table>', RESEED, @maxval)
EDIT: This approach requires your ID-column to be an Identity, of course.
If you don't have nulls in the field that you want to copy over from your previous version, you could first figure out what the largest ID is by just doing a max(Id) select. Then using SSMS go add your new field and when you set it as identity, just set the SEED value to something higher than what your current max is so you don't have collisions on new inserts.
I have a process where a temp table is used between a source file, CSV and the production table. The temp table has to match the CSV file columns, there is no PK in this data.
To find a set of rows before and after where the Azure Data Factory was failing, I imported over 2,000,000 rows into a temp table. The process stopped in Azure at 1,500,000 rows.
The error was that an integer or string would be truncated.
This line of code added an auto-incrementing row number column to the temp table (it works as a surrogate key, although no PRIMARY KEY constraint is created):
ALTER TABLE ##FLATFILETEMPBDI ADD ROWNUM INT IDENTITY
That would be the simplest solution to get a row number. I was then able to do this query to find the rows just before and after 1,500,000:
SELECT
ROWNUM
, PARTDESCRIPTION
, LEN(PARTDESCRIPTION) AS LENDESCR
, QUANTITY
, ONORDER
, PRICE
, MANUFACTURERPARTNUMBER
FROM ##FLATFILETEMPBDI
WHERE ROWNUM BETWEEN 1499990 AND 1500005
Works perfectly -- was not planning on it to be that easy, was surprised as anyone to see that the ALTER TABLE with IDENTITY worked to do the numbering for me.
I have a bit IsDefault column. Only one row of data within the table may have this bit column set to 1, all the others must be 0.
How can I enforce this?
All versions:
Trigger
Indexed view
Stored proc (eg test on write)
SQL Server 2008: a filtered index
CREATE UNIQUE INDEX IX_foo ON bar (MyBitCol) WHERE MyBitCol = 1
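A quick way to see the filtered index at work is sketched below on a throwaway temp table. #bar and its Id column are made up for illustration; IX_foo and MyBitCol follow the statement above.

```sql
CREATE TABLE #bar (Id INT PRIMARY KEY, MyBitCol BIT NOT NULL);
CREATE UNIQUE INDEX IX_foo ON #bar (MyBitCol) WHERE MyBitCol = 1;

-- Any number of 0 rows is fine; row 3 is the single default.
INSERT INTO #bar (Id, MyBitCol) VALUES (1, 0), (2, 0), (3, 1);

BEGIN TRY
    -- A second row with MyBitCol = 1 violates the unique index.
    INSERT INTO #bar (Id, MyBitCol) VALUES (4, 1);
END TRY
BEGIN CATCH
    SELECT ERROR_NUMBER() AS ErrorNumber, ERROR_MESSAGE() AS ErrorMessage;
END CATCH;

-- Move the default from row 3 to row 1 in one statement,
-- so the index never sees two default rows.
UPDATE #bar
SET MyBitCol = CASE WHEN Id = 1 THEN 1 ELSE 0 END
WHERE Id = 1 OR MyBitCol = 1;

SELECT * FROM #bar;

DROP TABLE #bar;
```

Note that the index only caps the number of default rows at one; it does not stop the table from having no default row at all.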
Assuming your PK is a single, numeric column, you could add a computed column to your table:
ALTER TABLE YourTable
ADD IsDefaultCheck AS CASE IsDefault
WHEN 1 THEN -1
WHEN 0 THEN YourPK
END
Then create a unique index on the computed column.
CREATE UNIQUE INDEX IX_DefaultCheck ON YourTable(IsDefaultCheck)
I think the trigger is the best idea if you want to change the old default record to 0 when you insert/update a new one and if you want to make sure one record always has that value (i.e. if you delete the record with the value you would assign it to a different record). You would have to decide on the rules for doing so. These triggers can be tricky because you have to account for multiple records in the inserted and deleted tables. So if 3 records in a batch try to update to become the default record, which one wins?
If you want to make sure the one default record never changes when someone else tries to change it, the filtered index is a good idea.
Different approaches can be taken here, but I think only two are correct. Let's go through them step by step.
We have a Hierarchy table with a Root column. This column tells us which row is currently the starting point. As the question asks, we want only one starting point.
We think that we can do it with:
Constraint
Indexed View
Trigger
Different table and relation
Constraint
In this approach first we need to create function which will do the job.
CREATE FUNCTION [gt].[fnOnlyOneRoot]()
RETURNS BIT
BEGIN
DECLARE @rootAmount INT
DECLARE @result BIT
SELECT @rootAmount=COUNT(1) FROM [gt].[Hierarchy] WHERE [Root]=1
IF @rootAmount=1
set @result=1
ELSE
set @result=0
RETURN @result
END
GO
And then the constraint:
ALTER TABLE [gt].[Hierarchy] WITH CHECK ADD CONSTRAINT [ckOnlyOneRoot] CHECK (([gt].[fnOnlyOneRoot]()=(1)))
Unfortunately this approach is wrong, because the constraint won't let us change any values in the table. It requires exactly one root at all times, so an insert with Root=1 throws an exception, and so does an update that sets Root=0.
We could change fnOnlyOneRoot to allow zero roots, but that is not what we wanted.
Index
A filtered index covers only the rows that match its WHERE clause and enforces the unique constraint on those rows. We have different options here:
- Root can be nullable and we can add in where Root!=0 and Root is not null
- Root must have value and we can add only in where Root!=0
- and different combinations
CREATE UNIQUE INDEX ix_OnyOneRoot ON [gt].[Hierarchy](Root) WHERE Root !=0 and Root is not null
This approach is not perfect either. It enforces at most one root, but not at least one. To move the root, we first need to set the previous root row to NULL or 0.
Trigger
We can write two kinds of trigger, and they behave differently:
- Prevent trigger - which won't allow us to store wrong data
- DoTheJob trigger - which updates the data for us in the background
Prevent trigger
This is basically the same as the constraint: if we want to force exactly one root, then we cannot update or insert.
CREATE TRIGGER tOnlyOneRoot
ON [gt].[Hierarchy]
AFTER INSERT, UPDATE
AS
DECLARE @rootAmount INT
SELECT @rootAmount=COUNT(1) FROM [gt].[Hierarchy] WHERE [Root]=1
-- Reject the change unless exactly one root remains
IF @rootAmount<>1
BEGIN
-- Severity 16 raises a real error; severity 0 would only print a message
RAISERROR ('Only one root',16,1);
ROLLBACK TRANSACTION
RETURN
END
GO
DoTheJob trigger
This trigger checks all inserted/updated rows and throws an exception if more than one Root is passed. Otherwise, when a single new Root is inserted or updated, the trigger allows the operation and afterwards sets Root to 0 on all other rows.
CREATE TRIGGER tOnlyOneRootDoTheJob
ON [gt].[Hierarchy]
AFTER INSERT, UPDATE
AS
DECLARE @insertedCount INT
SELECT @insertedCount = COUNT(1) FROM inserted WHERE [Root]=1
if (@insertedCount > 1)
BEGIN
-- Severity 16 raises a real error; severity 0 would only print a message
RAISERROR ('Only one root',16,1);
ROLLBACK TRANSACTION
RETURN
END
DECLARE @newRootId INT
SELECT @newRootId = [HierarchyId] FROM inserted WHERE [Root]=1
-- If no new root was supplied, @newRootId is NULL and this update changes nothing
UPDATE [gt].[Hierarchy] SET [Root]=0 WHERE [HierarchyId] <> @newRootId
GO
This is the solution we were after. The only-one-root rule is always met. (An additional trigger for DELETE should be added.)
Different table and relation
This is, let's say, the more normalized way. We create a new table that is allowed to hold only one row (using the options described above) and we join to it.
CREATE TABLE [gt].[HierarchyDefault](
[HierarchyId] INT PRIMARY KEY NOT NULL,
CONSTRAINT FK_HierarchyDefault_Hierarchy FOREIGN KEY (HierarchyId) REFERENCES [gt].[Hierarchy](HierarchyId)
)
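One common way to restrict such a table to a single row, without a trigger, is a constant key column. The sketch below shows that variant; it is not the author's table, and the name [gt].[HierarchyRoot], the [Lock] column and the constraint names are all made up for illustration. The IDs 1 and 2 assume matching rows exist in [gt].[Hierarchy].

```sql
CREATE TABLE [gt].[HierarchyRoot](
    -- The only value [Lock] can ever hold is 'X', and it is the primary key,
    -- so the table can never contain more than one row.
    [Lock] CHAR(1) NOT NULL
        CONSTRAINT DF_HierarchyRoot_Lock DEFAULT ('X')
        CONSTRAINT CK_HierarchyRoot_Lock CHECK ([Lock] = 'X')
        CONSTRAINT PK_HierarchyRoot PRIMARY KEY,
    [HierarchyId] INT NOT NULL
        CONSTRAINT FK_HierarchyRoot_Hierarchy
        FOREIGN KEY REFERENCES [gt].[Hierarchy]([HierarchyId])
);

-- The first insert succeeds; a second one would violate the primary key.
INSERT INTO [gt].[HierarchyRoot] ([HierarchyId]) VALUES (1);

-- Changing the root is a plain single-row update.
UPDATE [gt].[HierarchyRoot] SET [HierarchyId] = 2;
```

As with the filtered index, this caps the number of roots at one but does not by itself guarantee that a root row exists.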
Will it hurt performance?
With one column
SET STATISTICS TIME ON;
SELECT [HierarchyId],[ParentHierarchyId],[Root]
FROM [gt].[Hierarchy] WHERE [root]=1
SET STATISTICS TIME OFF;
Result
CPU time = 0 ms, elapsed time = 0 ms.
With join:
SET STATISTICS TIME ON;
SELECT h.[HierarchyId],[ParentHierarchyId],[Root]
FROM [gt].[Hierarchy] h
INNER JOIN [gt].[HierarchyDefault] hd on h.[HierarchyId]=hd.[HierarchyId]
WHERE [root]=1
SET STATISTICS TIME OFF;
Result
CPU time = 0 ms, elapsed time = 0 ms.
Summary
I will use the trigger. It is a bit of magic in the table, but it does the whole job under the hood.
Easy table creation:
CREATE TABLE [gt].[Hierarchy](
[HierarchyId] INT PRIMARY KEY IDENTITY(1,1),
[ParentHierarchyId] INT NULL,
[Root] BIT,
CONSTRAINT FK_Hierarchy_Hierarchy FOREIGN KEY (ParentHierarchyId)
REFERENCES [gt].[Hierarchy](HierarchyId)
)
You could apply an Instead of Insert trigger and check the value as it's coming in.
Create Trigger TRG_MyTrigger
on MyTable
Instead of Insert
as
Begin
--If an incoming row is marked as default, clear the flag on the existing rows first
If Exists(Select * from inserted where IsDefault= 1)
Begin
Update MyTable Set IsDefault=0 where IsDefault=1;
End
--An INSTEAD OF trigger replaces the insert, so perform it explicitly (Columns is a placeholder)
insert into MyTable(Columns)
select Columns from inserted
End
Alternatively, you could apply a filtered unique index on the column (a plain unique constraint on a bit column would also limit the table to a single row with 0).
The accepted answer to the below question is both interesting and relevant:
Constraint for only one record marked as default
"But the serious relational folks will tell you this information
should just be in another table."
Have a separate 1 row table that tells you which record is 'default'. Anon touched on this in his comment.
I think this is the best approach - simple, clean & doesn't require a 'clever' esoteric solution prone to errors or later misunderstanding. You can even drop the IsDefault column.