Can I write a SQL script to update all the rows for a single table in SQL Server, but in batches? - sql-server

I have a large number of rows in a single table in our Microsoft SQL Server database.
I need to add a new column (DateTime) to the table. Fine. The column needs to be NOT NULL. Ok, fine. Now, this is the problem:
It takes too long to set all column values for all the rows
I tried to create the NULLABLE column. That took a nanosecond. Then UPDATE all the rows by setting the value to GETUTCDATE().
eg.
DECLARE @Foo datetime = GETUTCDATE();
UPDATE PewPew
SET NewField = @Foo;
After 2 hours, I had to cancel the query. Yeah, 2 hours. I also did NOT do this in a TRANSACTION. I then dropped the new field and we were back to where we started.
I was thinking: could we
Create the column as NULLABLE
In batches of 1000 or so, UPDATE the top 1000 rows where NewField is NULL
Once all rows are set, alter the column to make it NOT NULL
The SQL Server is on Azure - it's a Standard D13 (8 cores, 56 GB memory) VM and I think we put it on SSDs, so I'd like to think the hardware isn't too bad.
Footnote: large number of rows = I think it's about ~20 million. That's sorta large to us, but not large to some. We get that.

SELECT 1; -- primes @@ROWCOUNT so the loop starts
WHILE (@@ROWCOUNT <> 0)
BEGIN
    UPDATE TOP (1000) PewPew SET NewField = GETUTCDATE() WHERE NewField IS NULL;
END
This will update 1000 rows at a time.

What if you add the new column as NOT NULL and with a default constraint?
ALTER TABLE dbo.YourTable
ADD NewColumn DATETIME2(3) NOT NULL
CONSTRAINT DF_YourTable_NewColumn DEFAULT (SYSUTCDATETIME())
That should add the column, as NOT NULL, and immediately fill in the default value. Not sure if that'll run any faster than your UPDATE, though - worth a shot.

Related

default date constraint has null records

We occasionally get EMPTY (NULL) values in the table/column below when multiple records are inserted in one shot. While technically this is allowed since the column is nullable, the default constraint should apply to every row inserted.
ALTER TABLE [dbo].[JOB] ADD [DATE_CREATED] [nvarchar](35) NULL CONSTRAINT [DF_JOB_DATE_CREATED] DEFAULT (sysdatetime())
The one possible reason I could think of is that the default only applies if you don't insert explicitly into that column. But I couldn't find any code that does that, though I'm still looking. Any other possible reasons?
We are on SQL Server 2012. The purpose of the column is to capture created date and time for processing. We can't have this column Non-nullable as this is a reporting column which shouldn't have a business impact.
Thank you for your advice.
Make the column NOT NULL. At the very least, do that so you can capture what application/query is explicitly inserting NULLs - which really just shouldn't be allowed.
Short of that, create a trigger:
CREATE TRIGGER trg_JOB_CreateDate
ON dbo.JOB
AFTER INSERT
AS
BEGIN
UPDATE j
SET DATE_CREATED = GETDATE() -- consider using GETUTCDATE()
FROM dbo.JOB j
INNER JOIN inserted i
ON i.PrimaryKeyName = j.PrimaryKeyName
END
However, this could result in some additional transactional overhead, and won't stop someone from updating the column to = NULL. But again, if having that be null breaks something, then you really should just have the column be NOT NULL.

Changing Datatype from int to bigint for tables containing billions of rows

I have a couple of tables with millions of rows (and in some tables billions), with one column as int that I am now changing to bigint. I tried changing the datatype using SSMS and it failed after a couple of hours because the transaction log was full.
Another approach I took was to create a new column and start updating the value from the old column to the new column in batches, by setting the ROWCOUNT property to 100000. It works, but it is very slow and it claims the full server memory. With this approach it may take a couple of days to complete, which won't be acceptable in production.
What is the fastest/best way to change the datatype? The source column is not an identity column; duplicates and NULLs are allowed. The table has indexes on other columns - will disabling the indexes speed up the process? Will adding BEGIN TRAN and COMMIT help?
I ran a test for the ALTER COLUMN that shows the actual time required to make the change. The results show that the ALTER COLUMN is not instantaneous, and the time required grows linearly.
RecordCt     Elapsed Mcs
-----------  -----------
      10000       184019
     100000      1814181
    1000000     18410841
My recommendation would be to batch it as you suggested. Create a new column, and pre-populate the column over time using a combination of ROWCOUNT and WAITFOR.
Code your script so that the WAITFOR value is read from a table. That way you can modify the WAITFOR value on-the-fly as your production server starts to bog down. You can shorten the WAITFOR during off-peak hours. (You can even use DMVs to make your WAITFOR value automatic, but this is certainly more complex.)
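For illustration, here is a rough sketch of that pattern. Table, column, and control-table names are placeholders, and it uses UPDATE TOP rather than SET ROWCOUNT (which is deprecated for DML), but the idea of re-reading the delay from a table each batch is the same:
CREATE TABLE dbo.BatchControl (WaitForDelay char(8) NOT NULL); -- e.g. '00:00:05'
INSERT INTO dbo.BatchControl VALUES ('00:00:05');
GO
DECLARE @delay char(8);
WHILE 1 = 1
BEGIN
    UPDATE TOP (100000) dbo.BigTable
    SET NewBigIntColumn = OldIntColumn
    WHERE NewBigIntColumn IS NULL;

    IF @@ROWCOUNT = 0 BREAK;

    -- re-read the delay each batch so it can be tuned on-the-fly while the script runs
    SELECT @delay = WaitForDelay FROM dbo.BatchControl;
    WAITFOR DELAY @delay;
END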
This is a complex update that will require planning and a lot of babysitting.
Rob
Here is the ALTER COLUMN test code.
USE tempdb;
SET NOCOUNT ON;
GO
IF EXISTS (SELECT * FROM sys.tables WHERE [object_id] = OBJECT_ID('dbo.TestTable'))
DROP TABLE dbo.TestTable;
GO
CREATE TABLE dbo.TestTable (
ColID int IDENTITY,
ColTest int NULL,
ColGuid uniqueidentifier DEFAULT NEWSEQUENTIALID()
);
GO
INSERT INTO dbo.TestTable DEFAULT VALUES;
GO 10000
UPDATE dbo.TestTable SET ColTest = ColID;
GO
DECLARE @t1 time(7) = SYSDATETIME();
DECLARE @t2 time(7);
ALTER TABLE dbo.TestTable ALTER COLUMN ColTest bigint NULL;
SET @t2 = SYSDATETIME();
SELECT
MAX(ColID) AS RecordCt,
DATEDIFF(mcs, @t1, @t2) AS [Elapsed Mcs]
FROM dbo.TestTable;
A simple alter table <table> alter column <column> bigint null should take basically no time. There won't be any conversion issues or null checks - I don't see why this wouldn't be relatively instant.
If you do it through the GUI, it'll probably try to create a temp table, drop the existing table, and create a new one - definitely don't do that.
In SQL Server 2016+, this alter table <table> alter column <column> bigint null statement will be a simple metadata change (instant) if the table is fully compressed.
More info here from @Paul White:
https://sqlperformance.com/2020/04/database-design/new-metadata-column-changes-sql-server-2016
Compression must be enabled:
- On all indexes and partitions, including the base heap or clustered index.
- Either ROW or PAGE compression.
- Indexes and partitions may use a mixture of these compression levels. The important thing is there are no uncompressed indexes or partitions.
Changing from NULL to NOT NULL is not allowed.
The following integer type changes are supported:
- smallint to integer or bigint.
- integer to bigint.
- smallmoney to money (uses integer representation internally).
The following string and binary type changes are supported:
- char(n) to char(m) or varchar(m)
- nchar(n) to nchar(m) or nvarchar(m)
- binary(n) to binary(m) or varbinary(m)
- All of the above only for n < m and m != max.
Collation changes are not allowed.
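A hypothetical example of taking advantage of that (table and column names assumed; note the rebuilds themselves are not free on a large table):
-- compress the base table and all of its indexes first
ALTER TABLE dbo.BigTable REBUILD WITH (DATA_COMPRESSION = ROW);
ALTER INDEX ALL ON dbo.BigTable REBUILD WITH (DATA_COMPRESSION = ROW);

-- with everything compressed, on SQL Server 2016+ this should be a metadata-only change
ALTER TABLE dbo.BigTable ALTER COLUMN SomeIntColumn bigint NULL;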

SQL Server bit column constraint, 1 row = 1, all others 0

I have a bit IsDefault column. Only one row of data within the table may have this bit column set to 1, all the others must be 0.
How can I enforce this?
All versions:
Trigger
Indexed view
Stored proc (eg test on write)
SQL Server 2008: a filtered index
CREATE UNIQUE INDEX IX_foo ON bar (MyBitCol) WHERE MyBitCol = 1
Assuming your PK is a single, numeric column, you could add a computed column to your table:
ALTER TABLE YourTable
ADD IsDefaultCheck AS CASE IsDefault
WHEN 1 THEN -1
WHEN 0 THEN YourPK
END
Then create a unique index on the computed column.
CREATE UNIQUE INDEX IX_DefaultCheck ON YourTable(IsDefaultCheck)
I think the trigger is the best idea if you want to change the old default record to 0 when you insert/update a new one and if you want to make sure one record always has that value (i.e. if you delete the record with the value you would assign it to a different record). You would have to decide on the rules for doing so. These triggers can be tricky because you have to account for multiple records in the inserted and deleted tables. So if 3 records in a batch try to update to become the default record, which one wins?
If you want to make sure the one default record never changes when someone else tries to change it, the filtered index is a good idea.
Different approaches can be taken here, but I think only two are correct. Let's go through it step by step.
We have a Hierarchy table with a Root column. This column tells us which row is currently the starting point. As asked in the question, we want to have only one starting point.
We think that we can do it with:
Constraint
Indexed View
Trigger
Different table and relation
Constraint
In this approach, first we need to create a function which will do the job.
CREATE FUNCTION [gt].[fnOnlyOneRoot]()
RETURNS BIT
BEGIN
DECLARE @rootAmount TINYINT
DECLARE @result BIT
SELECT @rootAmount=COUNT(1) FROM [gt].[Hierarchy] WHERE [Root]=1
IF @rootAmount=1
set @result=1
ELSE
set @result=0
RETURN @result
END
GO
And then the constraint:
ALTER TABLE [gt].[Hierarchy] WITH CHECK ADD CONSTRAINT [ckOnlyOneRoot] CHECK (([gt].[fnOnlyOneRoot]()=(1)))
Unfortunately, this approach is wrong, as the constraint won't allow us to change any values in the table. It requires exactly one root to be marked at all times (an insert with Root=1 will throw an exception, and so will an update setting Root=0).
We could change fnOnlyOneRoot to allow having 0 selected roots, but that is not what we wanted.
Index
A filtered index covers only the rows that match the WHERE clause and enforces a unique constraint on that subset. We have different options here:
- Root can be nullable and we can add in where Root!=0 and Root is not null
- Root must have value and we can add only in where Root!=0
- and different combinations
CREATE UNIQUE INDEX ix_OnyOneRoot ON [gt].[Hierarchy](Root) WHERE Root !=0 and Root is not null
This approach is also not perfect. A maximum of one Root is enforced, but not a minimum. To update the data we first need to set the previous root to NULL or 0.
Trigger
We can write two kinds of triggers, and they behave differently:
- Prevent trigger - which won't allow us to put wrong data
- DoTheJob trigger - which in background will update data for us
Prevent trigger
This is basically the same as the constraint: if we want to force exactly one root, then we cannot update or insert.
CREATE TRIGGER tOnlyOneRoot
ON [gt].[Hierarchy]
AFTER INSERT, UPDATE
AS
DECLARE @rootAmount TINYINT
DECLARE @result BIT
SELECT @rootAmount=COUNT(1) FROM [gt].[Hierarchy] WHERE [Root]=1
IF @rootAmount=1
set @result=1
ELSE
set @result=0
IF @result=0
BEGIN
RAISERROR ('Only one root',0,0);
ROLLBACK TRANSACTION
RETURN
END
GO
DoTheJob trigger
This trigger checks all inserted/updated rows and, if more than one Root is passed in, it throws an exception. Otherwise, if a single new Root is inserted or updated, the trigger allows it and afterwards sets the Root value of all other rows to 0.
CREATE TRIGGER tOnlyOneRootDoTheJob
ON [gt].[Hierarchy]
AFTER INSERT, UPDATE
AS
DECLARE @insertedCount TINYINT
SELECT @insertedCount = COUNT(1) FROM inserted WHERE [Root]=1
if (@insertedCount > 1)
BEGIN
RAISERROR ('Only one root',0,0);
ROLLBACK TRANSACTION
RETURN
END
DECLARE @newRootId INT
SELECT @newRootId = [HierarchyId] FROM inserted WHERE [Root]=1
UPDATE [gt].[Hierarchy] SET [Root]=0 WHERE [HierarchyId] <> @newRootId
GO
This is the solution we tried to achieve. The only-one-root rule is always met. (An additional trigger for DELETE should be added.)
Different table and relation
This is, let's say, the more normalized way. We create a new table that is allowed to have only one row (using the options described above) and we join to it.
CREATE TABLE [gt].[HierarchyDefault](
[HierarchyId] INT PRIMARY KEY NOT NULL,
CONSTRAINT FK_HierarchyDefault_Hierarchy FOREIGN KEY (HierarchyId) REFERENCES [gt].[Hierarchy](HierarchyId)
)
Will it hurt performance?
With one column
SET STATISTICS TIME ON;
SELECT [HierarchyId],[ParentHierarchyId],[Root]
FROM [gt].[Hierarchy] WHERE [root]=1
SET STATISTICS TIME OFF;
Result
CPU time = 0 ms, elapsed time = 0 ms.
With join:
SET STATISTICS TIME ON;
SELECT h.[HierarchyId],[ParentHierarchyId],[Root]
FROM [gt].[Hierarchy] h
INNER JOIN [gt].[HierarchyDefault] hd on h.[HierarchyId]=hd.[HierarchyId]
WHERE [root]=1
SET STATISTICS TIME OFF;
Result
CPU time = 0 ms, elapsed time = 0 ms.
Summary
I would use the trigger. There is some magic in the table, but it does all the job under the hood.
Easy table creation:
CREATE TABLE [gt].[Hierarchy](
[HierarchyId] INT PRIMARY KEY IDENTITY(1,1),
[ParentHierarchyId] INT NULL,
[Root] BIT,
CONSTRAINT FK_Hierarchy_Hierarchy FOREIGN KEY (ParentHierarchyId)
REFERENCES [gt].[Hierarchy](HierarchyId)
)
You could apply an Instead of Insert trigger and check the value as it's coming in.
CREATE TRIGGER TRG_MyTrigger
ON MyTable
INSTEAD OF INSERT
AS
BEGIN
    -- If an incoming row is marked as the default, clear any existing default rows first
    IF EXISTS (SELECT * FROM inserted WHERE IsDefault = 1)
        UPDATE MyTable SET IsDefault = 0 WHERE IsDefault = 1;

    -- INSTEAD OF INSERT replaces the original insert, so we must perform it ourselves
    INSERT INTO MyTable (Columns)
    SELECT Columns FROM inserted
END
Alternatively you could apply a unique constraint on the column.
The accepted answer to the below question is both interesting and relevant:
Constraint for only one record marked as default
"But the serious relational folks will tell you this information
should just be in another table."
Have a separate 1 row table that tells you which record is 'default'. Anon touched on this in his comment.
I think this is the best approach - simple, clean & doesn't require a 'clever' esoteric solution prone to errors or later misunderstanding. You can even drop the IsDefault column.
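As a hedged sketch of that idea (names are hypothetical; bar is assumed to be the main table with an ID primary key), a single-row side table could look like:
CREATE TABLE dbo.BarDefault (
    LockId tinyint NOT NULL PRIMARY KEY DEFAULT (1),
    BarId int NOT NULL REFERENCES dbo.bar (ID),
    CONSTRAINT CK_BarDefault_OneRow CHECK (LockId = 1) -- the PK plus this check caps the table at one row
);
-- to change which record is the default, just UPDATE BarDefault.BarId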

Using a trigger to simulate a second identity column in SQL Server 2005

I have various reasons for needing to implement, in addition to the identity column PK, a second, concurrency-safe, auto-incrementing column in a SQL Server 2005 database. Being able to have more than one identity column would be ideal, but I'm looking at using a trigger to simulate this as close as possible to the metal.
I believe I have to use a serializable isolation level transaction in the trigger. Do I go about this like I would use such a transaction in a normal SQL query?
It is a non-negotiable requirement that the business meaning of the second incrementing column remain separated from the behind the scenes meaning of the first, PK, incrementing column.
To put things as simply as I can, if I create JobCards '0001', '0002', and '0003', then delete JobCards '0002' and '0003', the next Jobcard I create must have ID '0002', not '0004'.
Just an idea: if you have 2 "identity" columns, then surely they would be 'in sync' - if not exactly the same value, then they would differ by a constant value. If so, then why not add the "second identity" column as a COMPUTED column which offsets the primary identity? Or is my logic flawed here?
Edit : As per Martin's comment, note that your calc might need to be N * id + C, where N is the Increment and C the offset / delta - excuse my rusty maths.
For example:
ALTER TABLE MyTable ADD OtherIdentity AS Id * 2 + 1;
Edit
Note that for SQL Server 2012 and later, you can now use an independent SEQUENCE to create two or more independently incrementing columns in the same table. For example:
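Here is a hedged sketch of the sequence-based approach (table and column names are made up; note that, like IDENTITY, a sequence does not reuse deleted values, so it covers the "second independent counter" part but not the gap-filling requirement added in the edit):
CREATE SEQUENCE dbo.JobCardNumber AS int START WITH 1 INCREMENT BY 1;
GO
CREATE TABLE dbo.JobCard (
    JobCardId int IDENTITY(1,1) PRIMARY KEY, -- the "real" surrogate key
    CardNumber int NOT NULL
        CONSTRAINT DF_JobCard_CardNumber DEFAULT (NEXT VALUE FOR dbo.JobCardNumber),
    Description varchar(100) NULL
);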
Note: OP has edited the original requirement to include reclaiming sequences (noting that identity columns in SQL do not reclaim used ID's once deleted).
I would disallow all the deletes from this table altogether. Instead of deleting, I would mark rows as available or inactive. Instead of inserting, I would first search if there are inactive rows, and reuse the one with the smallest ID if they exist. I would insert only if there are no available rows already in the table.
Of course, I would serialize all inserts and deletes with sp_getapplock.
You can use a trigger to disallow all deletes, it is simpler than filling gaps.
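A minimal sketch of such a trigger (table name assumed):
CREATE TRIGGER trg_JobCard_NoDelete
ON dbo.JobCard
INSTEAD OF DELETE
AS
BEGIN
    -- the delete is simply not performed; surface an error so the caller knows
    RAISERROR ('Deletes are not allowed; mark the row as inactive instead.', 16, 1);
END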
A solution to this issue from "Inside Microsoft SQL Server 2008: T-SQL Querying" is to create another table with a single row that holds the current max value.
CREATE TABLE dbo.Sequence(
val int
)
INSERT INTO dbo.Sequence VALUES (0); -- seed row; the procedure below expects exactly one row
Then to allocate a range of sufficient size for your insert
CREATE PROC dbo.GetSequence
@val AS int OUTPUT,
@n AS int = 1
AS
UPDATE dbo.Sequence
SET @val = val = val + @n;
SET @val = @val - @n + 1;
This will block other concurrent attempts to increment the sequence until the first transaction commits.
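For example, a caller that needs 5 values might use it like this (hypothetical usage):
DECLARE @first int;
EXEC dbo.GetSequence @val = @first OUTPUT, @n = 5;
-- @first through @first + 4 are now reserved for this caller
SELECT @first AS FirstReservedValue;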
For a non blocking solution that doesn't handle multi row inserts see my answer here.
This is probably a terrible idea, but it works in at least a limited use scenario
Just use a regular identity and reseed on deletes.
create table reseedtest (
a int identity(1,1) not null,
name varchar(100)
)
insert reseedtest values('erik'),('john'),('selina')
select * from reseedtest
go
CREATE TRIGGER TR_reseedtest_D ON reseedtest FOR DELETE
AS
BEGIN TRAN
DECLARE @a int
SET @a = (SELECT TOP 1 a FROM reseedtest WITH (TABLOCKX, HOLDLOCK))
--anyone know another way to lock a table besides doing something to it?
DBCC CHECKIDENT(reseedtest, reseed, 0)
DBCC CHECKIDENT(reseedtest, reseed)
COMMIT TRAN
GO
delete reseedtest where a >= 2
insert reseedtest values('katarina'),('david')
select * from reseedtest
drop table reseedtest
This won't work if you are deleting from the "middle of the stack" as it were, but it works fine for deletes from the incrementing end.
Reseeding once to 0 then again is just a trick to avoid having to calculate the correct reseed value.
If you never delete from the table, you could create a view with a computed column that uses ROW_NUMBER() - something like the sketch below.
Also, a SQL Server identity can get out of sync with a user-generated one, depending on the use of rollbacks.
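A rough sketch of such a view (names are placeholders; note it is computed at query time, not truly materialized):
CREATE VIEW dbo.JobCardNumbered
AS
SELECT JobCardId,
       Description,
       ROW_NUMBER() OVER (ORDER BY JobCardId) AS SecondId
FROM dbo.JobCard;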

How do you add a NOT NULL Column to a large table in SQL Server?

To add a NOT NULL Column to a table with many records, a DEFAULT constraint needs to be applied. This constraint causes the entire ALTER TABLE command to take a long time to run if the table is very large. This is because:
Assumptions:
The DEFAULT constraint modifies existing records. This means that the db needs to increase the size of each record, which causes it to shift records on full data-pages to other data-pages and that takes time.
The DEFAULT update executes as an atomic transaction. This means that the transaction log will need to be grown so that a roll-back can be executed if necessary.
The transaction log keeps track of the entire record. Therefore, even though only a single field is modified, the space needed by the log will be based on the size of the entire record multiplied by the # of existing records. This means that adding a column to a table with small records will be faster than adding a column to a table with large records even if the total # of records are the same for both tables.
Possible solutions:
Suck it up and wait for the process to complete. Just make sure to set the timeout period to be very long. The problem with this is that it may take hours or days to do depending on the # of records.
Add the column but allow NULL. Afterward, run an UPDATE query to set the DEFAULT value for existing rows. Do not do UPDATE *. Update batches of records at a time or you'll end up with the same problem as solution #1. The problem with this approach is that you end up with a column that allows NULL when you know that this is an unnecessary option. I believe that there are some best practice documents out there that says that you should not have columns that allow NULL unless it's necessary.
Create a new table with the same schema. Add the column to that schema. Transfer the data over from the original table. Drop the original table and rename the new table. I'm not certain how this is any better than #1.
Questions:
Are my assumptions correct?
Are these my only solutions? If so, which one is the best? If not, what else could I do?
I ran into this problem for my work also. And my solution is along #2.
Here are my steps (I am using SQL Server 2005):
1) Add the column to the table with a default value:
ALTER TABLE MyTable ADD MyColumn varchar(40) DEFAULT('')
2) Add a NOT NULL check constraint with the NOCHECK option. NOCHECK means the constraint is not checked against existing values:
ALTER TABLE MyTable WITH NOCHECK
ADD CONSTRAINT MyColumn_NOTNULL CHECK (MyColumn IS NOT NULL)
3) Update the values incrementally in table:
GO
UPDATE TOP(3000) MyTable SET MyColumn = '' WHERE MyColumn IS NULL
GO 1000
The update statement will only update a maximum of 3000 records at a time. This allows saving a chunk of data at a time. I have to use "MyColumn IS NULL" because my table does not have a sequential primary key.
GO 1000 will execute the previous statement 1000 times. This will update 3 million records; if you need more, just increase this number. It will continue to execute until SQL Server returns 0 records for the UPDATE statement.
Here's what I would try:
Do a full backup of the database.
Add the new column, allowing nulls - don't set a default.
Set SIMPLE recovery, which truncates the tran log as soon as each batch is committed.
The SQL is: ALTER DATABASE XXX SET RECOVERY SIMPLE
Run the update in batches as you discussed above, committing after each one.
Reset the new column to no longer allow nulls.
Go back to the normal FULL recovery.
The SQL is: ALTER DATABASE XXX SET RECOVERY FULL
Backup the database again.
The use of the SIMPLE recovery model doesn't stop logging, but it significantly reduces its impact. This is because the server discards the recovery information after every commit.
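Putting those steps together, a condensed sketch might look like this (database, table, and column names are placeholders; the backups before and after are omitted):
ALTER DATABASE MyDb SET RECOVERY SIMPLE;

ALTER TABLE dbo.BigTable ADD NewCol datetime NULL; -- nullable, no default yet

WHILE 1 = 1
BEGIN
    UPDATE TOP (10000) dbo.BigTable
    SET NewCol = GETUTCDATE()
    WHERE NewCol IS NULL;

    IF @@ROWCOUNT = 0 BREAK; -- each batch commits on its own
END

ALTER TABLE dbo.BigTable ALTER COLUMN NewCol datetime NOT NULL;

ALTER DATABASE MyDb SET RECOVERY FULL;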
You could:
Start a transaction.
Grab a write lock on your original table so no one writes to it.
Create a shadow table with the new schema.
Transfer all the data from the original table.
execute sp_rename to rename the old table out.
execute sp_rename to rename the new table in.
Finally, you commit the transaction.
The advantage of this approach is that your readers will be able to access the table during the long process and that you can perform any kind of schema change in the background.
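A rough sketch of that sequence (names are placeholders; indexes, constraints, and permissions on the new table would still need to be recreated):
BEGIN TRAN;

-- copy the data while holding an exclusive lock on the source so no writes slip in
SELECT *, CAST('' AS varchar(40)) AS NewColumn
INTO dbo.MyTable_New
FROM dbo.MyTable WITH (TABLOCKX, HOLDLOCK);

EXEC sp_rename 'dbo.MyTable', 'MyTable_Old';
EXEC sp_rename 'dbo.MyTable_New', 'MyTable';

COMMIT TRAN;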
Just to update this with the latest information.
In SQL Server 2012 this can now be carried out as an online operation in the following circumstances
Enterprise Edition only
The default must be a runtime constant
For the second requirement examples might be a literal constant or a function such as GETDATE() that evaluates to the same value for all rows. A default of NEWID() would not qualify and would still end up updating all rows there and then.
For defaults that qualify SQL Server evaluates them and stores the result as the default value in the column metadata so this is independent of the default constraint which is created (which can even be dropped if no longer required). This is viewable in sys.system_internals_partition_columns. The value doesn't get written out to the rows until next time they happen to get updated.
More details about this here: online non-null with values column add in sql server 2012
Admittedly, this is an old question. My colleague recently told me that he was able to do it in one single ALTER TABLE statement on a table with 13.6M rows. It finished within a second in SQL Server 2012. I was able to confirm the same on a table with 8M rows. Did something change in later versions of SQL Server?
Alter table mytable add mycolumn char(1) not null default('N');
I think this depends on the SQL flavor you are using, but what if you took option 2, and at the very end altered the column to NOT NULL with the default value?
Would it be fast, since it sees all the values are already not null?
If you want the column in the same table, you'll just have to do it. Now, option 3 is potentially the best for this because you can still have the database "live" while this operation is going on. If you use option 1, the table is locked while the operation happens and then you're really stuck.
If you don't really care if the column is in the table, then I suppose a segmented approach is the next best. Though, I really try to avoid that (to the point that I don't do it) because then like Charles Bretana says, you'll have to make sure and find all the places that update/insert that table and modify those. Ugh!
I had a similar problem, and went for your option #2.
It takes 20 minutes this way, as opposed to 32 hours the other way!!! Huge difference, thanks for the tip.
I wrote a full blog entry about it, but here's the important sql:
Alter table MyTable
Add MyNewColumn char(10) null default '?';
go
update MyTable set MyNewColumn='?' where MyPrimaryKey between 0 and 1000000
go
update MyTable set MyNewColumn='?' where MyPrimaryKey between 1000000 and 2000000
go
update MyTable set MyNewColumn='?' where MyPrimaryKey between 2000000 and 3000000
go
..etc..
Alter table MyTable
Alter column MyNewColumn char(10) not null;
And the blog entry if you're interested:
http://splinter.com.au/adding-a-column-to-a-massive-sql-server-table
I had a similar problem and I went with modified #3 approach. In my case the database was in SIMPLE recovery mode and the table to which column was supposed to be added was not referenced by any FK constraints.
Instead of creating a new table with the same schema and copying the contents of the original table, I used the SELECT...INTO syntax.
According to Microsoft (http://technet.microsoft.com/en-us/library/ms188029(v=sql.105).aspx)
The amount of logging for SELECT...INTO depends on the recovery model
in effect for the database. Under the simple recovery model or
bulk-logged recovery model, bulk operations are minimally logged. With
minimal logging, using the SELECT… INTO statement can be more
efficient than creating a table and then populating the table with an
INSERT statement. For more information, see Operations That Can Be
Minimally Logged.
The sequence of steps :
1. Move data from the old table to the new one while adding the new column with a default:
SELECT table.*, CAST('default' AS nvarchar(256)) AS new_column
INTO table_copy
FROM table
2. Drop the old table:
DROP TABLE table
3. Rename the newly created table:
EXEC sp_rename 'table_copy', 'table'
4. Create the necessary constraints and indexes on the new table.
In my case the table had more than 100 million rows and this approach completed faster than approach #2 and log space growth was minimal.
1) Add the column to the table with a default value:
ALTER TABLE MyTable ADD MyColumn int default 0
2) Update the values incrementally in the table (same effect as accepted answer). Adjust the number of records being updated to your environment, to avoid blocking other users/processes.
declare @rowcount int = 1
while (@rowcount > 0)
begin
UPDATE TOP(10000) MyTable SET MyColumn = 0 WHERE MyColumn IS NULL
set @rowcount = @@ROWCOUNT
end
3) Alter the column definition to require not null. Run the following at a moment when the table is not in use (or schedule a few minutes of downtime). I have successfully used this for tables with millions of records.
ALTER TABLE MyTable ALTER COLUMN MyColumn int NOT NULL
I would use a CURSOR instead of a single UPDATE. The cursor will update all matching records in batches, record by record - it takes time but it does not lock the table.
If you want to avoid locks, use WAITFOR.
Also, I am not sure that a DEFAULT constraint changes existing rows.
Probably the NOT NULL constraint used together with DEFAULT causes the case described by the author.
If it does change them, add it at the end.
So the pseudocode will look like:
-- without the NOT NULL constraint -- we will add it at the end
ALTER TABLE table ADD new_column INT DEFAULT 0

DECLARE fillNullColumn CURSOR LOCAL FAST_FORWARD FOR
SELECT
    key
FROM
    table WITH (NOLOCK)
WHERE
    new_column IS NULL

OPEN fillNullColumn

DECLARE @key INT

FETCH NEXT FROM fillNullColumn INTO @key

WHILE @@FETCH_STATUS = 0 BEGIN
    UPDATE
        table WITH (ROWLOCK)
    SET
        new_column = 0 -- default value
    WHERE
        key = @key

    WAITFOR DELAY '00:00:05' -- wait 5 seconds; keep in mind this updates only 12 rows per minute

    FETCH NEXT FROM fillNullColumn INTO @key
END

CLOSE fillNullColumn
DEALLOCATE fillNullColumn

ALTER TABLE table ALTER COLUMN new_column INT NOT NULL
I am sure that there are some syntax errors, but I hope that this helps to solve your problem.
Good luck!
Vertically segment the table. This means you will have two tables, with the same primary key and exactly the same number of records... One will be the one you already have; the other will have just the key and the new non-null column (with its default value).
Modify all insert, update, and delete code so they keep the two tables in sync... If you want, you can create a view that "joins" the two tables together into a single logical combination that appears like one table for client SELECT statements...
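A minimal sketch of that layout (hypothetical names; dbo.MyTable is assumed to have an int primary key called Id):
CREATE TABLE dbo.MyTable_NewField (
    Id int NOT NULL PRIMARY KEY
        REFERENCES dbo.MyTable (Id),
    NewField datetime NOT NULL
        CONSTRAINT DF_MyTable_NewField DEFAULT (GETUTCDATE())
);
GO
-- optional view that presents the two tables as one for client SELECTs
CREATE VIEW dbo.MyTableCombined
AS
SELECT t.Id, t.SomeExistingColumn, n.NewField
FROM dbo.MyTable AS t
INNER JOIN dbo.MyTable_NewField AS n ON n.Id = t.Id;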
