I have a couple of tables with millions of rows (some with billions), each with a column that is currently int and that I am changing to bigint. I tried changing the data type using SSMS, and it failed after a couple of hours because the transaction log was full.
Another approach I took was to create a new column and copy the values from the old column to the new one in batches, using SET ROWCOUNT 100000. It works, but it is very slow and consumes all of the server's memory. At this rate it may take a couple of days to complete, which won't be acceptable in production.
What is the fastest/best way to change the data type? The source column is not an identity column, and it allows duplicates and NULLs. The table has an index on other columns; will disabling the index speed up the process? Will wrapping the work in BEGIN TRAN and COMMIT help?

I ran a test for the ALTER COLUMN that shows the actual time required to make the change. The results show that the ALTER COLUMN is not instantaneous, and the time required grows linearly.
RecordCt     Elapsed Mcs
-----------  -----------
10000             184019
100000           1814181
1000000         18410841
My recommendation would be to batch it as you suggested. Create a new column, and pre-populate the column over time using a combination of ROWCOUNT and WAITFOR.
Code your script so that the WAITFOR value is read from a table. That way you can modify the WAITFOR value on-the-fly as your production server starts to bog down. You can shorten the WAITFOR during off-peak hours. (You can even use DMVs to make your WAITFOR value automatic, but this is certainly more complex.)
This is a complex update that will require planning and a lot of babysitting.
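A minimal sketch of that batching loop, assuming SQL Server 2008+ and hypothetical names: dbo.BigTable with a unique int key Id, the old column OldCol, the new column NewCol, and a one-row control table dbo.BatchControl. It walks the key in ranges instead of using SET ROWCOUNT (which Microsoft has deprecated for UPDATE), and it deliberately does not filter on NewCol IS NULL, because rows whose source value is NULL would never drop out of such a filter. The batch size and the WAITFOR delay are re-read from the control table on every pass.

```sql
-- Demo setup (hypothetical names): dbo.BigTable has a unique int key Id and the
-- int column OldCol being widened. OldCol may contain NULLs and duplicates.
CREATE TABLE dbo.BigTable (Id int NOT NULL PRIMARY KEY, OldCol int NULL);
INSERT INTO dbo.BigTable (Id, OldCol) VALUES (1, 10), (2, NULL), (3, 10);

-- One-row control table; edit it while the loop runs to speed up or slow down.
CREATE TABLE dbo.BatchControl (
    Id        int     NOT NULL PRIMARY KEY CHECK (Id = 1),
    BatchSize int     NOT NULL,
    DelayText char(8) NOT NULL            -- 'hh:mm:ss', passed to WAITFOR DELAY
);
INSERT INTO dbo.BatchControl (Id, BatchSize, DelayText) VALUES (1, 100000, '00:00:01');

ALTER TABLE dbo.BigTable ADD NewCol bigint NULL;   -- nullable, no default: metadata-only
GO
DECLARE @LastId bigint, @MaxId bigint, @BatchSize int, @Delay char(8);
SELECT @LastId = CAST(MIN(Id) AS bigint) - 1, @MaxId = MAX(Id) FROM dbo.BigTable;

WHILE @LastId < @MaxId
BEGIN
    -- Re-read the settings on every pass so they can be changed on the fly.
    SELECT @BatchSize = BatchSize, @Delay = DelayText FROM dbo.BatchControl WHERE Id = 1;

    -- Copy one key range; each UPDATE commits on its own, so log space can be
    -- reused (SIMPLE recovery) or backed up (FULL recovery) between batches.
    UPDATE dbo.BigTable
    SET NewCol = OldCol
    WHERE Id > @LastId AND Id <= @LastId + @BatchSize;

    SET @LastId = @LastId + @BatchSize;
    WAITFOR DELAY @Delay;                  -- pause to let other work through
END;
```

While the loop is running, `UPDATE dbo.BatchControl SET DelayText = '00:00:10' WHERE Id = 1;` from another session slows it down from the next pass on. Swapping the columns afterwards (rename, drop the old column) is not shown.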
Here is the ALTER COLUMN test code.
USE tempdb;
IF EXISTS (SELECT * FROM sys.tables WHERE [object_id] = OBJECT_ID('dbo.TestTable'))
    DROP TABLE dbo.TestTable;
CREATE TABLE dbo.TestTable (
    ColID int IDENTITY(1,1) NOT NULL,
    ColTest int NULL,
    ColGuid uniqueidentifier DEFAULT NEWSEQUENTIALID()
);
GO
-- GO 10000 repeats the previous batch 10,000 times (SSMS/sqlcmd), adding one row each time
INSERT INTO dbo.TestTable DEFAULT VALUES;
GO 10000
UPDATE dbo.TestTable SET ColTest = ColID;
DECLARE @t1 time(7) = SYSDATETIME();
DECLARE @t2 time(7);
ALTER TABLE dbo.TestTable ALTER COLUMN ColTest bigint NULL;
SET @t2 = SYSDATETIME();
SELECT
    MAX(ColID) AS RecordCt,
    DATEDIFF(mcs, @t1, @t2) AS [Elapsed Mcs]
FROM dbo.TestTable;

A simple ALTER TABLE <table> ALTER COLUMN <column> bigint NULL should take basically no time. There won't be any conversion issues or NULL checks, so I don't see why this wouldn't be relatively instant.
If you do it through the GUI, it'll probably try to create a temp table, drop the existing table, and create a new one. Definitely don't do that.

In SQL Server 2016+, this alter table <table> alter column <column> bigint null statement will be a simple metadata change (instant) if the table is fully compressed.
More info from Paul White:
Compression must be enabled:
- On all indexes and partitions, including the base heap or clustered index.
- Either ROW or PAGE compression.
- Indexes and partitions may use a mixture of these compression levels. The important thing is that there are no uncompressed indexes or partitions.
Changing from NULL to NOT NULL is not allowed.
The following integer type changes are supported:
- smallint to integer or bigint.
- integer to bigint.
- smallmoney to money (uses integer representation internally).
The following string and binary type changes are supported:
- char(n) to char(m) or varchar(m)
- nchar(n) to nchar(m) or nvarchar(m)
- binary(n) to binary(m) or varbinary(m)
All of the above apply only when n < m and m is not max.
Collation changes are not allowed.
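A sketch of finding and removing uncompressed partitions before the type change, assuming SQL Server 2016 or later and a hypothetical demo table dbo.CompressDemo (data compression needs Enterprise Edition before 2016 SP1). Keep in mind that compressing the table is itself a full rebuild, so this only pays off if the rebuild can be scheduled ahead of the moment when the ALTER COLUMN has to be quick.

```sql
-- Demo setup (hypothetical names): clustered PK plus one nonclustered index.
CREATE TABLE dbo.CompressDemo (Id int NOT NULL PRIMARY KEY, ColTest int NULL, Other int NULL);
CREATE NONCLUSTERED INDEX IX_CompressDemo_Other ON dbo.CompressDemo (Other);
INSERT INTO dbo.CompressDemo (Id, ColTest, Other) VALUES (1, 10, 100), (2, NULL, 200);
GO
-- 1) List every partition (heap, clustered, or nonclustered) that is still uncompressed.
SELECT p.index_id, p.partition_number, p.data_compression_desc
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID(N'dbo.CompressDemo')
  AND p.data_compression = 0;          -- 0 = NONE

-- 2) Compress every index (ROW or PAGE both qualify). This is a size-of-data rebuild.
--    For a heap, also run: ALTER TABLE ... REBUILD WITH (DATA_COMPRESSION = ROW);
ALTER INDEX ALL ON dbo.CompressDemo REBUILD WITH (DATA_COMPRESSION = ROW);
GO
-- 3) With no uncompressed partitions left, this should be a metadata-only change.
ALTER TABLE dbo.CompressDemo ALTER COLUMN ColTest bigint NULL;
```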


SQL Server fastest way to change data types on large tables

We need to change the data type of about 10 primary keys in our db from numeric(19,0) to bigint. On the smaller tables a simple update of the datatype works just fine but on the larger tables (60-70 million rows) it takes a considerable amount of time.
What is the fastest way to achieve this, preferably without locking the database.
I've written a script that generates the following (which I believe I got from a different SO post)
--Add a new temporary column to store the changed value.
ALTER TABLE query_log ADD id_bigint bigint NULL;
GO
CREATE NONCLUSTERED INDEX IX_query_log_id_bigint ON query_log (id_bigint)
INCLUDE (id); -- the include only works on SQL 2008 and up
-- This index may help or hurt performance, I'm not sure... :)
GO
declare @count int
declare @iteration int
declare @progress int
set @iteration = 0
set @progress = 0
select @count = COUNT(*) from query_log
RAISERROR ('Processing %d records', 0, 1, @count) WITH NOWAIT
-- Update the table in batches of 10000 at a time
WHILE 1 = 1
BEGIN
    UPDATE X -- Updating a derived table only works on SQL 2005 and up
    SET X.id_bigint = id
    FROM (
        SELECT TOP 10000 * FROM query_log WHERE id_bigint IS NULL
    ) X;
    IF @@ROWCOUNT = 0 BREAK;
    set @iteration = @iteration + 1
    set @progress = @iteration * 10000
    RAISERROR ('processed %d of %d records', 0, 1, @progress, @count) WITH NOWAIT
END
GO
--kill the pkey on the old column
ALTER TABLE query_log
DROP CONSTRAINT PK__query_log__53833672
GO
BEGIN TRAN; -- now do as *little* work as possible in this blocking transaction
UPDATE T -- catch any updates that happened after we touched the row
SET T.id_bigint = T.id
FROM query_log T WITH (TABLOCKX, HOLDLOCK)
WHERE T.id_bigint IS NULL OR T.id_bigint <> T.id; -- IS NULL also catches rows inserted after the loop
-- The lock hints ensure everyone is blocked until we do the switcheroo
EXEC sp_rename 'query_log.id', 'id_numeric', 'COLUMN';
EXEC sp_rename 'query_log.id_bigint', 'id', 'COLUMN';
COMMIT TRAN;
GO
DROP INDEX IX_query_log_id_bigint ON query_log;
GO
ALTER TABLE query_log DROP COLUMN id_numeric;
GO
-- The new id column must be NOT NULL before it can carry the primary key
ALTER TABLE query_log ALTER COLUMN id bigint NOT NULL;
GO
ALTER TABLE query_log
ADD CONSTRAINT PK_query_log PRIMARY KEY (id); -- constraint name is illustrative
GO
This works very well for the smaller tables but is extremely slow going for the very large tables.
Note: this is in preparation for a migration to Postgres, and the EnterpriseDB Migration Toolkit doesn't seem to understand the numeric(19,0) data type.
It is not possible to change a primary key without locking. The fastest way with the least impact is to create a new table with the new columns and primary key(s), but without foreign keys or other indexes. Then batch-insert blocks of data in sequential order relative to the primary key(s). When that is finished, add your indexes, then the foreign keys. Finally, drop or rename the old table and rename your new table to the table name the system expects.
In practice, your approach will have to vary based on how many records are inserted, updated, and/or deleted. If you're only inserting, you can perform the initial load and then top up the new table with the rows inserted since, just before your swap.
This approach should provide the fastest migration, minimal logging, and very little fragmentation in your table and indexes.
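A sketch of the new-table approach under stated assumptions: hypothetical tables dbo.QueryLog (old, numeric(19,0) key) and dbo.QueryLog_new, positive key values, and an insert-only workload, so that rows added after the bulk copy can be found by key. Other indexes and foreign keys would be created at step 3, as the answer describes.

```sql
-- Demo setup (hypothetical names): the existing table with a numeric(19,0) key.
CREATE TABLE dbo.QueryLog (
    id  numeric(19,0) NOT NULL CONSTRAINT PK_QueryLog PRIMARY KEY,
    msg varchar(200)  NULL
);
INSERT INTO dbo.QueryLog (id, msg) VALUES (1, 'a'), (2, 'b'), (3, 'c');
GO
-- 1) New table with the new key type: primary key only, no other indexes or FKs yet.
CREATE TABLE dbo.QueryLog_new (
    id  bigint       NOT NULL CONSTRAINT PK_QueryLog_new PRIMARY KEY CLUSTERED,
    msg varchar(200) NULL
);
GO
-- 2) Copy in key order, one key range per batch (assumes positive keys).
--    TABLOCK is one precondition for minimal logging under SIMPLE or BULK_LOGGED
--    recovery; whether a batch is actually minimally logged also depends on the
--    target's indexes and the SQL Server version.
DECLARE @from bigint = 0, @batch bigint = 100000, @max bigint;
SELECT @max = MAX(id) FROM dbo.QueryLog;
WHILE @from < @max
BEGIN
    INSERT INTO dbo.QueryLog_new WITH (TABLOCK) (id, msg)
    SELECT id, msg FROM dbo.QueryLog
    WHERE id > @from AND id <= @from + @batch
    ORDER BY id;
    SET @from = @from + @batch;
END;
GO
-- 3) Create the remaining indexes, then the foreign keys, on dbo.QueryLog_new here.

-- 4) Swap: block writers on the old table, top up rows inserted since the copy
--    (insert-only workload assumed), then rename both tables.
BEGIN TRAN;
    INSERT INTO dbo.QueryLog_new (id, msg)
    SELECT o.id, o.msg
    FROM dbo.QueryLog AS o WITH (TABLOCKX, HOLDLOCK)
    WHERE o.id > (SELECT ISNULL(MAX(n.id), 0) FROM dbo.QueryLog_new AS n);
    EXEC sp_rename 'dbo.QueryLog', 'QueryLog_old';
    EXEC sp_rename 'dbo.QueryLog_new', 'QueryLog';
COMMIT;
```

dbo.QueryLog_old can be dropped once the result is verified. Constraint names stay with their tables, so the new primary key keeps the name PK_QueryLog_new unless it is renamed.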
You have to remember that every time you modify a record, the data is modified, the indexes are modified, and the foreign keys are checked, all within one implicit or explicit transaction. The table and/or row(s) will be locked while all of the changes are made. Even if your database uses the simple recovery model, the server still writes all changes to the log file. An update can be carried out as a delete paired with an insert, so it is not possible to prevent fragmentation with any other approach.

Alter Column: option to specify conversion function?

I have a column of type float that contains phone numbers - I'm aware that this is bad, so I want to convert the column from float to nvarchar(max), converting the data appropriately so as not to lose data.
The conversion can apparently be handled correctly using the STR function (suggested here), but I'm not sure how to go about changing the column type and performing the conversion without creating a temporary column. I don't want to use a temporary column because we are going to do this automatically a number of times in the future and don't want the performance impact of page splits (suggested here).
In Postgres you can add a "USING" option to your ALTER COLUMN statement that specifies how to convert the existing data. I can't find anything like this for T-SQL. Is there a way I can do this in place?
Postgres example:
...ALTER COLUMN <column> TYPE <type> USING <func>(<column>);
Rather than use a temporary column in your table, use a (temporary) column in a temporary table. In short:
1) Create a temp table with the PK of your table plus the column you want to change (in the correct data type, of course).
2) Select the data into the temp table using your conversion method.
3) Change the data type in the actual table.
4) Update the actual table from the temp table values.
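A sketch of those four steps, assuming a hypothetical table dbo.Contacts (ContactID primary key, Phone float) and a target type of varchar(20) rather than nvarchar(max). The conversion uses STR, as the question suggests, with LTRIM to strip the padding STR adds. The direct ALTER COLUMN is allowed to mangle values (a plain float-to-varchar conversion produces scientific notation) because they are overwritten in step 4.

```sql
-- Demo setup (hypothetical names).
CREATE TABLE dbo.Contacts (ContactID int NOT NULL PRIMARY KEY, Phone float NULL);
INSERT INTO dbo.Contacts (ContactID, Phone) VALUES (1, 5551234567), (2, NULL), (3, 4401632960123);
GO
-- 1) + 2) Temp table holding the PK and the value converted the way we want.
--    STR(x, 20, 0) gives plain digits; LTRIM removes STR's left padding.
SELECT ContactID, LTRIM(STR(Phone, 20, 0)) AS PhoneText
INTO #PhoneConv
FROM dbo.Contacts;

-- 3) Change the type in place; the implicit conversion may yield e.g. 5.55123e+009.
ALTER TABLE dbo.Contacts ALTER COLUMN Phone varchar(20) NULL;
GO
-- 4) Overwrite with the correctly converted values (batch this on a large table).
UPDATE c
SET c.Phone = t.PhoneText
FROM dbo.Contacts AS c
JOIN #PhoneConv AS t ON t.ContactID = c.ContactID;

DROP TABLE #PhoneConv;
```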
If the table is large, I'd suggest doing this in batches. Of course, if the table isn't large, worrying about page splits is premature optimization, since doing a complete rebuild of the table and its indexes after the conversion would be cheap. Another question is: why nvarchar(max)? The data is phone numbers. Last time I checked, phone numbers were fairly short (certainly less than the 2 GB that nvarchar(max) can hold) and non-Unicode. Do some domain modeling to figure out the appropriate data size and you'll thank me later. Lastly, why would you do this "automatically a bunch of times in future"? Why not have the correct data type and insert the right values?
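In SQL Server:
CREATE TABLE dbo.Employee
(
     EmployeeID INT IDENTITY (1,1) NOT NULL -- hypothetical key column
    ,FirstName VARCHAR(50) NULL
    ,MiddleName VARCHAR(50) NULL
    ,LastName VARCHAR(50) NULL
    ,DateHired datetime NOT NULL
)
-- Change the datatype to support 100 characters and make NOT NULL
-- (FirstName is an illustrative choice of column)
ALTER TABLE dbo.Employee
    ALTER COLUMN FirstName VARCHAR(100) NOT NULL
-- Change datatype and allow NULLs for DateHired
-- (smalldatetime is an illustrative target type)
ALTER TABLE dbo.Employee
    ALTER COLUMN DateHired SMALLDATETIME NULL
-- Set SPARSE columns for Middle Name (SQL Server 2008 and later)
ALTER TABLE dbo.Employee
    ALTER COLUMN MiddleName ADD SPARSE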

SQL Server inserts and select taking long time

We have a table with about 20 columns as shown below:
We need to insert 1,000 records, and a later SELECT also returns about 1,000 records.
We tried doing the inserts in two ways:
in parallel, via a C# Parallel.For loop
with a SqlDataAdapter inserting a whole DataSet filled with 1,000 records.
In both cases the inserts take over 30 seconds. We even tried this on a fresh, empty table. How can this be sped up?
[Earlier, for a normal 10-column table, we did 2 million record inserts via Parallel.For in about 60 seconds.]
A SELECT (tested from SQL Server Management Studio) returning 2,000 records also takes more than 30 seconds, even on a clean table.
The time varies:
Management Studio had been running for many days: 17-30 seconds.
After closing and reopening it, the 1st SELECT returns in 1 second;
the 2nd and subsequent SELECTs take about 7-10 seconds to retrieve all rows.
Does a variable size, or a fixed upper limit, in VARCHAR(SIZE) columns make a lot of difference?
[The disk is a fast one (RAID? not sure) and is dedicated to this database.]
Table schema: (No PK)
Index is on varchar(50) , non-unique non-clustered
SELECT statement:
select *
from table
where varchar(50) = 'value1'
and varchar(2) = 'value2'
and smallint = 'value3'
The data distribution: each distinct varchar(50) value has 5 distinct varchar(2) values, and each of those has 1-3 smallint values.
Have a look at the SqlBulkCopy class. I did a comparison a while back about high performance loading of data from .NET to SQL Server, comparing SqlBulkCopy vs SqlDataAdapter with the bottom line being, to load 100,000 rows:
SqlDataAdapter: 25.0729s
SqlBulkCopy: 0.8229s
Blogged about it here
In terms of SELECT performance, try an index on the 3 fields being queried on; that will allow an index seek on all three predicates. At present, with an index only on the VARCHAR(50), it can at best seek on that one column and must check the other two conditions row by row (or it may decide to scan). As you are doing a SELECT * to return ALL columns, it then has to go off and look up the rest of the data for each matching row, because those columns are not included in the index. This can be expensive, so consider NOT doing SELECT * and returning only the columns you actually need (if you don't actually need them all). Name the ones you really need explicitly in the SELECT, and you can then INCLUDE them in the index you created on the 3 fields in the WHERE clause (see the MSDN reference on INCLUDE).
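A sketch of that covering index. The question's schema isn't shown, so the names are hypothetical: KeyCode varchar(50), SubCode varchar(2), and SeqNo smallint are the three filter columns, and Reading and Notes stand in for the columns the query actually needs.

```sql
-- Hypothetical table standing in for the question's 20-column table.
CREATE TABLE dbo.Readings (
    KeyCode varchar(50)    NOT NULL,
    SubCode varchar(2)     NOT NULL,
    SeqNo   smallint       NOT NULL,
    Reading decimal(18, 4) NULL,
    Notes   varchar(200)   NULL
);
-- Key on the three WHERE-clause columns; INCLUDE the selected columns so the
-- query can be answered from the index alone (seek, no key/RID lookups).
CREATE NONCLUSTERED INDEX IX_Readings_Key_Sub_Seq
    ON dbo.Readings (KeyCode, SubCode, SeqNo)
    INCLUDE (Reading, Notes);
GO
-- Name the needed columns explicitly instead of SELECT *.
SELECT KeyCode, SubCode, SeqNo, Reading, Notes
FROM dbo.Readings
WHERE KeyCode = 'value1'
  AND SubCode = 'value2'
  AND SeqNo   = 3;
```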
To speed up queries:
Don't make a VARCHAR(50) your primary (and thus clustering) key; use something narrower and fixed in size. INT IDENTITY works best.
Why do you have VARCHAR(8000) in your table? That puts a lot of pressure on the table; why not just make those VARCHAR(MAX) as well?
Analyse your queries and create the proper non-clustered indexes on the columns that can be indexed.

Using a trigger to simulate a second identity column in SQL Server 2005

I have various reasons for needing to implement, in addition to the identity column PK, a second, concurrency safe, auto-incrementing column in a SQL Server 2005 database. Being able to have more than one identity column would be ideal, but I'm looking at using a trigger to simulate this as close as possible to the metal.
I believe I have to use a serializable isolation level transaction in the trigger. Do I go about this the way I would use such a transaction in a normal SQL query?
It is a non-negotiable requirement that the business meaning of the second incrementing column remain separated from the behind the scenes meaning of the first, PK, incrementing column.
To put things as simply as I can, if I create JobCards '0001', '0002', and '0003', then delete JobCards '0002' and '0003', the next JobCard I create must have ID '0002', not '0004'.
Just an idea, if you have 2 "identity" columns, then surely they would be 'in sync' - if not exactly the same value, then would differ by a constant value. If so, then why not add the "second identity" column as a COMPUTED column, which offsets the primary identity? Or is my logic flawed here?
Edit: As per Martin's comment, note that your calculation might need to be N * id + C, where N is the increment and C is the offset/delta; excuse my rusty maths.
For example:
ALTER TABLE MyTable ADD OtherIdentity AS Id * 2 + 1;
Note that in SQL Server 2012 and later, you can use an independent SEQUENCE to create two or more independently incrementing columns in the same table.
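A sketch for SQL Server 2012 or later with hypothetical names: the identity column stays the technical key, while a SEQUENCE supplies the business number through a default constraint. Like identity values, sequence values are not reused after deletes, so this alone does not meet the gap-reclaiming requirement mentioned next.

```sql
-- Independent number generator for the business-facing number.
CREATE SEQUENCE dbo.JobCardNumber AS int START WITH 1 INCREMENT BY 1;
GO
CREATE TABLE dbo.JobCards (
    JobCardID     int IDENTITY(1,1) NOT NULL PRIMARY KEY,          -- technical key
    JobCardNumber int NOT NULL
        CONSTRAINT DF_JobCards_Number DEFAULT (NEXT VALUE FOR dbo.JobCardNumber),
    Description   varchar(100) NULL
);
GO
-- Each inserted row draws its own value from the sequence.
INSERT INTO dbo.JobCards (Description) VALUES ('first'), ('second');
SELECT JobCardID, JobCardNumber, Description FROM dbo.JobCards;
```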
Note: OP has edited the original requirement to include reclaiming sequences (noting that identity columns in SQL do not reclaim used ID's once deleted).
I would disallow all the deletes from this table altogether. Instead of deleting, I would mark rows as available or inactive. Instead of inserting, I would first search if there are inactive rows, and reuse the one with the smallest ID if they exist. I would insert only if there are no available rows already in the table.
Of course, I would serialize all inserts and deletes with sp_getapplock.
You can use a trigger to disallow all deletes, it is simpler than filling gaps.
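A sketch of the mark-as-inactive approach with hypothetical names (dbo.JobCardPool, dbo.AllocateJobCard). sp_getapplock serializes allocations so that two sessions cannot pick the same free number, and an INSTEAD OF DELETE trigger refuses physical deletes. Only features available in SQL Server 2005 are used.

```sql
-- Every job card number ever issued; IsActive = 0 marks a reusable ("deleted") number.
CREATE TABLE dbo.JobCardPool (
    JobCardNo   int          NOT NULL PRIMARY KEY,
    IsActive    bit          NOT NULL,
    Description varchar(100) NULL
);
GO
-- Refuse physical deletes; callers set IsActive = 0 instead.
CREATE TRIGGER dbo.TR_JobCardPool_NoDelete ON dbo.JobCardPool INSTEAD OF DELETE
AS
BEGIN
    RAISERROR('Deletes are not allowed; set IsActive = 0 instead.', 16, 1);
END;
GO
CREATE PROCEDURE dbo.AllocateJobCard
    @Description varchar(100),
    @JobCardNo   int OUTPUT
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;
    DECLARE @rc int;
    BEGIN TRAN;
    -- One exclusive application lock serializes all allocations; released at COMMIT.
    -- Releasing a number (UPDATE ... SET IsActive = 0) should take the same lock.
    EXEC @rc = sp_getapplock @Resource = N'JobCardPool', @LockMode = 'Exclusive',
                             @LockOwner = 'Transaction';
    IF @rc < 0
    BEGIN
        ROLLBACK;
        RAISERROR('Could not acquire the JobCardPool lock.', 16, 1);
        RETURN;
    END;
    -- Reuse the smallest inactive number if one exists...
    SELECT @JobCardNo = MIN(JobCardNo) FROM dbo.JobCardPool WHERE IsActive = 0;
    IF @JobCardNo IS NOT NULL
        UPDATE dbo.JobCardPool
        SET IsActive = 1, Description = @Description
        WHERE JobCardNo = @JobCardNo;
    ELSE
    BEGIN
        -- ...otherwise append the next number.
        SELECT @JobCardNo = ISNULL(MAX(JobCardNo), 0) + 1 FROM dbo.JobCardPool;
        INSERT INTO dbo.JobCardPool (JobCardNo, IsActive, Description)
        VALUES (@JobCardNo, 1, @Description);
    END;
    COMMIT;
END;
GO
```

With this in place, "deleting" cards 2 and 3 means setting their IsActive to 0, and the next call to dbo.AllocateJobCard hands out 2 again.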
A solution to this issue from "Inside Microsoft SQL Server 2008: T-SQL Querying" is to create another table with a single row that holds the current max value.
CREATE TABLE dbo.Sequence(
val int
)
INSERT INTO dbo.Sequence (val) VALUES (0) -- the single row holding the current max value
Then to allocate a range of sufficient size for your insert
CREATE PROC dbo.GetSequence
@val AS int OUTPUT, -- returns the first value of the allocated range
@n AS int = 1       -- size of the range to allocate
AS
UPDATE dbo.Sequence
SET @val = val = val + @n;
SET @val = @val - @n + 1;
This will block other concurrent attempts to increment the sequence until the first transaction commits.
For a non blocking solution that doesn't handle multi row inserts see my answer here.
This is probably a terrible idea, but it works in at least a limited use scenario
Just use a regular identity and reseed on deletes.
create table reseedtest (
a int identity(1,1) not null,
name varchar(100)
)
insert reseedtest values('erik'),('john'),('selina')
select * from reseedtest
GO
CREATE TRIGGER TR_reseedtest_D ON reseedtest FOR DELETE
AS
BEGIN
DECLARE @a int
-- take and hold an exclusive table lock until the DELETE's transaction ends
SELECT TOP (1) @a = a FROM reseedtest WITH (TABLOCKX, HOLDLOCK)
--anyone know another way to lock a table besides doing something to it?
DBCC CHECKIDENT(reseedtest, reseed, 0)
DBCC CHECKIDENT(reseedtest, reseed)
END
GO
delete reseedtest where a >= 2
insert reseedtest values('katarina'),('david')
select * from reseedtest
drop table reseedtest
This won't work if you are deleting from the "middle of the stack" as it were, but it works fine for deletes from the incrementing end.
Reseeding once to 0 then again is just a trick to avoid having to calculate the correct reseed value.
If you never delete from the table, you could create a view with a column computed by ROW_NUMBER().
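A sketch of that view, assuming a hypothetical table dbo.JobCardRows. It has to be an ordinary view: ranking functions such as ROW_NUMBER() are not allowed in indexed (materialized) views, so the number is computed each time the view is queried, and it only stays stable as long as rows are never deleted.

```sql
CREATE TABLE dbo.JobCardRows (
    JobCardID   int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Description varchar(100) NULL
);
GO
-- Plain view: the gapless number is derived from the identity order at query time.
CREATE VIEW dbo.JobCardNumbers
AS
SELECT JobCardID,
       ROW_NUMBER() OVER (ORDER BY JobCardID) AS JobCardNumber,
       Description
FROM dbo.JobCardRows;
GO
INSERT INTO dbo.JobCardRows (Description) VALUES ('first'), ('second'), ('third');
SELECT JobCardID, JobCardNumber, Description
FROM dbo.JobCardNumbers
ORDER BY JobCardNumber;
```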
Also, a SQL Server identity can get out of sync with a user-generated one, depending on the use of rollback.

How do you add a NOT NULL Column to a large table in SQL Server?

To add a NOT NULL Column to a table with many records, a DEFAULT constraint needs to be applied. This constraint causes the entire ALTER TABLE command to take a long time to run if the table is very large. This is because:
The DEFAULT constraint modifies existing records. This means that the db needs to increase the size of each record, which causes it to shift records on full data-pages to other data-pages and that takes time.
The DEFAULT update executes as an atomic transaction. This means that the transaction log will need to be grown so that a roll-back can be executed if necessary.
The transaction log keeps track of the entire record. Therefore, even though only a single field is modified, the space needed by the log will be based on the size of the entire record multiplied by the # of existing records. This means that adding a column to a table with small records will be faster than adding a column to a table with large records even if the total # of records are the same for both tables.
Possible solutions:
Suck it up and wait for the process to complete. Just make sure to set the timeout period to be very long. The problem with this is that it may take hours or days to do depending on the # of records.
Add the column but allow NULL. Afterward, run an UPDATE query to set the DEFAULT value for existing rows. Do not update all rows in a single statement; update batches of records at a time, or you'll end up with the same problem as solution #1. The problem with this approach is that you end up with a column that allows NULL when you know that this is an unnecessary option. I believe there are some best-practice documents out there that say you should not have columns that allow NULL unless it's necessary.
Create a new table with the same schema. Add the column to that schema. Transfer the data over from the original table. Drop the original table and rename the new table. I'm not certain how this is any better than #1.
Are my assumptions correct?
Are these my only solutions? If so, which one is the best? If not, what else could I do?
I ran into this problem for my work also. And my solution is along #2.
Here are my steps (I am using SQL Server 2005):
1) Add the column to the table with a default value:
ALTER TABLE MyTable ADD MyColumn varchar(40) DEFAULT('')
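2) Add a CHECK (MyColumn IS NOT NULL) constraint with the NOCHECK option. NOCHECK means the constraint is not enforced on existing values:
ALTER TABLE MyTable WITH NOCHECK
ADD CONSTRAINT MyColumn_NOTNULL CHECK (MyColumn IS NOT NULL) -- constraint name is illustrative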
3) Update the values incrementally in table:
UPDATE TOP(3000) MyTable SET MyColumn = '' WHERE MyColumn IS NULL
GO 1000
The UPDATE statement updates at most 3,000 records per execution, which lets each chunk be committed separately. I have to use "MyColumn IS NULL" because my table does not have a sequential primary key.
GO 1000 executes the previous statement 1,000 times, which updates up to 3 million records; if you need more, just increase this number. Note that GO does not stop early: once no NULL rows remain, the remaining iterations simply update 0 rows.
Here's what I would try:
Do a full backup of the database.
Add the new column, allowing nulls - don't set a default.
Set SIMPLE recovery, which lets the transaction log be truncated (at the next checkpoint) once each batch has committed.
Run the update in batches as you discussed above, committing after each one.
Reset the new column to no longer allow nulls.
Go back to the normal FULL recovery.
Backup the database again.
The use of the SIMPLE recovery model doesn't stop logging, but it significantly reduces its impact. This is because, once a batch has committed, its log records can be discarded at the next checkpoint instead of being kept until a log backup.
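A sketch of those steps, assuming SQL Server 2012+ (for ALTER DATABASE CURRENT) and a hypothetical demo table dbo.WideTable. The backups are left as comments because their targets are site-specific; switching to SIMPLE breaks the log backup chain, which is why the final full backup matters.

```sql
-- Demo setup (hypothetical names).
CREATE TABLE dbo.WideTable (Id int NOT NULL PRIMARY KEY, Payload varchar(100) NULL);
INSERT INTO dbo.WideTable (Id, Payload) VALUES (1, 'a'), (2, 'b'), (3, 'c');
GO
-- Take a full backup before this point.
ALTER DATABASE CURRENT SET RECOVERY SIMPLE;
GO
-- Nullable column with no default: a metadata-only change.
ALTER TABLE dbo.WideTable ADD MyColumn varchar(40) NULL;
GO
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    -- Each UPDATE autocommits, so no single transaction spans the whole table.
    UPDATE TOP (10000) dbo.WideTable SET MyColumn = '' WHERE MyColumn IS NULL;
    SET @rows = @@ROWCOUNT;
    CHECKPOINT;   -- under SIMPLE recovery, lets log space from committed batches be reused
END;
GO
ALTER TABLE dbo.WideTable ALTER COLUMN MyColumn varchar(40) NOT NULL;
GO
ALTER DATABASE CURRENT SET RECOVERY FULL;
GO
-- Take a full backup again to restart the log backup chain.
```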
You could:
Start a transaction.
Grab a lock on your original table that blocks writes (but not reads), so no one writes to it.
Create a shadow table with the new schema.
Transfer all the data from the original table.
execute sp_rename to rename the old table out.
execute sp_rename to rename the new table in.
Finally, you commit the transaction.
The advantage of this approach is that your readers will be able to access the table during the long process and that you can perform any kind of schema change in the background.
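A sketch of those steps with hypothetical names (dbo.Customer gaining a NOT NULL Region column with an illustrative default of 'NA'). A shared table lock held with HOLDLOCK lets readers continue while writers wait; the renames at the end need a brief schema-modification lock, so readers do block for that moment.

```sql
-- Demo setup (hypothetical names).
CREATE TABLE dbo.Customer (CustomerID int NOT NULL PRIMARY KEY, Name varchar(100) NULL);
INSERT INTO dbo.Customer (CustomerID, Name) VALUES (1, 'Ann'), (2, 'Bob');
GO
SET XACT_ABORT ON;   -- any error rolls back the whole switch
BEGIN TRAN;
    -- Shared table lock held until COMMIT: readers can still read, writers are blocked.
    DECLARE @ignore int;
    SELECT TOP (1) @ignore = CustomerID FROM dbo.Customer WITH (TABLOCK, HOLDLOCK);

    -- Shadow table with the new schema.
    CREATE TABLE dbo.Customer_new (
        CustomerID int          NOT NULL PRIMARY KEY,
        Name       varchar(100) NULL,
        Region     char(2)      NOT NULL CONSTRAINT DF_Customer_Region DEFAULT ('NA')
    );
    -- Copy everything; Region is filled from its default.
    INSERT INTO dbo.Customer_new (CustomerID, Name)
    SELECT CustomerID, Name FROM dbo.Customer;

    -- Swap the names.
    EXEC sp_rename 'dbo.Customer', 'Customer_old';
    EXEC sp_rename 'dbo.Customer_new', 'Customer';
COMMIT;
```

Once the new table has been checked, dbo.Customer_old can be dropped.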
Just to update this with the latest information.
In SQL Server 2012 this can now be carried out as an online operation in the following circumstances:
Enterprise Edition only
The default must be a runtime constant
For the second requirement examples might be a literal constant or a function such as GETDATE() that evaluates to the same value for all rows. A default of NEWID() would not qualify and would still end up updating all rows there and then.
For defaults that qualify SQL Server evaluates them and stores the result as the default value in the column metadata so this is independent of the default constraint which is created (which can even be dropped if no longer required). This is viewable in sys.system_internals_partition_columns. The value doesn't get written out to the rows until next time they happen to get updated.
More details about this here: online non-null with values column add in sql server 2012
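A sketch for checking whether the default was stored as metadata, with a hypothetical demo table dbo.OnlineAddDemo. sys.system_internals_partitions and sys.system_internals_partition_columns are undocumented internal views, so the query selects all of their columns rather than relying on specific column names. On editions where the online add does not apply, the ALTER still succeeds but updates every row.

```sql
-- Demo setup (hypothetical names).
CREATE TABLE dbo.OnlineAddDemo (Id int NOT NULL PRIMARY KEY, Payload varchar(50) NULL);
INSERT INTO dbo.OnlineAddDemo (Id, Payload) VALUES (1, 'a'), (2, 'b');
GO
-- A literal default is a runtime constant, so this qualifies for the metadata-only add.
ALTER TABLE dbo.OnlineAddDemo
    ADD Flag char(1) NOT NULL CONSTRAINT DF_OnlineAddDemo_Flag DEFAULT ('N');
GO
-- Inspect the per-partition column metadata, where the stored default value appears.
SELECT pc.*
FROM sys.system_internals_partitions AS p
JOIN sys.system_internals_partition_columns AS pc
  ON pc.partition_id = p.partition_id
WHERE p.object_id = OBJECT_ID(N'dbo.OnlineAddDemo');
```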
Admittedly, this is an old question, but my colleague recently told me that he was able to do it with a single ALTER TABLE statement on a table with 13.6M rows, and it finished within a second on SQL Server 2012. I was able to confirm the same on a table with 8M rows. Did something change in later versions of SQL Server?
Alter table mytable add mycolumn char(1) not null default('N');
I think this depends on the SQL flavor you are using, but what if you took option 2 and, at the very end, altered the column to NOT NULL with the default value?
Would it be fast, since it would see that all the values are non-null?
If you want the column in the same table, you'll just have to do it. Now, option 3 is potentially the best for this because you can still have the database "live" while this operation is going on. If you use option 1, the table is locked while the operation happens and then you're really stuck.
If you don't really care if the column is in the table, then I suppose a segmented approach is the next best. Though, I really try to avoid that (to the point that I don't do it) because then like Charles Bretana says, you'll have to make sure and find all the places that update/insert that table and modify those. Ugh!
I had a similar problem, and went for your option #2.
It takes 20 minutes this way, as opposed to 32 hours the other way!!! Huge difference, thanks for the tip.
I wrote a full blog entry about it, but here's the important sql:
Alter table MyTable
Add MyNewColumn char(10) null default '?';
update MyTable set MyNewColumn='?' where MyPrimaryKey between 0 and 999999
update MyTable set MyNewColumn='?' where MyPrimaryKey between 1000000 and 1999999
update MyTable set MyNewColumn='?' where MyPrimaryKey between 2000000 and 2999999
Alter table MyTable
Alter column MyNewColumn char(10) not null;
And the blog entry if you're interested:
I had a similar problem and went with a modified version of approach #3. In my case the database was in SIMPLE recovery mode, and the table the column was being added to was not referenced by any FK constraints.
Instead of creating a new table with the same schema and copying contents of original table, I used SELECT…INTO syntax.
According to Microsoft:
"The amount of logging for SELECT...INTO depends on the recovery model in effect for the database. Under the simple recovery model or bulk-logged recovery model, bulk operations are minimally logged. With minimal logging, using the SELECT… INTO statement can be more efficient than creating a table and then populating the table with an INSERT statement. For more information, see Operations That Can Be Minimally Logged."
The sequence of steps:
1. Move the data from the old table to a new one while adding the new column with its default:
SELECT t.*, CAST('default' AS nvarchar(256)) AS new_column
INTO MyTable_copy
FROM MyTable AS t;
2. Drop the old table:
DROP TABLE MyTable;
3. Rename the newly created table:
EXEC sp_rename 'MyTable_copy', 'MyTable';
4. Create the necessary constraints and indexes on the new table.
In my case the table had more than 100 million rows and this approach completed faster than approach #2 and log space growth was minimal.
1) Add the column to the table with a default value:
ALTER TABLE MyTable ADD MyColumn int default 0
2) Update the values incrementally in the table (same effect as accepted answer). Adjust the number of records being updated to your environment, to avoid blocking other users/processes.
declare @rowcount int = 1
while (@rowcount > 0)
begin
    UPDATE TOP(10000) MyTable SET MyColumn = 0 WHERE MyColumn IS NULL
    set @rowcount = @@ROWCOUNT
end
3) Alter the column definition to require NOT NULL. Run the following at a moment when the table is not in use (or schedule a few minutes of downtime). I have successfully used this for tables with millions of records.
ALTER TABLE MyTable ALTER COLUMN MyColumn int NOT NULL
I would use a CURSOR instead of a single UPDATE. The cursor updates the matching records one by one; it takes longer, but it doesn't lock the whole table.
If you want to avoid holding locks for long, use WAITFOR between updates.
Also, I am not sure that a DEFAULT constraint changes existing rows.
Probably it is the NOT NULL constraint used together with DEFAULT that causes the behavior the author describes.
If so, add the NOT NULL constraint at the end.
So the pseudocode would look like this:
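-- without the NOT NULL constraint; we will add it at the end
ALTER TABLE MyTable ADD new_column INT DEFAULT 0
GO
DECLARE fillNullColumn CURSOR LOCAL FAST_FORWARD FOR
    SELECT [key] FROM MyTable
    WHERE new_column IS NULL
OPEN fillNullColumn
DECLARE @key INT
FETCH NEXT FROM fillNullColumn INTO @key
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE MyTable SET
        new_column = 0 -- default value
    WHERE [key] = @key
    WAITFOR DELAY '00:00:05' -- wait 5 seconds; keep in mind this updates only 12 rows per minute
    FETCH NEXT FROM fillNullColumn INTO @key
END
CLOSE fillNullColumn
DEALLOCATE fillNullColumn
GO
ALTER TABLE MyTable ALTER COLUMN new_column INT NOT NULL -- now add the NOT NULL constraint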
I am sure there are some syntax errors, but I hope this helps solve your problem.
Good luck!
Vertically segment the table. This means you will have two tables, with the same primary key, and exactly the same number of records... One will be the one you already have, the other will have just the key, and the new Non-Null column (with default value) .
Modify all Insert, Update, and delete code so they keep the two tables in synch... If you want you can create a view that "joins" the two tables together to create a single logical combination of the two that appears like a single table for client Select statements...
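A sketch of the vertical split with hypothetical names: dbo.Product is the existing table, dbo.ProductExtra carries the new NOT NULL column under the same key, and dbo.ProductFull is the view that joins them for SELECT statements. Keeping the two tables in sync on insert and delete is left to the application code, as described above.

```sql
-- Demo setup (hypothetical names): the existing table.
CREATE TABLE dbo.Product (ProductID int NOT NULL PRIMARY KEY, Name varchar(100) NULL);
INSERT INTO dbo.Product (ProductID, Name) VALUES (1, 'Widget'), (2, 'Gadget');
GO
-- Same key as the original table, plus the new NOT NULL column with its default.
CREATE TABLE dbo.ProductExtra (
    ProductID int NOT NULL PRIMARY KEY
        CONSTRAINT FK_ProductExtra_Product REFERENCES dbo.Product (ProductID),
    IsDiscontinued bit NOT NULL CONSTRAINT DF_ProductExtra_IsDiscontinued DEFAULT (0)
);
-- Backfill one row per existing key (batch this on a large table).
INSERT INTO dbo.ProductExtra (ProductID)
SELECT ProductID FROM dbo.Product;
GO
-- One logical "table" for client SELECTs.
CREATE VIEW dbo.ProductFull
AS
SELECT p.ProductID, p.Name, e.IsDiscontinued
FROM dbo.Product AS p
JOIN dbo.ProductExtra AS e ON e.ProductID = p.ProductID;
GO
```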
