ALTER COLUMN size change takes a very long time? - sql-server

I have a table with 45M rows (45 GB data space and 2 GB index space). I added a new column and it finished instantly:
alter table T add C char(25)
Then I found the size was too small, so I ran the following query:
alter table T alter column C varchar(2500)
It has been running for an hour and is still going. sp_whoisactive currently shows:
reads: 48,000,000
writes: 5,000,000
physical reads: 3,900,000
Shouldn't it be really fast?

I tested the case. You can do it faster using the steps below:
Create the same table structure with a different name (call it Tbl2)
Alter the column on Tbl2
Insert data from Tbl1 into Tbl2
Drop Tbl1 (the old table)
Rename Tbl2 (the new one) to Tbl1
This will give you much better performance.
The reason is that altering a column on a table that already contains data involves a lot of data movement and page reorganization.
Using my solution, you just insert data without any page reorganization.
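A minimal sketch of the Tbl1/Tbl2 steps above, using Python's built-in sqlite3 module as a stand-in engine. SQLite stores rows differently from SQL Server, so this shows only the sequence of statements, not the performance difference. The two-column schema is a hypothetical simplification of the real table.

```python
import sqlite3

def widen_by_copy(conn: sqlite3.Connection) -> None:
    """Hypothetical sketch: build Tbl2 with the wider column, copy, drop, rename."""
    cur = conn.cursor()
    # Create the same structure under a new name, already using the wider type.
    cur.execute("CREATE TABLE Tbl2 (id INTEGER PRIMARY KEY, c VARCHAR(2500))")
    # Copy every row across in one set-based INSERT ... SELECT.
    cur.execute("INSERT INTO Tbl2 (id, c) SELECT id, c FROM Tbl1")
    # Drop the old table, then give the new one the old name.
    cur.execute("DROP TABLE Tbl1")
    cur.execute("ALTER TABLE Tbl2 RENAME TO Tbl1")
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tbl1 (id INTEGER PRIMARY KEY, c CHAR(25))")
conn.executemany("INSERT INTO Tbl1 (c) VALUES (?)", [("a",), ("b",), ("c",)])
conn.commit()
widen_by_copy(conn)
print(conn.execute("SELECT COUNT(*) FROM Tbl1").fetchone()[0])  # 3
```

Indexes, constraints and permissions on the original table would also have to be recreated on the new one before the swap.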


Does ALTER TABLE ALTER COLUMN interrupt ongoing db access?

I need to change a column in a table so that it is no longer NVARCHAR(256) but NVARCHAR(MAX). I know the command to do this (ALTER TABLE ALTER COLUMN NVARCHAR(MAX)). My question is really about disruption. I have to do this on a production environment, and I was wondering whether doing it on the live environment could disrupt users. Will users who are using the database at the time be booted off? Is this operation likely to take too long?
I've deleted my previous answer which claimed that this would be a metadata only change and am submitting a new one with an entirely different conclusion!
Whilst this is true when changing to anything up to nvarchar(4000), for the case of changing to nvarchar(max) the operation does seem extremely expensive. SQL Server will add a new variable-length column and copy the previously existing data. This will likely mean a time-consuming blocking operation that causes many page splits and both internal and logical fragmentation.
This can be seen from the following:
Foo int IDENTITY(1,1) primary key,
FROM sys.objects
Then looking at the page in SQL Server Internals Viewer shows
The white 41 00 ... is wasted space from the previous version of the column.
Any ongoing queries will not be affected. The database has to wait until it can acquire an exclusive lock on the table before the table can be altered.
While the update is being done, no queries can use the table. If there are a lot of records in the table, the database will seem unresponsive to any queries that need that table.
The advice has to be - make a backup and do it out of hours if you can.
That having been said, I would not expect your database to be disrupted by the change and it will not take very long to do it.
What about your client software? How will that be affected?
It should be fine unless you have a massive number of rows (millions). Yes, it will lock the table while it's updating, but pending requests will just wait on it.


What is the difference between using
SELECT ... INTO MyTable FROM ...
and
INSERT INTO MyTable (...)
SELECT ... FROM ...?
From BOL [ INSERT, SELECT...INTO ], I know that using SELECT...INTO will create the insertion table on the default file group if it doesn't already exist, and that the logging for this statement depends on the recovery model of the database.
Which statement is preferable?
Are there other performance implications?
What is a good use case for SELECT...INTO over INSERT INTO ...?
Edit: I already stated that I know SELECT INTO... creates a table where it doesn't exist. What I want to know is why SQL includes this statement. Is it doing something different behind the scenes when inserting rows, or is it just syntactic sugar on top of a CREATE TABLE and INSERT INTO?
They do different things. Use INSERT when the table exists. Use SELECT INTO when it does not.
Yes. INSERT with no table hints is normally fully logged. SELECT INTO is minimally logged when the database uses the simple or bulk-logged recovery model.
In my experience SELECT INTO is most commonly used with intermediate data sets, like #temp tables, or to copy out an entire table like for a backup. INSERT INTO is used when you insert into an existing table with a known structure.
To address your edit, they do different things. If you are making a table and want to define the structure use CREATE TABLE and INSERT. Example of an issue that can be created: You have a small table with a varchar field. The largest string in your table now is 12 bytes. Your real data set will need up to 200 bytes. If you do SELECT INTO from your small table to make a new one, the later INSERT will fail with a truncation error because your fields are too small.
Which statement is preferable? Depends on what you are doing.
Are there other performance implications? If the table is a permanent table, you can create indexes when you create the table, which can affect performance both positively and negatively. SELECT INTO does not recreate indexes that exist on the source tables, so subsequent use of the new table may be slower than it needs to be.
What is a good use case for SELECT...INTO over INSERT INTO ...? SELECT INTO is used when you may not know the table structure in advance. It is faster to write than CREATE TABLE plus an INSERT statement, so it is sometimes used to speed up development. It is often convenient when you are creating a quick temp table to test things or a backup table of a specific query (maybe records you are going to delete). It should be rare in production code that runs multiple times (except for temp tables), because it fails if the table already exists.
It is sometimes used inappropriately by people who don't know what they are doing, and they can cause havoc in the db as a result. I strongly feel it is inappropriate to use SELECT INTO for anything other than a throwaway table (a temporary backup, a temp table that goes away at the end of the stored proc, etc.). Permanent tables need real thought about their design, and SELECT INTO makes it easy to skip thinking about anything, even something as basic as which columns and which datatypes.
In general, I prefer CREATE TABLE and INSERT statements: you have more control, and they are better for repeatable processes. Further, a permanent table should be created from a separate CREATE TABLE script (one that is in source control). Permanent objects should not, in general, be created in code that inserts, deletes, updates or selects from a table. Object changes should be handled separately from data changes because objects have implications beyond the needs of a specific insert/update/select/delete. You need to consider the best data types, think about FK constraints, PKs and other constraints, consider auditing requirements, think about indexing, etc.
Each statement has a distinct use case. They are not interchangeable.
SELECT...INTO MyTable... creates a new MyTable where one did not exist before.
INSERT INTO MyTable...SELECT... is used when MyTable already exists.
The primary difference is that SELECT INTO MyTable will create a new table called MyTable with the results, while INSERT INTO requires that MyTable already exists.
You would use SELECT INTO only in the case where the table didn't exist and you wanted to create it based on the results of your query. As such, these two statements really are not comparable. They do very different things.
In general, SELECT INTO is used more often for one off tasks, while INSERT INTO is used regularly to add rows to tables.
While you can use CREATE TABLE and INSERT INTO to accomplish what SELECT INTO does, with SELECT INTO you do not have to know the table definition beforehand. SELECT INTO is probably included in SQL because it makes tasks like ad hoc reporting or copying tables much easier.
Actually, SELECT ... INTO not only creates the table but will also fail if the table already exists. So basically the only time you would use it is when the table you are inserting into does not exist.
Regarding your EDIT:
I personally mainly use SELECT ... INTO when I am creating a temp table. That to me is the main use. However I also use it when creating new tables with many columns with similar structures to other tables and then edit it in order to save time.
I only want to cover the second point of the question, the one about performance, because nobody else has covered it. SELECT INTO is a lot faster than INSERT INTO when it comes to tables with large datasets. I prefer SELECT INTO when I have to read a very large table. INSERT INTO for a table with 10 million rows may take hours, while SELECT INTO will do it in minutes. Losing the indexes on the new table is not a big problem either: you can recreate them with a query and still save a lot of time compared to INSERT INTO.
SELECT INTO is typically used to generate temp tables or to copy another table (data and/or structure).
In day-to-day code you use INSERT because your tables should already exist to be read, UPDATEd, DELETEd, JOINed, etc. Note: the INTO keyword is optional with INSERT.
That is, applications won't normally create and drop tables as part of normal operations unless it is a temporary table for some scope limited and specific usage.
A table created by SELECT INTO will have no keys, indexes or constraints, unlike a real, persisted, already existing table.
The 2 aren't directly comparable because they have almost no overlap in usage.
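SQLite has no SELECT ... INTO, so this sketch uses its closest analogue, CREATE TABLE ... AS SELECT, to illustrate the points above. The create-from-query form fails when the target already exists, and the table it builds carries no primary key or other constraints. INSERT INTO ... SELECT, by contrast, fills a table whose structure you defined yourself. All table names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO src (name) VALUES (?)", [("x",), ("y",)])

# SELECT ... INTO analogue: the new table's shape comes from the query.
conn.execute("CREATE TABLE copy1 AS SELECT id, name FROM src")

# Running the same statement again fails because copy1 now exists.
try:
    conn.execute("CREATE TABLE copy1 AS SELECT id, name FROM src")
    second_create_failed = False
except sqlite3.OperationalError:
    second_create_failed = True

# INSERT INTO ... SELECT: the target must exist, with a structure you chose.
conn.execute("CREATE TABLE copy2 (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO copy2 (id, name) SELECT id, name FROM src")

# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
copy1_info = conn.execute("PRAGMA table_info(copy1)").fetchall()
copy2_info = conn.execute("PRAGMA table_info(copy2)").fetchall()
copy1_pk = [row[5] for row in copy1_info]       # no primary key was copied
copy1_notnull = [row[3] for row in copy1_info]  # NOT NULL was not copied either
copy2_pk = [row[5] for row in copy2_info]
```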
SELECT INTO creates a new table for you and then inserts the records from the source table into it. The newly created table has the same structure as the source table. If you try to use SELECT INTO with an existing table, it will produce an error, because it will try to create a new table with the same name.
INSERT INTO requires the table to exist in your database before you insert rows into it.
The simple difference between select Into and Insert Into is:
--> SELECT INTO doesn't need an existing table. If you want to copy table A's data, you just type SELECT * INTO [tablename] FROM A. Here, tablename must not already exist; a new table will be created with the same structure as table A.
--> INSERT INTO does need an existing table: INSERT INTO [tablename] SELECT * FROM A;
Here, tablename is an existing table.
SELECT INTO is usually more popular for copying data, especially backup data.
You can use either one as your requirements dictate; it is entirely the developer's choice which one to use in a given scenario.
Performance-wise, INSERT INTO is fast.
The other answers are all great/correct (the main difference is whether the DestTable exists already (INSERT), or doesn't exist yet (SELECT ... INTO))
You may prefer to use INSERT (instead of SELECT ... INTO), if you want to be able to COUNT(*) the rows that have been inserted so far.
(Using SELECT COUNT(*) ... WITH (NOLOCK) is a simple/crude technique that may help you check the "progress" of the INSERT. This is helpful if it's a long-running insert, as seen in this answer.)
[If you use...]
INSERT DestTable SELECT ... FROM SrcTable
...then your SELECT COUNT(*) from DestTable WITH (NOLOCK) query would work.
SELECT INTO for large datasets may be good only for a single user with a single connection to the database doing a bulk operation task. Otherwise, I do not recommend using SELECT ... INTO,
as it creates one big transaction and takes a schema lock to create the object, preventing other users from creating objects or accessing system objects until the SELECT INTO operation completes.
As a proof of concept, open 2 sessions. In the first session, try to
select into temp table from a huge table
and in the second session try to
create a temp table
Then check the locks, the blocking and how long the second session takes to create its temp table. My recommendation: it is always good practice to use CREATE TABLE and INSERT statements and, if minimal logging is needed, to use trace flag 610.

Converting int primary key to bigint in Sql Server

We have a production table with 770 million rows and change. We want(/need?) to change the Primary ID column from int to bigint to allow for future growth (and to avoid the sudden stop when the 32bit integer space is exhausted)
Experiments in DEV have shown that this is not as simple as altering the column as we would need to drop the index and then re-create it. So far in DEV (which is a bit humbler than PROD) the dropping of the index has not finished after 1 and a half hours. This table is hit 24/7 and having it offline for such a long time is not an option.
Has anyone else had to deal with a similar situation? How did you get it done?
Are there alternatives?
Edit: Additional Info:
The Primary key is clustered.
You could attempt a staged approach.
1. Create a new bigint column
2. Create an insert trigger to keep new entries in sync between the 2 columns
3. Execute an update to populate all the empty values in the bigint column with the converted value
4. Change the primary index on the table from your old id column to the new one
5. Point any FKs and queries to use the new column
6. Change the new column to become your identity column and remove the insert trigger from step 2
7. Delete the old ID column
You should end up spreading the pain out over these 7 steps instead of hitting it all at once.
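A rough sketch of steps 1 to 3 of the staged approach. It again uses sqlite3 as a stand-in for SQL Server, with a hypothetical orders table and batch size. Steps 4 to 7 (switching the clustered primary key, repointing FKs, making the new column the identity) are SQL Server-specific and are not shown. The backfill walks the key in small batches and commits after each one, so no single transaction touches the whole table.

```python
import sqlite3

BATCH = 2  # hypothetical batch size; a real table would use thousands

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.executemany("INSERT INTO orders (item) VALUES (?)", [(f"item{i}",) for i in range(5)])

# Step 1: add the new, wider key column (nullable for now).
conn.execute("ALTER TABLE orders ADD COLUMN new_id BIGINT")

# Step 2: keep rows inserted from now on in sync.
conn.execute("""
    CREATE TRIGGER orders_sync_new_id AFTER INSERT ON orders
    BEGIN
        UPDATE orders SET new_id = NEW.id WHERE id = NEW.id;
    END
""")
conn.commit()

# Step 3: backfill existing rows one key range at a time.
last_id = 0
while True:
    rows = conn.execute(
        "SELECT id FROM orders WHERE id > ? ORDER BY id LIMIT ?", (last_id, BATCH)
    ).fetchall()
    if not rows:
        break
    upper = rows[-1][0]
    conn.execute(
        "UPDATE orders SET new_id = id WHERE id > ? AND id <= ? AND new_id IS NULL",
        (last_id, upper),
    )
    conn.commit()  # short transaction per batch
    last_id = upper

# A row inserted after the backfill is covered by the trigger.
conn.execute("INSERT INTO orders (item) VALUES ('late')")
conn.commit()
```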
Create a parallel table with the longer data type for new rows and UNION the results?
What I had to do was copy the data into a new table with the desired structure (primary/clustered key only, non-clustered/FK once complete). If you don't have the room, you could bcp out the data and back in. You may need an application outage to make this happen.
What doesn't work: alter table Orderhistory alter column ID bigint because of the primary key. Don't drop the key and alter column as you will just fill your log file and take much longer than copy/bcp.
Never use the SSMS table designer to change a column property: it copies the table into a temp table and then does a rename once done. Look up the ALTER TABLE ALTER COLUMN syntax and use it. Once it completes, you may want to defrag if you widened a column that sits in the middle of the table.

In Oracle, is it possible to "insert" a column into a table?

When adding a column to an existing table, Oracle always puts the column at the end of the table. Is it possible to tell Oracle where it should appear in the table? If so, how?
The location of the column in the table should be unimportant (unless there are "page sizes" to consider, or whatever Oracle uses to actually store the data). What is more important to the consumer is how the results are called, i.e. the Select statement.
rename YOUR_ORIGINAL_TABLE to YOUR_NEW_TABLE;
create table YOUR_ORIGINAL_TABLE nologging /* or unrecoverable */
as
select Column1, Column2, NEW_COLUMN, Column3
from YOUR_NEW_TABLE;
Drop table YOUR_NEW_TABLE;
Select * From YOUR_ORIGINAL_TABLE; -- now you will see the new column in the middle of the table.
But why would you want to do it? It seems illogical. You should never assume a column order; just use a named column list if column order is important.
Why does the order of the columns matter? You can always change it in your SELECT statement.
There's an advantage to adding new columns at the end of the table. If there's code that naively does a "SELECT *" and then parses the fields in order, you won't be breaking old code by adding new columns at the end. If you add new columns in the middle of the table, then old code may be broken.
At one job, I had a DBA who was super-anal about "Never do 'SELECT *'". He insisted that you always write out the specific fields.
What I normally do is:
Rename the old table.
Create the new table with columns in the right order.
Create the constraints for that new table.
Populate with data: insert into new_table (column list) select the same column list from the renamed table, naming the columns explicitly since their order now differs.
I don't think that this can be done without saving the data to a temporary table, dropping the table, and recreating it. On the other hand, it really shouldn't matter where the column is. As long as you specify the columns you are retrieving in your select statement, you can order them however you want.
Bear in mind that, under the tables, all the data in the table records are glued together. Adding a column to the end of a table [if it is nullable or (in later versions) not null with a default] just means a change to the table's metadata.
Adding a column in the middle would require re-writing every record in that table to add the appropriate value (or markers) for that column. In some cases, that might mean the records take up more room on the blocks and some records need to be migrated.
In short, it's a VAST amount of IO effort for a table of any real size.
You can always create a view over the table that has the columns in the preferred order, and use that view in a DML statement just as you would the table.
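A small illustration of the view idea, using sqlite3 and hypothetical table names. The physical table keeps the appended column at the end, while the view exposes the preferred order. One caveat for this stand-in: SQLite views are read-only unless you add INSTEAD OF triggers, so the sketch only covers SELECT. The answer above also mentions DML through the view, which applies to Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ALTER TABLE ... ADD appends the new column at the end of the table.
conn.execute("CREATE TABLE users_base (first_name TEXT, last_name TEXT)")
conn.execute("ALTER TABLE users_base ADD COLUMN middle_name TEXT")
conn.execute("INSERT INTO users_base VALUES ('Ada', 'Lovelace', 'King')")

# The view presents the columns in the preferred order without rewriting any rows.
conn.execute("""
    CREATE VIEW users_v AS
    SELECT first_name, middle_name, last_name FROM users_base
""")

cur = conn.execute("SELECT * FROM users_v")
column_order = [d[0] for d in cur.description]
row = cur.fetchone()
```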
I don't believe so - SQL Server doesn't allow these either. The method I always have to use is:
Create new table that looks right (including additional column)
Begin transaction
select all data from old table into new one
Drop old table
Rename new table
Commit transaction.
Not exactly pretty, but gets the job done.
No, it's not possible via an "ALTER TABLE" statement. However, you could create a new table with the same definition as your current one, but with a different name and with the columns in the order you want. Copy the data into the new table. Drop the old table. Rename the new table to match the old table name.
Tom Kyte has an article on this on AskTom
Apparently there's a trick involving marking the "after" columns INVISIBLE; when restored, they end up at the back.
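CREATE TABLE yourtable (one NUMBER(5, 0), two NUMBER(5, 0), three NUMBER(5, 0), four NUMBER(5, 0));
ALTER TABLE yourtable MODIFY (three INVISIBLE, four INVISIBLE);
ALTER TABLE yourtable ADD twopointfive NUMBER(5, 0);
ALTER TABLE yourtable MODIFY (three VISIBLE, four VISIBLE);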
1) OK, so you can't do it directly. We don't need post after post saying the same thing, do we?
2) OK, so the order of columns in a table doesn't technically matter. But that's not the point; the original question simply asked whether it could be done. Don't presume that you know everybody else's requirements. Maybe they have a table with 100 columns that is currently being queried using "SELECT * ..." inside some monstrously hacked-together query. They may prefer not to untangle it, let alone replace "*" with 100 column names. Or maybe they are just anal about the order of things and like to have related fields next to each other when browsing the schema with, say, SQL Developer. Maybe they are dealing with non-technical staff who won't know to look at the end of a list of 100 columns when, logically, the column should be somewhere near the beginning.
Nothing is more irritating than asking an honest question and getting an answer that says: "you shouldn't be doing that". It's MY job, not YOURS! Please don't tell me how to do my job. Just help if you can. Thanks!
OK... sorry for the rant. It suggests this workaround.
First suppose you have already run:
CREATE TABLE tab1 ( col1 NUMBER );
Now say you want to add a column named "col2", but you want the columns ordered "col2", "col1" when doing a "SELECT * FROM tab1;"
The suggestion is to run:
RENAME tab1 TO tab1_old;
CREATE TABLE tab1 AS SELECT 0 AS col1, col1 AS col2 FROM tab1_old;
I found this to be incredibly misleading. First of all, you're filling "col1" with zeros, so if you had any data, you are losing it by doing this. Secondly, it actually renames "col1" to "col2" and fails to mention this. So here's my example; hopefully it's a little clearer:
Suppose you have a table that was created with the following statement:
CREATE TABLE users (first_name varchar(25), last_name varchar(25));
Now say you want to insert middle_name in between first_name and last_name. Here's one way:
ALTER TABLE users ADD middle_name varchar(25);
RENAME users TO users_tmp;
CREATE TABLE users AS SELECT first_name, middle_name, last_name FROM users_tmp;
/* and for good measure... */
DROP TABLE users_tmp;
Note that middle_name will default to NULL (implied by the ALTER TABLE statement). You can alternatively set a different default value in the CREATE TABLE statement like so:
CREATE TABLE users AS SELECT first_name, 'some default value' AS middle_name, last_name FROM users_tmp;
This trick could come in handy if you're adding a date field with a default of sysdate, but you want all of the existing records to have some other (e.g. earlier) date value.
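The users example above runs almost verbatim in SQLite through Python's sqlite3 module, which makes it easy to check the resulting column order. This is only a stand-in for Oracle. In SQLite, CREATE TABLE ... AS SELECT carries over no constraints or indexes, so those would need to be recreated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (first_name VARCHAR(25), last_name VARCHAR(25))")
conn.execute("INSERT INTO users VALUES ('Grace', 'Hopper')")

conn.execute("ALTER TABLE users ADD COLUMN middle_name VARCHAR(25)")  # appended at the end
conn.execute("ALTER TABLE users RENAME TO users_tmp")
# Rebuild with the columns listed in the desired order.
conn.execute(
    "CREATE TABLE users AS SELECT first_name, middle_name, last_name FROM users_tmp"
)
conn.execute("DROP TABLE users_tmp")
conn.commit()

columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
data = conn.execute("SELECT * FROM users").fetchall()
```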

How do you add a NOT NULL Column to a large table in SQL Server?

To add a NOT NULL Column to a table with many records, a DEFAULT constraint needs to be applied. This constraint causes the entire ALTER TABLE command to take a long time to run if the table is very large. This is because:
The DEFAULT constraint modifies existing records. This means that the db needs to increase the size of each record, which causes it to shift records on full data-pages to other data-pages and that takes time.
The DEFAULT update executes as an atomic transaction. This means that the transaction log will need to be grown so that a roll-back can be executed if necessary.
The transaction log keeps track of the entire record. Therefore, even though only a single field is modified, the space needed by the log will be based on the size of the entire record multiplied by the # of existing records. This means that adding a column to a table with small records will be faster than adding a column to a table with large records even if the total # of records are the same for both tables.
Possible solutions:
Suck it up and wait for the process to complete. Just make sure to set the timeout period to be very long. The problem with this is that it may take hours or days to do depending on the # of records.
Add the column but allow NULL. Afterward, run an UPDATE query to set the DEFAULT value for existing rows. Do not update all rows in one statement. Update batches of records at a time, or you'll end up with the same problem as solution #1. The problem with this approach is that you end up with a column that allows NULL when you know that this is an unnecessary option. I believe there are best-practice documents out there that say you should not have columns that allow NULL unless it's necessary.
Create a new table with the same schema. Add the column to that schema. Transfer the data over from the original table. Drop the original table and rename the new table. I'm not certain how this is any better than #1.
Are my assumptions correct?
Are these my only solutions? If so, which one is the best? If not, what else could I do?
I ran into this problem for my work also. And my solution is along #2.
Here are my steps (I am using SQL Server 2005):
1) Add the column to the table with a default value:
ALTER TABLE MyTable ADD MyColumn varchar(40) DEFAULT('')
2) Add a NOT NULL constraint with the NOCHECK option. The NOCHECK does not enforce on existing values:
3) Update the values incrementally in table:
UPDATE TOP(3000) MyTable SET MyColumn = '' WHERE MyColumn IS NULL
GO 1000
The UPDATE statement will only update a maximum of 3000 records, which lets you save a chunk of data at a time. I have to use "MyColumn IS NULL" because my table does not have a sequential primary key.
GO 1000 will execute the previous statement 1000 times, which covers 3 million records; if you need more, just increase this number. Once every row has been filled, any remaining iterations simply update 0 records.
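A sketch of the same batch-at-a-time idea in Python with sqlite3 as a stand-in. The table and column names follow the answer; the data is made up. Default SQLite builds have no UPDATE TOP(n), so each batch is picked by rowid. Unlike a fixed GO 1000, the loop stops as soon as an iteration updates no rows.

```python
import sqlite3

BATCH_SIZE = 3000  # mirrors TOP(3000); tune for your workload

def backfill_in_batches(conn, batch_size=BATCH_SIZE):
    """Set NULL MyColumn values to '' one batch at a time; return the number of non-empty batches."""
    batches = 0
    while True:
        cur = conn.execute(
            "UPDATE MyTable SET MyColumn = '' WHERE rowid IN "
            "(SELECT rowid FROM MyTable WHERE MyColumn IS NULL LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # end the transaction after every batch
        if cur.rowcount == 0:
            return batches
        batches += 1

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (payload INTEGER)")
conn.executemany("INSERT INTO MyTable VALUES (?)", [(i,) for i in range(10000)])
conn.execute("ALTER TABLE MyTable ADD COLUMN MyColumn VARCHAR(40)")
conn.commit()
batches = backfill_in_batches(conn)  # 3000 + 3000 + 3000 + 1000 rows
```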
Here's what I would try:
Do a full backup of the database.
Add the new column, allowing nulls - don't set a default.
Set SIMPLE recovery, which truncates the tran log as soon as each batch is committed.
Run the update in batches as you discussed above, committing after each one.
Reset the new column to no longer allow nulls.
Go back to the normal FULL recovery.
Backup the database again.
The use of the SIMPLE recovery model doesn't stop logging, but it significantly reduces its impact. This is because the server discards the recovery information after every commit.
You could:
Start a transaction.
Grab a write lock on your original table so no one writes to it.
Create a shadow table with the new schema.
Transfer all the data from the original table.
Execute sp_rename to rename the old table out.
Execute sp_rename to rename the new table in.
Finally, you commit the transaction.
The advantage of this approach is that your readers will be able to access the table during the long process and that you can perform any kind of schema change in the background.
Just to update this with the latest information.
In SQL Server 2012 this can now be carried out as an online operation in the following circumstances
Enterprise Edition only
The default must be a runtime constant
For the second requirement examples might be a literal constant or a function such as GETDATE() that evaluates to the same value for all rows. A default of NEWID() would not qualify and would still end up updating all rows there and then.
For defaults that qualify SQL Server evaluates them and stores the result as the default value in the column metadata so this is independent of the default constraint which is created (which can even be dropped if no longer required). This is viewable in sys.system_internals_partition_columns. The value doesn't get written out to the rows until next time they happen to get updated.
More details about this here: online non-null with values column add in sql server 2012
Admittedly, this is an old question. My colleague recently told me that he was able to do it in one single ALTER TABLE statement on a table with 13.6M rows. It finished within a second in SQL Server 2012. I was able to confirm the same on a table with 8M rows. Did something change in later versions of SQL Server?
Alter table mytable add mycolumn char(1) not null default('N');
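For comparison, SQLite handles this case in a similar metadata-only way. ALTER TABLE ... ADD COLUMN only edits the stored schema, and existing rows report the constant default when read. The sketch below (sqlite3, hypothetical data) also shows that a NOT NULL column without a default is rejected on a non-empty table. This is an analogy, not a description of SQL Server's implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO mytable (id) VALUES (?)", [(i,) for i in range(1, 1001)])
conn.commit()

# NOT NULL with a constant default: only the schema changes; old rows read back as 'N'.
conn.execute("ALTER TABLE mytable ADD COLUMN mycolumn CHAR(1) NOT NULL DEFAULT 'N'")

# NOT NULL without a default is refused, since existing rows would have no value.
try:
    conn.execute("ALTER TABLE mytable ADD COLUMN othercolumn CHAR(1) NOT NULL")
    rejected = False
except sqlite3.DatabaseError:
    rejected = True

distinct_values = conn.execute("SELECT DISTINCT mycolumn FROM mytable").fetchall()
```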
I think this depends on the SQL flavor you are using, but what if you took option 2 and, at the very end, altered the table to make the column NOT NULL with the default value?
Would it be fast, since it sees all the values are not null?
If you want the column in the same table, you'll just have to do it. Now, option 3 is potentially the best for this because you can still have the database "live" while this operation is going on. If you use option 1, the table is locked while the operation happens and then you're really stuck.
If you don't really care if the column is in the table, then I suppose a segmented approach is the next best. Though, I really try to avoid that (to the point that I don't do it) because then like Charles Bretana says, you'll have to make sure and find all the places that update/insert that table and modify those. Ugh!
I had a similar problem, and went for your option #2.
It takes 20 minutes this way, as opposed to 32 hours the other way!!! Huge difference, thanks for the tip.
I wrote a full blog entry about it, but here's the important sql:
Alter table MyTable
Add MyNewColumn char(10) null default '?';
update MyTable set MyNewColumn='?' where MyPrimaryKey between 0 and 1000000
update MyTable set MyNewColumn='?' where MyPrimaryKey between 1000000 and 2000000
update MyTable set MyNewColumn='?' where MyPrimaryKey between 2000000 and 3000000
Alter table MyTable
Alter column MyNewColumn char(10) not null;
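A sketch of the key-range batching above, with sqlite3 as a stand-in and a tiny made-up table. It uses half-open ranges (>= lo and < hi) rather than BETWEEN. BETWEEN is inclusive at both ends, so the statements above touch keys 1000000 and 2000000 twice (harmless here, but avoidable). The final ALTER COLUMN ... NOT NULL step has no direct SQLite equivalent and is not shown.

```python
import sqlite3

def fill_by_key_range(conn, value, step=1_000_000):
    """Backfill MyNewColumn one primary-key range [lo, lo + step) at a time.

    Returns the number of range UPDATE statements issued.
    """
    max_key = conn.execute("SELECT MAX(MyPrimaryKey) FROM MyTable").fetchone()[0]
    if max_key is None:
        return 0  # empty table, nothing to do
    statements = 0
    lo = 0
    while lo <= max_key:
        conn.execute(
            "UPDATE MyTable SET MyNewColumn = ? "
            "WHERE MyPrimaryKey >= ? AND MyPrimaryKey < ?",
            (value, lo, lo + step),
        )
        conn.commit()  # one short transaction per range
        statements += 1
        lo += step
    return statements

# Demo: keys 1..25 with step 10 gives ranges [0,10), [10,20), [20,30).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (MyPrimaryKey INTEGER PRIMARY KEY, MyNewColumn CHAR(10))")
conn.executemany("INSERT INTO MyTable (MyPrimaryKey) VALUES (?)", [(k,) for k in range(1, 26)])
conn.commit()
batches = fill_by_key_range(conn, "?", step=10)
```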
I had a similar problem and I went with modified #3 approach. In my case the database was in SIMPLE recovery mode and the table to which column was supposed to be added was not referenced by any FK constraints.
Instead of creating a new table with the same schema and copying contents of original table, I used SELECT…INTO syntax.
According to Microsoft:
The amount of logging for SELECT...INTO depends on the recovery model in effect for the database. Under the simple recovery model or bulk-logged recovery model, bulk operations are minimally logged. With minimal logging, using the SELECT… INTO statement can be more efficient than creating a table and then populating the table with an INSERT statement. For more information, see Operations That Can Be Minimally Logged.
The sequence of steps :
1. Move the data from the old table to the new one while adding the new column with a default:
SELECT [table].*, CAST('default' AS nvarchar(256)) AS new_column
INTO table_copy
FROM [table]
2. Drop the old table.
3. Rename the newly created table:
EXEC sp_rename 'table_copy', 'table'
4. Create the necessary constraints and indexes on the new table.
In my case the table had more than 100 million rows and this approach completed faster than approach #2 and log space growth was minimal.
1) Add the column to the table with a default value:
ALTER TABLE MyTable ADD MyColumn int default 0
2) Update the values incrementally in the table (same effect as accepted answer). Adjust the number of records being updated to your environment, to avoid blocking other users/processes.
DECLARE @rowcount int = 1
WHILE (@rowcount > 0)
BEGIN
    UPDATE TOP(10000) MyTable SET MyColumn = 0 WHERE MyColumn IS NULL
    SET @rowcount = @@ROWCOUNT
END
3) Alter the column definition to require NOT NULL. Run the following when the table is not in use (or schedule a few minutes of downtime):
ALTER TABLE MyTable ALTER COLUMN MyColumn int NOT NULL
I have successfully used this for tables with millions of records.
I would use a CURSOR instead of a single UPDATE. A cursor will update all matching records one record at a time. It takes longer, but it does not lock the whole table.
If you want to avoid locks, use WAITFOR.
Also, I am not sure that a DEFAULT constraint changes existing rows.
Probably it is the NOT NULL constraint used together with DEFAULT that causes the case the author describes.
If so, add the NOT NULL constraint at the end.
So the code will look something like this:
-- without NOT NULL constraint -- we will add it at the end
ALTER TABLE [table] ADD new_column INT DEFAULT 0
GO
DECLARE @key INT
DECLARE fillNullColumn CURSOR LOCAL FAST_FORWARD FOR
    SELECT [key] FROM [table] WHERE new_column IS NULL
OPEN fillNullColumn
FETCH NEXT FROM fillNullColumn INTO @key
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE [table] SET new_column = 0 -- default value
    WHERE [key] = @key
    WAITFOR DELAY '00:00:05' -- wait 5 seconds; keep in mind this means updating only 12 rows per minute
    FETCH NEXT FROM fillNullColumn INTO @key
END
CLOSE fillNullColumn
DEALLOCATE fillNullColumn
I am sure there are some syntax errors, but I hope this helps solve your problem.
Good luck!
Vertically segment the table. This means you will have two tables with the same primary key and exactly the same number of records. One will be the one you already have; the other will have just the key and the new non-null column (with a default value).
Modify all insert, update, and delete code so it keeps the two tables in sync. If you want, you can create a view that "joins" the two tables into a single logical combination, which looks like a single table to client SELECT statements.
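A minimal sketch of vertical segmentation using sqlite3 and hypothetical table names. The answer suggests changing all insert/update/delete code. Here, triggers stand in for that application change, which is a substitution rather than the answer's own method. Existing rows would also need a one-time backfill into the side table, which is omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
# Side table: same key, plus the new NOT NULL column with its default.
conn.execute("""
    CREATE TABLE customer_ext (
        id INTEGER PRIMARY KEY REFERENCES customer(id),
        status TEXT NOT NULL DEFAULT 'active'
    )
""")
# Keep both tables in sync on insert and delete (stand-in for app code changes).
conn.execute("""
    CREATE TRIGGER customer_ins AFTER INSERT ON customer
    BEGIN
        INSERT INTO customer_ext (id) VALUES (NEW.id);
    END
""")
conn.execute("""
    CREATE TRIGGER customer_del AFTER DELETE ON customer
    BEGIN
        DELETE FROM customer_ext WHERE id = OLD.id;
    END
""")
# View that presents both halves as one logical table for SELECTs.
conn.execute("""
    CREATE VIEW customer_all AS
    SELECT c.id, c.name, e.status
    FROM customer AS c JOIN customer_ext AS e ON e.id = c.id
""")
conn.executemany("INSERT INTO customer (name) VALUES (?)", [("a",), ("b",)])
conn.execute("DELETE FROM customer WHERE name = 'b'")
rows = conn.execute("SELECT id, name, status FROM customer_all").fetchall()
```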
