Disable auto creating statistics on table - sql-server

Is there any possibility to disable auto creating statistics on specific table in database, without disabling auto creating statistics for entire database?
I have a procedure wich written as follow
create proc
as
create table #someTempTable(many columns, more than 100)
inserting into #someTempTable **always one or two row**
exec proc1
exec proc2
etc.
proc1, proc2 .. coontains many selects and updates like this:
select ..
from #someTempTable t
join someOrdinaryTable t2 on ...
update #someTempTable set col1 = somevalue
Profiler shows that before each select server starts collecting stats in #someTempTable, and it takes more than quarter of entire execution of proc. Proc is using in OLPT processing and should works very fast. I want to change this temporary table to table variable(because for table variables server doesn't collect stats) but can't because it lead me to rewrite all this procedures to passing variables between them and all of this legacy code should be retests. I'm searching alternative way how to force server to behave temporary table like table variables in part of collecting stats.
P.S. I'm know that stats is useful thing but in this case it's useless because table alway contains small amount of records.

I assume you know what you are doing. Disabling a statistics is generally a bad idea. Anyhow:
EXEC sp_autostats 'table_name', 'OFF'
More documentation here: https://msdn.microsoft.com/en-us/library/ms188775.aspx.
Edit: OP clarified that he wants to disable statistics for a temp table. Try this:
CREATE TABLE #someTempTable
(
ID int PRIMARY KEY WITH (STATISTICS_NORECOMPUTE = ON),
...other columns...
)
If you don't have a primary key already, use an identity column for a PK.

Related

Lock database table for just a couple of sentences

Suppose a table in SQLServer with this structure:
TABLE t (Id INT PRIMARY KEY)
Then I have a stored procedure, which is constantly being called, that works inserting data in this table among other kind of things:
BEGIN TRAN
DECLARE #Id INT = SELECT MAX(Id) + 1 FROM t
INSERT t VALUES (#Id)
...
-- Stuff that gets a long time to get completed
...
COMMIT
The problem with this aproach is sometimes I get a primary key violation because 2 or more procedure calls get and try to insert the same Id on the table.
I have been able to solve this problem adding a tablock in the SELECT sentence:
DECLARE #Id INT = SELECT MAX(Id) + 1 FROM t WITH (TABLOCK)
The problem now is sucessive calls to the procedure must wait to the completion of the transaction currently beeing executed to start their work, allowing just one procedure to run simultaneosly.
Is there any advice or trick to get the lock just during the execution of the select and insert sentence?
Thanks.
TABLOCK is a terrible idea, since you're serialising all the calls (no concurrency).
Note that with an SP you will retain all the locks granted over the run until the SP completes.
So you want to minimise locks except for where you really need them.
Unless you have a special case, use an internally generated id:
CREATE TABLE t (Id INT IDENTITY PRIMARY KEY)
Improved performance, concurrency etc. since you are not dependent on external tables to manage the id.
If you have existing data you can (re)set the start value using DBCC
DBCC CHECKIDENT ('t', RESEED, 100)
If you need to inject rows with a value preassigned, use:
SET IDENTITY_INSERT t ON
(and off again afterwards, resetting the seed as required).
[Consider whether you want this value to be the primary key, or simply unique.
In many cases where you need to reference a tables PK as a FK then you'll want it as PK for simplicity of join, but having a business readable value (eg, Accounting Code or OrderNo+OrderLine is completely valid) : that's just modelling]

Creating a history table without using triggers

I have a TABLE A with 3000 records with 25 columns. I want to have a history table called Table A history holding all the changes updates and deletes for me to look up any day. I usually use cursors. Now thought using triggers which I was not asked to. Do you have any other suggestions? Many thanks!
If your using tsql /SQL server and you can't use triggers, which is the only sure way to get every change, maybe use a stored procedure that is scheduled in job to run every x amount of time, the stored procedure using a MERGE statement with the two tables to get new records or changes. I would not suggest this if you need every single change without question.
CREATE TABLE dbo.TableA (id INT, Column1 nvarchar(30))
CREATE TABLE dbo.TableA_History (id INT, Column1 nvarchar(30), TimeStamp DateTime)
(this code isn't production, just the general idea)
Put the following code inside a stored procedure and use a Sql Server Job with a schedule on it.
MERGE INTO dbo.TableA_History
USING dbo.TableA
ON TableA_History.id = TableA.id AND TableA_History.Column1 = TableA.Column1
WHEN NOT MATCHED BY TARGET THEN
INSERT (id,Column1,TimeStamp) VALUES (TableA.id,TableA.Column1,GETDATE())
So basically if the record either doesn't exist or doesn't match meaning a column changed, insert the record into the history table.
It is possible to create history without triggers in some case, even if you are not using SQL Server 2016 and system-versioned table are not available.
In some cases, when you can identify for sure which routines are modifying your table, you can create history using OUTPUT INTO clause.
For example,
INSERT INTO [dbo].[MainTable]
OUTPUT inserted.[]
,...
,'I'
,GETUTCDATE()
,#CurrentUserID
INTO [dbo].[HistoryTable]
SELECT *
FROM ... ;
In routines, when you are using MERGE I like that we can use $action:
Is available only for the MERGE statement. Specifies a column of type
nvarchar(10) in the OUTPUT clause in a MERGE statement that returns
one of three values for each row: 'INSERT', 'UPDATE', or 'DELETE',
according to the action that was performed on that row.
It's very handy that we can add the user which is modifying the table. Using triggers you need to use session context or session variable to pass the user. In versioning table you need to add additional column to the main table in order to log the user as it only logs the current table columns (at least for now).
So, basically it depends on your data and application. If you have many sources of CRUD over the table, the trigger is the most secure way. If your table is very big and heavily used, using MERGE is not good as it my cause blocking and harm performance.
In our databases we are using all of the methods depending on the situation:
triggers for legacy
system-versioning for new development
direct OUTPUT in the history, when sure that data is modified only by given set of routines

INSERT INTO vs SELECT INTO

What is the difference between using
SELECT ... INTO MyTable FROM...
and
INSERT INTO MyTable (...)
SELECT ... FROM ....
?
From BOL [ INSERT, SELECT...INTO ], I know that using SELECT...INTO will create the insertion table on the default file group if it doesn't already exist, and that the logging for this statement depends on the recovery model of the database.
Which statement is preferable?
Are there other performance implications?
What is a good use case for SELECT...INTO over INSERT INTO ...?
Edit: I already stated that I know that that SELECT INTO... creates a table where it doesn't exist. What I want to know is that SQL includes this statement for a reason, what is it? Is it doing something different behind the scenes for inserting rows, or is it just syntactic sugar on top of a CREATE TABLE and INSERT INTO.
They do different things. Use INSERT when the table exists. Use SELECT INTO when it does not.
Yes. INSERT with no table hints is normally logged. SELECT INTO is minimally logged assuming proper trace flags are set.
In my experience SELECT INTO is most commonly used with intermediate data sets, like #temp tables, or to copy out an entire table like for a backup. INSERT INTO is used when you insert into an existing table with a known structure.
EDIT
To address your edit, they do different things. If you are making a table and want to define the structure use CREATE TABLE and INSERT. Example of an issue that can be created: You have a small table with a varchar field. The largest string in your table now is 12 bytes. Your real data set will need up to 200 bytes. If you do SELECT INTO from your small table to make a new one, the later INSERT will fail with a truncation error because your fields are too small.
Which statement is preferable? Depends on what you are doing.
Are there other performance implications? If the table is a permanent table, you can create indexes at the time of table creation which has implications for performance both negatively and positiviely. Select into does not recreate indexes that exist on current tables and thus subsequent use of the table may be slower than it needs to be.
What is a good use case for SELECT...INTO over INSERT INTO ...? Select into is used if you may not know the table structure in advance. It is faster to write than create table and an insert statement, so it is used to speed up develoment at times. It is often faster to use when you are creating a quick temp table to test things or a backup table of a specific query (maybe records you are going to delete). It should be rare to see it used in production code that will run multiple times (except for temp tables) because it will fail if the table was already in existence.
It is sometimes used inappropriately by people who don't know what they are doing. And they can cause havoc in the db as a result. I strongly feel it is inappropriate to use SELECT INTO for anything other than a throwaway table (a temporary backup, a temp table that will go away at the end of the stored proc ,etc.). Permanent tables need real thought as to their design and SELECT INTO makes it easy to avoid thinking about anything even as basic as what columns and what datatypes.
In general, I prefer the use of the create table and insert statement - you have more controls and it is better for repeatable processes. Further, if the table is a permanent table, it should be created from a separate create table script (one that is in source control) as creating permanent objects should not, in general, in code are inserts/deletes/updates or selects from a table. Object changes should be handled separately from data changes because objects have implications beyond the needs of a specific insert/update/select/delete. You need to consider the best data types, think about FK constraints, PKs and other constraints, consider auditing requirements, think about indexing, etc.
Each statement has a distinct use case. They are not interchangeable.
SELECT...INTO MyTable... creates a new MyTable where one did not exist before.
INSERT INTO MyTable...SELECT... is used when MyTable already exists.
The primary difference is that SELECT INTO MyTable will create a new table called MyTable with the results, while INSERT INTO requires that MyTable already exists.
You would use SELECT INTO only in the case where the table didn't exist and you wanted to create it based on the results of your query. As such, these two statements really are not comparable. They do very different things.
In general, SELECT INTO is used more often for one off tasks, while INSERT INTO is used regularly to add rows to tables.
EDIT:
While you can use CREATE TABLE and INSERT INTO to accomplish what SELECT INTO does, with SELECT INTO you do not have to know the table definition beforehand. SELECT INTO is probably included in SQL because it makes tasks like ad hoc reporting or copying tables much easier.
Actually SELECT ... INTO not only creates the table but will fail if it already exists, so basically the only time you would use it is when the table you are inserting to does not exists.
In regards to your EDIT:
I personally mainly use SELECT ... INTO when I am creating a temp table. That to me is the main use. However I also use it when creating new tables with many columns with similar structures to other tables and then edit it in order to save time.
I only want to cover second point of the question that is related to performance, because no body else has covered this. Select Into is a lot more faster than insert into, when it comes to tables with large datasets. I prefer select into when I have to read a very large table. insert into for a table with 10 million rows may take hours while select into will do this in minutes, and as for as losing indexes on new table is concerned you can recreate the indexes by query and can still save a lot more time when compared to insert into.
SELECT INTO is typically used to generate temp tables or to copy another table (data and/or structure).
In day to day code you use INSERT because your tables should already exist to be read, UPDATEd, DELETEd, JOINed etc. Note: the INTO keyword is optional with INSERT
That is, applications won't normally create and drop tables as part of normal operations unless it is a temporary table for some scope limited and specific usage.
A table created by SELECT INTO will have no keys or indexes or constraints unlike a real, persisted, already existing table
The 2 aren't directly comparable because they have almost no overlap in usage
Select into creates new table for you at the time and then insert records in it from the source table. The newly created table has the same structure as of the source table.If you try to use select into for a existing table it will produce a error, because it will try to create new table with the same name.
Insert into requires the table to be exist in your database before you insert rows in it.
The simple difference between select Into and Insert Into is:
--> Select Into don't need existing table. If you want to copy table A data, you just type Select * INTO [tablename] from A. Here, tablename can be existing table or new table will be created which has same structure like table A.
--> Insert Into do need existing table.INSERT INTO [tablename] SELECT * FROM A;.
Here tablename is an existing table.
Select Into is usually more popular to copy data especially backup data.
You can use as per your requirement, it is totally developer choice which should be used in his scenario.
Performance wise Insert INTO is fast.
References :
https://www.w3schools.com/sql/sql_insert_into_select.asp
https://www.w3schools.com/sql/sql_select_into.asp
The other answers are all great/correct (the main difference is whether the DestTable exists already (INSERT), or doesn't exist yet (SELECT ... INTO))
You may prefer to use INSERT (instead of SELECT ... INTO), if you want to be able to COUNT(*) the rows that have been inserted so far.
Using SELECT COUNT(*) ... WITH NOLOCK is a simple/crude technique that may help you check the "progress" of the INSERT; helpful if it's a long-running insert, as seen in this answer).
[If you use...]
INSERT DestTable SELECT ... FROM SrcTable
...then your SELECT COUNT(*) from DestTable WITH (NOLOCK) query would work.
Select into for large datasets may be good only for a single user using one single connection to the database doing a bulk operation task. I do not recommend to use
SELECT * INTO table
as this creates one big transaction and creates schema lock to create the object, preventing other users to create object or access system objects until the SELECT INTO operation completes.
As proof of concept open 2 sessions, in first session try to use
select into temp table from a huge table
and in the second section try to
create a temp table
and check the locks, blocking and the duration of second session to create a temp table object. My recommendation it is always a good practice to create and Insert statement and if needed for minimal logging use trace flag 610.

Creating a SQL Server trigger to transition from a natural key to a surrogate key

Backstory
At work where we're planning on deprecating a Natural Key column in one of our primary tables. The project consists of 100+ applications that link to this table/column; 400+ stored procedures that reference this column directly; and a vast array of common tables between these applications that also reference this column.
The Big Bang and Start from Scratch methods are out of the picture. We're going to deprecate this column one application at a time, certify the changes, and move on to the next... and we've got a lengthy target goal to make this effort practical.
The problem I have is that a lot of these applications have shared stored procedures and tables. If I completely convert all of Application A's tables/stored procedures Application B and C will be broken until converted. These in turn may break applications D, E, F...Etc. I've already got a strategy implemented for Code classes and Stored Procedures, the part I'm stuck on is the transitioning state of the database.
Here's a basic example of what we have:
Users
---------------------------
Code varchar(32) natural key
Access
---------------------------
UserCode varchar(32) foreign key
AccessLevel int
And we're aiming now just for transitional state like this:
Users
---------------------------
Code varchar(32)
Id int surrogate key
Access
---------------------------
UserCode varchar(32)
UserID int foreign key
AccessLevel int
The idea being during the transitional phase un-migrated applications and stored procedures will still be able to access all the appropriate data and new ones can start pushing to the correct columns -- Once the migration is complete for all stored procedures and applications we can finally drop the extra columns.
I wanted to use SQL Server's triggers to automatically intercept any new Insert/Update's and do something like the following on each of the affected tables:
CREATE TRIGGER tr_Access_Sync
ON Access
INSTEAD OF INSERT(, UPDATE)
AS
BEGIN
DIM #code as Varchar(32)
DIM #id as int
SET #code = (SELECT inserted.code FROM inserted)
SET #id = (SELECT inserted.code FROM inserted)
-- This is a migrated application; find the appropriate legacy key
IF #code IS NULL AND #id IS NOT NULL
SELECT Code FROM Users WHERE Users.id = #id
-- This is a legacy application; find the appropriate surrogate key
IF #id IS NULL AND #code IS NOT NULL
SELECT Code FROM Users WHERE Users.id = #id
-- Impossible code:
UPDATE inserted SET inserted.code=#code, inserted.id=#id
END
Question
The 2 huge problems I'm having so far are:
I can't do an "AFTER INSERT" because NULL constraints will make the insert fail.
The "impossible code" I mentioned is how I'd like to cleanly proxy the original query; If the original query has x, y, z columns in it or just x, I ideally would like the same trigger to do these. And if I add/delete another column, I'd like the trigger to remain functional.
Anyone have a code example where this could be possible, or even an alternate solution for keeping these columns properly filled even when only one of values is passed to SQL?
Tricky business...
OK, first of all: this trigger will NOT work in many circumstances:
SET #code = (SELECT inserted.code FROM inserted)
SET #id = (SELECT inserted.code FROM inserted)
The trigger can be called with a set of rows in the Inserted pseudo-table - which one are you going to pick here?? You need to write your trigger in such a fashion that it will work even when you get 10 rows in the Inserted table. If a SQL statement inserts 10 rows, your trigger will not be fired ten times - one for each row - but only once for the whole batch - you need to take that into account!
Second point: I would try to make the ID's IDENTITY fields - then they'll always get a value - even for "legacy" apps. Those "old" apps should provide a legacy key instead - so you should be fine there. The only issue I see and don't know how you handle those are inserts from an already converted app - do they provide an "old-style" legacy key as well? If not - how quickly do you need to have such a key?
What I'm thinking about would be a "cleanup job" that would run over the table and get all the rows with a NULL legacy key and then provide some meaningful value for it. Make this a regular stored procedure and execute it every e.g. day, four hours, 30 minutes - whatever suits your needs. Then you don't have to deal with triggers and all the limitations they have.
Wouldn't it be possible to make the schema changes 'bigbang' but create views over the top of those tables that 'hide' the change?
I think you might find you are simply putting off the breakages to a later point in time: "We're going to deprecate this column one application at a time" - it might be my naivety but I can't see how that's ever going to work.
Surely, a worse mess can occur when different applications are doing things differently?
After sleeping on the problem, this seems to be the most generic/re-usable solution I could come up with within the SQL Syntax. It works fine even if both columns have a NOT NULL restraint, even if you don't reference the "other" column at all in your insert.
CREATE TRIGGER tr_Access_Sync
ON Access
INSTEAD OF INSERT
AS
BEGIN
/*-- Create a temporary table to modify because "inserted" is read-only */
/*-- "temp" is actually "#temp" but it throws off stackoverflow's syntax highlighting */
SELECT * INTO temp FROM inserted
/*-- If for whatever reason the secondary table has it's own identity column */
/*-- we need to get rid of it from our #temp table to do an Insert later with identities on */
ALTER TABLE temp DROP COLUMN oneToManyIdentity
UPDATE temp
SET
UserCode = ISNULL(UserCode, (SELECT UserCode FROM Users U WHERE U.UserID = temp.UserID)),
UserID = ISNULL(UserID, (SELECT UserID FROM Users U WHERE U.UserCode = temp.UserCode))
INSERT INTO Access SELECT * FROM temp
END

How do you add a NOT NULL Column to a large table in SQL Server?

To add a NOT NULL Column to a table with many records, a DEFAULT constraint needs to be applied. This constraint causes the entire ALTER TABLE command to take a long time to run if the table is very large. This is because:
Assumptions:
The DEFAULT constraint modifies existing records. This means that the db needs to increase the size of each record, which causes it to shift records on full data-pages to other data-pages and that takes time.
The DEFAULT update executes as an atomic transaction. This means that the transaction log will need to be grown so that a roll-back can be executed if necessary.
The transaction log keeps track of the entire record. Therefore, even though only a single field is modified, the space needed by the log will be based on the size of the entire record multiplied by the # of existing records. This means that adding a column to a table with small records will be faster than adding a column to a table with large records even if the total # of records are the same for both tables.
Possible solutions:
Suck it up and wait for the process to complete. Just make sure to set the timeout period to be very long. The problem with this is that it may take hours or days to do depending on the # of records.
Add the column but allow NULL. Afterward, run an UPDATE query to set the DEFAULT value for existing rows. Do not do UPDATE *. Update batches of records at a time or you'll end up with the same problem as solution #1. The problem with this approach is that you end up with a column that allows NULL when you know that this is an unnecessary option. I believe that there are some best practice documents out there that says that you should not have columns that allow NULL unless it's necessary.
Create a new table with the same schema. Add the column to that schema. Transfer the data over from the original table. Drop the original table and rename the new table. I'm not certain how this is any better than #1.
Questions:
Are my assumptions correct?
Are these my only solutions? If so, which one is the best? I f not, what else could I do?
I ran into this problem for my work also. And my solution is along #2.
Here are my steps (I am using SQL Server 2005):
1) Add the column to the table with a default value:
ALTER TABLE MyTable ADD MyColumn varchar(40) DEFAULT('')
2) Add a NOT NULL constraint with the NOCHECK option. The NOCHECK does not enforce on existing values:
ALTER TABLE MyTable WITH NOCHECK
ADD CONSTRAINT MyColumn_NOTNULL CHECK (MyColumn IS NOT NULL)
3) Update the values incrementally in table:
GO
UPDATE TOP(3000) MyTable SET MyColumn = '' WHERE MyColumn IS NULL
GO 1000
The update statement will only update maximum 3000 records. This allow to save a chunk of data at the time. I have to use "MyColumn IS NULL" because my table does not have a sequence primary key.
GO 1000 will execute the previous statement 1000 times. This will update 3 million records, if you need more just increase this number. It will continue to execute until SQL Server returns 0 records for the UPDATE statement.
Here's what I would try:
Do a full backup of the database.
Add the new column, allowing nulls - don't set a default.
Set SIMPLE recovery, which truncates the tran log as soon as each batch is committed.
The SQL is: ALTER DATABASE XXX SET RECOVERY SIMPLE
Run the update in batches as you discussed above, committing after each one.
Reset the new column to no longer allow nulls.
Go back to the normal FULL recovery.
The SQL is: ALTER DATABASE XXX SET RECOVERY FULL
Backup the database again.
The use of the SIMPLE recovery model doesn't stop logging, but it significantly reduces its impact. This is because the server discards the recovery information after every commit.
You could:
Start a transaction.
Grab a write lock on your original table so no one writes to it.
Create a shadow table with the new schema.
Transfer all the data from the original table.
execute sp_rename to rename the old table out.
execute sp_rename to rename the new table in.
Finally, you commit the transaction.
The advantage of this approach is that your readers will be able to access the table during the long process and that you can perform any kind of schema change in the background.
Just to update this with the latest information.
In SQL Server 2012 this can now be carried out as an online operation in the following circumstances
Enterprise Edition only
The default must be a runtime constant
For the second requirement examples might be a literal constant or a function such as GETDATE() that evaluates to the same value for all rows. A default of NEWID() would not qualify and would still end up updating all rows there and then.
For defaults that qualify SQL Server evaluates them and stores the result as the default value in the column metadata so this is independent of the default constraint which is created (which can even be dropped if no longer required). This is viewable in sys.system_internals_partition_columns. The value doesn't get written out to the rows until next time they happen to get updated.
More details about this here: online non-null with values column add in sql server 2012
Admitted that this is an old question. My colleague recently told me that he was able to do it in one single alter table statement on a table with 13.6M rows. It finished within a second in SQL Server 2012. I was able to confirm the same on a table with 8M rows. Something changed in later version of SQL Server?
Alter table mytable add mycolumn char(1) not null default('N');
I think this depends on the SQL flavor you are using, but what if you took option 2, but at the very end alter table table to not null with the default value?
Would it be fast, since it sees all the values are not null?
If you want the column in the same table, you'll just have to do it. Now, option 3 is potentially the best for this because you can still have the database "live" while this operation is going on. If you use option 1, the table is locked while the operation happens and then you're really stuck.
If you don't really care if the column is in the table, then I suppose a segmented approach is the next best. Though, I really try to avoid that (to the point that I don't do it) because then like Charles Bretana says, you'll have to make sure and find all the places that update/insert that table and modify those. Ugh!
I had a similar problem, and went for your option #2.
It takes 20 minutes this way, as opposed to 32 hours the other way!!! Huge difference, thanks for the tip.
I wrote a full blog entry about it, but here's the important sql:
Alter table MyTable
Add MyNewColumn char(10) null default '?';
go
update MyTable set MyNewColumn='?' where MyPrimaryKey between 0 and 1000000
go
update MyTable set MyNewColumn='?' where MyPrimaryKey between 1000000 and 2000000
go
update MyTable set MyNewColumn='?' where MyPrimaryKey between 2000000 and 3000000
go
..etc..
Alter table MyTable
Alter column MyNewColumn char(10) not null;
And the blog entry if you're interested:
http://splinter.com.au/adding-a-column-to-a-massive-sql-server-table
I had a similar problem and I went with modified #3 approach. In my case the database was in SIMPLE recovery mode and the table to which column was supposed to be added was not referenced by any FK constraints.
Instead of creating a new table with the same schema and copying contents of original table, I used SELECT…INTO syntax.
According to Microsoft (http://technet.microsoft.com/en-us/library/ms188029(v=sql.105).aspx)
The amount of logging for SELECT...INTO depends on the recovery model
in effect for the database. Under the simple recovery model or
bulk-logged recovery model, bulk operations are minimally logged. With
minimal logging, using the SELECT… INTO statement can be more
efficient than creating a table and then populating the table with an
INSERT statement. For more information, see Operations That Can Be
Minimally Logged.
The sequence of steps :
1.Move data from old table to new while adding new column with default
SELECT table.*, cast (‘default’ as nvarchar(256)) new_column
INTO table_copy
FROM table
2.Drop old table
DROP TABLE table
3.Rename newly created table
EXEC sp_rename 'table_copy', ‘table’
4.Create necessary constraints and indexes on the new table
In my case the table had more than 100 million rows and this approach completed faster than approach #2 and log space growth was minimal.
1) Add the column to the table with a default value:
ALTER TABLE MyTable ADD MyColumn int default 0
2) Update the values incrementally in the table (same effect as accepted answer). Adjust the number of records being updated to your environment, to avoid blocking other users/processes.
declare #rowcount int = 1
while (#rowcount > 0)
begin
UPDATE TOP(10000) MyTable SET MyColumn = 0 WHERE MyColumn IS NULL
set #rowcount = ##ROWCOUNT
end
3) Alter the column definition to require not null. Run the following at a moment when the table is not in use (or schedule a few minutes of downtime). I have successfully used this for tables with millions of records.
ALTER TABLE MyTable ALTER COLUMN MyColumn int NOT NULL
I would use CURSOR instead of UPDATE. Cursor will update all matching records in batch, record by record -- it takes time but not locks table.
If you want to avoid locks use WAIT.
Also I am not sure, that DEFAULT constrain changes existing rows.
Probably NOT NULL constrain use together with DEFAULT causes case described by author.
If it changes add it in the end
So pseudocode will look like:
-- without NOT NULL constrain -- we will add it in the end
ALTER TABLE table ADD new_column INT DEFAULT 0
DECLARE fillNullColumn CURSOR LOCAL FAST_FORWARD
SELECT
key
FROM
table WITH (NOLOCK)
WHERE
new_column IS NULL
OPEN fillNullColumn
DECLARE
#key INT
FETCH NEXT FROM fillNullColumn INTO #key
WHILE ##FETCH_STATUS = 0 BEGIN
UPDATE
table WITH (ROWLOCK)
SET
new_column = 0 -- default value
WHERE
key = #key
WAIT 00:00:05 --wait 5 seconds, keep in mind it causes updating only 12 rows per minute
FETCH NEXT FROM fillNullColumn INTO #key
END
CLOSE fillNullColumn
DEALLOCATE fillNullColumn
ALTER TABLE table ALTER COLUMN new_column ADD CONSTRAIN xxx
I am sure that there are some syntax errors, but I hope that this
help to solve your problem.
Good luck!
Vertically segment the table. This means you will have two tables, with the same primary key, and exactly the same number of records... One will be the one you already have, the other will have just the key, and the new Non-Null column (with default value) .
Modify all Insert, Update, and delete code so they keep the two tables in synch... If you want you can create a view that "joins" the two tables together to create a single logical combination of the two that appears like a single table for client Select statements...

Resources