I am using SQL Azure. I have a deployment DB and a test DB
I would like to add some new lookup records to the Test DB, to test new code.
Initially my Deployment DB's identity was set to 200000+ for the PKs, and my Test DB to 100000+, to prevent PK collisions when syncing using such tools as Redgate's SQL Data Compare.
Unfortunately I made a mistake, and copied the Deployment DB as the new Test DB, since we required a more up to date dataset. As a result my Test DB now starts at 200000+. So I now have the risk of conflicts when syncing with the Deployment DB.
I would normally just use:
DBCC CHECKIDENT('TableName', RESEED, 105000)
However SQL Azure does not support this.
I have come across a workaround:
set identity_insert TableName on -- allow explicit values to be supplied for the IDENTITY column
INSERT INTO TableName(id, name) VALUES (104999,'Test Reset Identity Start 104999') -- so we can supply any value for the ID column
set identity_insert TableName off -- turn automatic IDENTITY generation back on
INSERT INTO TableName(name) VALUES ('Test Reset Identity End') -- ID starts at 105000, in theory, from this point
SELECT * FROM TableName
However, the new identity value always seems to take the highest PK in the table as its last seed, rather than the PK value of the last inserted record.
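For reference, I can confirm what the engine currently considers the last identity value (without DBCC) by querying sys.identity_columns, which SQL Azure does support; TableName below stands in for my actual table:

SELECT c.name AS column_name,
       c.seed_value,
       c.increment_value,
       c.last_value   -- NULL if no row has ever been inserted
FROM sys.identity_columns AS c
WHERE c.object_id = OBJECT_ID('dbo.TableName');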
I would rather not rebuild the table.
Is there another approach to resetting the identity value for my situation, i.e. to a lesser number such as 104999, rather than the current 204999?
Thanks in advance.
EDIT 1
It may be that I can only reseed the identity value to a value greater than the current one, i.e. to say 206000?
EDIT 2
Perhaps there is an argument that the identity value should never be reseeded to less than the max PK value, even if there are 100,000 spare numbers, as one day you will get a collision.
Related
Is it possible to just alter a table to make an existing column a serial auto-generated key, without adding a new column? Sorry if this question is a bit newbie-ish for PostgreSQL; I'm more a SQL Server person but am moving to PostgreSQL.
In a nutshell, the program will copy an existing SQL Server database into PostgreSQL. The desire is to have a mirrored DB in PostgreSQL of the SQL Server source, with the only caveat that one may selectively include/exclude any table or column as desired, or copy everything.
Given that the process copies all values, I thought one should be able to create the keys after the copy has finished, just as one may do in SQL Server. I thought PostgreSQL would have a method comparable to SQL Server's SET IDENTITY_INSERT [ON|OFF] so one may override the auto-generated key with a desired value, but I'm not seeing an equivalent in PostgreSQL. So my fallback is to create the mirrored records in Postgres without any keys and then alter the tables. But it seems that to fix up the table as desired one has to create a new column, and doing this breaks, or at least causes a headache fixing up, the RI for the PK/FK relationships.
Any suggestions? Thanks in advance.
In PostgreSQL, the auto-generated key is always overridden if you insert an explicit value for it. If you don't specify a value (omit the column), or specify the keyword DEFAULT, a generated key is used.
Given table
CREATE TABLE t1 (id serial primary key, dat text);
then both these will get a generated key from sequence t1_id_seq:
INSERT INTO t1 (dat) VALUES ('fred');
INSERT INTO t1 (id, dat) VALUES (DEFAULT, 'bob');
This will instead provide its own value:
INSERT INTO t1 (id, dat) VALUES (42, 'joe');
You are responsible for ensuring that the provided value doesn't conflict with existing data, or with future values the identity sequence will generate. PostgreSQL will not notice that you manually inserted a row with id 42 and skip when its own sequence counter gets to that point.
Usually what you do is load with provided values, then reset the sequence to the max of all keys already in the table, so it keeps counting from there for new local inserts.
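A small sketch of that reset step, assuming the example table above (t1 with its implicit sequence) and that the table already contains rows:

-- Move the sequence to the highest id already in the table,
-- so the next generated value will be MAX(id) + 1
SELECT setval(pg_get_serial_sequence('t1', 'id'), (SELECT MAX(id) FROM t1));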
I am troubleshooting a db (not my forte) that has some tables that are auto-incrementing on the id column and some tables that are not.
Now, all the tables are set as identity and marked to disallow null. I am using SSMS; what else can I check or do to get these tables back to auto-incrementing?
TIA
Interesting to me... probably old news to you guys. The issue had to do with existing data. For example, a table that already had 100 rows in it did NOT have the identity column set up. I would go in and make the column an identity with a seed of 1, incrementing by 1. The seed then had trouble because there were already 100 rows in there. So for all my tables I had to do a row count and seed the identity from the next row. Now it is working as expected.
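A minimal sketch of that reseed for a hypothetical table MyTable with identity column id (DBCC CHECKIDENT with RESEED moves the current identity value, so the next insert continues from max + 1); adjust the names to your tables:

-- Reseed the identity to the current maximum key so the next insert continues from there
DECLARE @maxId int = (SELECT ISNULL(MAX(id), 0) FROM dbo.MyTable);
DBCC CHECKIDENT ('dbo.MyTable', RESEED, @maxId);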
Having an IDENTITY column is pretty straightforward for a table, and if you are not seeing the auto incrementing behavior of a column on inserts, then I'd first verify that your column is indeed an IDENTITY column:
use <Your Database Name>;
go
select
    name,
    is_identity
from sys.columns
where object_id = object_id('<Your Table Name>')
order by column_id;
Substitute <Your Database Name> and <Your Table Name> with their appropriate values.
The other possibility is that data that "appears" to be non-incrementing could have been pushed out to the table by a session that set IDENTITY_INSERT on and supplied explicit values.
ALTER TABLE YourTable MODIFY COLUMN YourTable_id int(4) auto_increment
I have a suite of database unit tests. In order to replicate all of the tests on any machine I deploy to, there are scripts to drop and recreate the database. This works for all of the unit tests except the first one.
The reason the first test fails is that I am executing the "dbcc checkident" command before each test and resetting all of the identities, in order to ensure that all of the identities are the same. On a new table that has never been inserted into and then has "dbcc checkident" run against it, the identity starts at 0 on the first insert instead of 1.
If I use some of the other built-in commands to check the identity of a fresh table, they return 0 as the identity. The benefit of checking through "dbcc checkident" is that the identity comes back as NULL if a row has never been inserted; however, "dbcc checkident" only reports this through a print message, which cannot easily be tested.
How can I verify whether I need to reset the identity, through database commands, without inserting a row, deleting it, and then resetting, just to keep the identity from getting off by one on the first record?
Example of inserting into a table
Identity after inserting a row on a fresh table without running "dbcc checkident" and setting identity to 0 = 1
Identity after inserting a row on a fresh table and running "dbcc checkident" and setting identity to 0 = 0
Identity after inserting a row on an existing table thats been inserted into and running "dbcc checkident" and setting identity to 0 = 1
To get around the identity reseed problem with "dbcc checkident" on a new table, here's what I did.
Using the sys tables, I was able to check the last identity value manually through SQL. If the identity was reset, the last value changes to 0. On a new table that has never been inserted into, the last identity will be NULL.
Here is the code snippet we used. This allows you to check the last identity without doing an insert, delete, and reseed.
-- {0} is your table name
-- {1} is your identity value of the reset
IF EXISTS (SELECT null FROM sys.identity_columns WHERE OBJECT_NAME(OBJECT_ID) = '{0}' AND last_value IS NOT NULL)
DBCC CHECKIDENT ({0}, RESEED, {1});
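For illustration only, a concrete instantiation of the snippet with a hypothetical table Customers and a reseed value of 0 (the names are placeholders, not from the original test suite):

-- Reseed only if the table has ever been inserted into; on a brand-new table
-- last_value is NULL, the seed is left alone, and the first insert still produces 1
IF EXISTS (SELECT null FROM sys.identity_columns
           WHERE OBJECT_NAME(object_id) = 'Customers' AND last_value IS NOT NULL)
    DBCC CHECKIDENT ('Customers', RESEED, 0);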
I'm trying to insert a few thousand rows into a table in a database that is replicated across two servers. From either the publisher or the subscriber, I get the same error:
Msg 548, Level 16, State 2, Line 1
The insert failed. It conflicted with an identity range check constraint in database 'XXX', replicated table 'dbo.NODE_ATTRIB_RSLT', column 'ID'. If the identity column is automatically managed by replication, update the range as follows: for the Publisher, execute sp_adjustpublisheridentityrange; for the Subscriber, run the Distribution Agent or the Merge Agent.
The statement has been terminated.
Checking the constraint on the table, it seems to me like I should be able to insert at least 1000 rows at a time before running into issues. However, I get the same error when trying to insert just a few tens of rows!
Here's how I'm trying to insert data:
insert into NODE_ATTRIB_RSLT
([NODE_ID]
,[ATTRIB_ID]
,[STATE_ID]
,[PLAN_REVISION_ID]
,[TIMESTAMP]
,[VALUE]
,[VALUE_TEXT]
,[LAST_MODIFIED])
SELECT [NODE_ID]
,[ATTRIB_ID]
,[STATE_ID]
,[PLAN_REVISION_ID]
,[TIMESTAMP]
,[VALUE]
,[VALUE_TEXT]
,[LAST_MODIFIED] FROM [NODE_ATTRIB_RSLT_TEMP]
The PK column is an autogenerated identity called ID. To try to insert fewer rows at a time I've added a WHERE clause at the end of the select like so:
WHERE ID >= 1000 and ID <1100
but to no avail.
Running sp_adjustpublisheridentityrange on the Publisher executes successfully but has no effect.
How can I fix this problem with inserts?
How can I modify the ranges of the identity range constraints to a more reasonable level while leaving the replication running?
I think I worked out what the problem was.
Looking at the properties for the replicated table, it had the standard default identity range of 10000 for the Publisher and 1000 for the Subscriber.
However, checking the identity constraint on the actual table (using SP_HELPCONSTRAINT 'node_attrib_rslt') revealed that there was only a pool of 1000 IDs on both servers. This made the bulk insert fail even when I restricted the number of rows to insert - I'm guessing SQL Server doesn't even get that far when it checks the constraint when running an INSERT INTO.
To fix it I had to do several things:
Change the identity range of the table. I set it up to 20K for both Publisher and Subscriber.
On the Publisher, expand Replication --> Local Publications
Right-click the particular publication and choose Properties.
Select the Articles page.
Highlight the appropriate Table.
Click on the Article Properties 'button', and choose 'Set Properties of Highlighted Table Article'.
In the Article Properties window, look for Identity Range Management options.
Change the appropriate values.
Press OK and OK on the dialog windows.
Run the sp_adjustpublisheridentityrange stored proc on the Publisher.
New query window on the server
Choose the correct database
Execute sp_adjustpublisheridentityrange @table_name = 'node_attrib_rslt'
From the Subscriber, force-synchronise the servers.
On the Subscriber, expand Replication --> Local Subscriptions
Right-click the particular subscription and choose View Subscription Status.
In the dialog that appears, press the monitor button.
In the Replication Monitor window that appears, expand the particular Publisher in the left hand pane.
Click on the Subscription to edit.
In the right hand pane, right-click on the subscription status and choose Start Synchronising.
The status should update to 'Synchronising' while it does its thing.
After it's finished, click on the 'Warnings and Agents' tab. I had a 'Snapshot Agent' listed in the lower pane. Right click on that Agent and start it. After it had been running for a while, the change of properties on the server should have migrated to the client.
Maybe: insert some test rows into the table.
Edit: I've had to do this task again recently, and the constraint on the table would not update until I inserted a bunch of dummy data into the table so as to exhaust the default constraint. Then I resynchronised the servers, and the constraint was updated to the new value.
After that, checking the identity constraint revealed that I finally had a 20K ID range to insert with on both the Publisher and the Subscriber.
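For completeness, a sketch of the verification step (sp_helpconstraint, as used above, shows the current identity range check constraint; the table name is the one from this question, and this can be run on both Publisher and Subscriber):

-- Before: note the range enforced by the replication identity check constraint
EXEC sp_helpconstraint 'node_attrib_rslt';

-- Refresh the Publisher's range, then re-check
EXEC sp_adjustpublisheridentityrange @table_name = 'node_attrib_rslt';
EXEC sp_helpconstraint 'node_attrib_rslt';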
I had this exact same issue, and the above solution didn't do anything for me.
Instead, what ended up solving the issue was setting the new, larger identity ranges on the tables in the publication, then dropping the identity range check constraints on the table,
and then finally running this command, which sets the current identity:
DBCC CHECKIDENT ('TableName', RESEED, 1000000000);
Rather than actually setting the value to 1000000000, this recreates the constraint, set to the correct identity range value currently specified on the table in the publication.
It looks like the CHECKIDENT command forces the constraint to be updated somehow.
The above solution worked for me, but it was actually my attempt at just dropping the constraints and setting the publisher and subscriber to use different identity ranges so they would be able to insert rows into the tables. Fortunately the CHECKIDENT seemed to refresh the constraint, something I originally expected the sp_adjustpublisheridentityrange stored procedure to do - except it did nothing.
I ran the above command on both the publisher and the subscriber.
To add a NOT NULL Column to a table with many records, a DEFAULT constraint needs to be applied. This constraint causes the entire ALTER TABLE command to take a long time to run if the table is very large. This is because:
Assumptions:
The DEFAULT constraint modifies existing records. This means that the db needs to increase the size of each record, which causes it to shift records on full data-pages to other data-pages and that takes time.
The DEFAULT update executes as an atomic transaction. This means that the transaction log will need to be grown so that a roll-back can be executed if necessary.
The transaction log keeps track of the entire record. Therefore, even though only a single field is modified, the space needed by the log will be based on the size of the entire record multiplied by the # of existing records. This means that adding a column to a table with small records will be faster than adding a column to a table with large records even if the total # of records are the same for both tables.
Possible solutions:
Suck it up and wait for the process to complete. Just make sure to set the timeout period to be very long. The problem with this is that it may take hours or days to do depending on the # of records.
Add the column but allow NULL. Afterward, run an UPDATE query to set the DEFAULT value for existing rows. Do not do UPDATE *. Update batches of records at a time or you'll end up with the same problem as solution #1. The problem with this approach is that you end up with a column that allows NULL when you know that this is an unnecessary option. I believe that there are some best practice documents out there that says that you should not have columns that allow NULL unless it's necessary.
Create a new table with the same schema. Add the column to that schema. Transfer the data over from the original table. Drop the original table and rename the new table. I'm not certain how this is any better than #1.
Questions:
Are my assumptions correct?
Are these my only solutions? If so, which one is the best? If not, what else could I do?
I ran into this problem for my work also, and my solution is along the lines of #2.
Here are my steps (I am using SQL Server 2005):
1) Add the column to the table with a default value:
ALTER TABLE MyTable ADD MyColumn varchar(40) DEFAULT('')
2) Add a NOT NULL constraint with the NOCHECK option. The NOCHECK does not enforce on existing values:
ALTER TABLE MyTable WITH NOCHECK
ADD CONSTRAINT MyColumn_NOTNULL CHECK (MyColumn IS NOT NULL)
3) Update the values incrementally in table:
GO
UPDATE TOP(3000) MyTable SET MyColumn = '' WHERE MyColumn IS NULL
GO 1000
The update statement will only update a maximum of 3000 records at a time. This allows saving a chunk of data at a time. I have to use "MyColumn IS NULL" because my table does not have a sequential primary key.
GO 1000 will execute the previous batch 1000 times, which covers 3 million records; if you need more, just increase this number. The batches keep executing even after the UPDATE starts matching 0 records, so size the repeat count to cover your table.
Here's what I would try:
Do a full backup of the database.
Add the new column, allowing nulls - don't set a default.
Set SIMPLE recovery, which truncates the tran log as soon as each batch is committed.
The SQL is: ALTER DATABASE XXX SET RECOVERY SIMPLE
Run the update in batches as you discussed above, committing after each one.
Reset the new column to no longer allow nulls.
Go back to the normal FULL recovery.
The SQL is: ALTER DATABASE XXX SET RECOVERY FULL
Backup the database again.
The use of the SIMPLE recovery model doesn't stop logging, but it significantly reduces its impact. This is because the server discards the recovery information after every commit.
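A sketch of those steps on a hypothetical table BigTable with a new char(1) column (the names and batch size are mine; adapt them to your schema):

-- 2. Add the new column, allowing NULLs, with no default
ALTER TABLE dbo.BigTable ADD NewFlag char(1) NULL;

-- 3. Reduce logging impact for the duration of the backfill
ALTER DATABASE XXX SET RECOVERY SIMPLE;

-- 4. Backfill in batches so the log space is reused between commits
WHILE 1 = 1
BEGIN
    UPDATE TOP (10000) dbo.BigTable
    SET NewFlag = 'N'
    WHERE NewFlag IS NULL;

    IF @@ROWCOUNT = 0 BREAK;
END

-- 5. Every row now has a value, so disallow NULLs
ALTER TABLE dbo.BigTable ALTER COLUMN NewFlag char(1) NOT NULL;

-- 6. Return to FULL recovery and take a new full backup to restart the log chain
ALTER DATABASE XXX SET RECOVERY FULL;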
You could:
Start a transaction.
Grab a write lock on your original table so no one writes to it.
Create a shadow table with the new schema.
Transfer all the data from the original table.
Execute sp_rename to rename the old table out.
Execute sp_rename to rename the new table in.
Finally, you commit the transaction.
The advantage of this approach is that your readers will be able to access the table during the long process and that you can perform any kind of schema change in the background.
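A rough sketch of that swap, assuming a hypothetical MyTable and the new NOT NULL column (in practice you would also recreate indexes, constraints, and permissions on the shadow table before the rename):

BEGIN TRANSACTION;

-- Take a shared table lock held for the transaction so no one writes to the
-- original table while we copy (readers can still read)
SELECT TOP (0) * FROM dbo.MyTable WITH (TABLOCK, HOLDLOCK);

-- Shadow table with the new schema, including the new NOT NULL column
CREATE TABLE dbo.MyTable_New
(
    Id        int          NOT NULL PRIMARY KEY,
    Payload   varchar(100) NULL,
    NewColumn char(1)      NOT NULL CONSTRAINT DF_MyTable_NewColumn DEFAULT 'N'
);

-- Copy the data across, supplying the value for the new column
INSERT INTO dbo.MyTable_New (Id, Payload, NewColumn)
SELECT Id, Payload, 'N'
FROM dbo.MyTable;

-- Swap the tables
EXEC sp_rename 'dbo.MyTable', 'MyTable_Old';
EXEC sp_rename 'dbo.MyTable_New', 'MyTable';

COMMIT TRANSACTION;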
Just to update this with the latest information.
In SQL Server 2012 this can now be carried out as an online operation in the following circumstances
Enterprise Edition only
The default must be a runtime constant
For the second requirement examples might be a literal constant or a function such as GETDATE() that evaluates to the same value for all rows. A default of NEWID() would not qualify and would still end up updating all rows there and then.
For defaults that qualify SQL Server evaluates them and stores the result as the default value in the column metadata so this is independent of the default constraint which is created (which can even be dropped if no longer required). This is viewable in sys.system_internals_partition_columns. The value doesn't get written out to the rows until next time they happen to get updated.
More details about this here: online non-null with values column add in sql server 2012
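For illustration, a sketch of the distinction on a hypothetical table (SQL Server 2012, Enterprise Edition):

-- Metadata-only: the default is a runtime constant, so no existing rows are touched at ALTER time
ALTER TABLE dbo.Orders
    ADD CreatedAt datetime NOT NULL CONSTRAINT DF_Orders_CreatedAt DEFAULT GETDATE();

-- NOT metadata-only: NEWID() must differ per row, so every existing row gets updated there and then
ALTER TABLE dbo.Orders
    ADD RowGuid uniqueidentifier NOT NULL CONSTRAINT DF_Orders_RowGuid DEFAULT NEWID();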
Admittedly, this is an old question. My colleague recently told me that he was able to do it in one single ALTER TABLE statement on a table with 13.6M rows. It finished within a second in SQL Server 2012. I was able to confirm the same on a table with 8M rows. Did something change in later versions of SQL Server?
Alter table mytable add mycolumn char(1) not null default('N');
I think this depends on the SQL flavor you are using, but what if you took option 2, and then at the very end altered the table to NOT NULL with the default value?
Would it be fast, since it sees that all the values are not null?
If you want the column in the same table, you'll just have to do it. Now, option 3 is potentially the best for this because you can still have the database "live" while this operation is going on. If you use option 1, the table is locked while the operation happens and then you're really stuck.
If you don't really care if the column is in the table, then I suppose a segmented approach is the next best. Though, I really try to avoid that (to the point that I don't do it) because then like Charles Bretana says, you'll have to make sure and find all the places that update/insert that table and modify those. Ugh!
I had a similar problem, and went for your option #2.
It takes 20 minutes this way, as opposed to 32 hours the other way!!! Huge difference, thanks for the tip.
I wrote a full blog entry about it, but here's the important sql:
Alter table MyTable
Add MyNewColumn char(10) null default '?';
go
update MyTable set MyNewColumn='?' where MyPrimaryKey between 0 and 1000000
go
update MyTable set MyNewColumn='?' where MyPrimaryKey between 1000000 and 2000000
go
update MyTable set MyNewColumn='?' where MyPrimaryKey between 2000000 and 3000000
go
..etc..
Alter table MyTable
Alter column MyNewColumn char(10) not null;
And the blog entry if you're interested:
http://splinter.com.au/adding-a-column-to-a-massive-sql-server-table
I had a similar problem and I went with modified #3 approach. In my case the database was in SIMPLE recovery mode and the table to which column was supposed to be added was not referenced by any FK constraints.
Instead of creating a new table with the same schema and copying contents of original table, I used SELECT…INTO syntax.
According to Microsoft (http://technet.microsoft.com/en-us/library/ms188029(v=sql.105).aspx)
The amount of logging for SELECT...INTO depends on the recovery model in effect for the database. Under the simple recovery model or bulk-logged recovery model, bulk operations are minimally logged. With minimal logging, using the SELECT...INTO statement can be more efficient than creating a table and then populating the table with an INSERT statement. For more information, see Operations That Can Be Minimally Logged.
The sequence of steps:
1. Move data from the old table to the new one, adding the new column with its default:
SELECT table.*, CAST('default' AS nvarchar(256)) AS new_column
INTO table_copy
FROM table
2. Drop the old table:
DROP TABLE table
3. Rename the newly created table:
EXEC sp_rename 'table_copy', 'table'
4. Create the necessary constraints and indexes on the new table.
In my case the table had more than 100 million rows and this approach completed faster than approach #2 and log space growth was minimal.
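A possible sketch of step 4, with hypothetical column names ([table] and id stand in for the real table and its key; SELECT...INTO leaves the copy as a plain heap with a nullable column, so the NOT NULL change, default, primary key, and any indexes have to be re-created explicitly):

-- Tighten the new column now that every row has a value, and add the default for future rows
ALTER TABLE [table] ALTER COLUMN new_column nvarchar(256) NOT NULL;
ALTER TABLE [table] ADD CONSTRAINT DF_table_new_column DEFAULT N'default' FOR new_column;

-- Re-create the primary key (and any other indexes/constraints the old table had)
ALTER TABLE [table] ADD CONSTRAINT PK_table PRIMARY KEY CLUSTERED (id);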
1) Add the column to the table with a default value:
ALTER TABLE MyTable ADD MyColumn int default 0
2) Update the values incrementally in the table (same effect as accepted answer). Adjust the number of records being updated to your environment, to avoid blocking other users/processes.
declare @rowcount int = 1
while (@rowcount > 0)
begin
    UPDATE TOP(10000) MyTable SET MyColumn = 0 WHERE MyColumn IS NULL
    set @rowcount = @@ROWCOUNT
end
3) Alter the column definition to require not null. Run the following at a moment when the table is not in use (or schedule a few minutes of downtime). I have successfully used this for tables with millions of records.
ALTER TABLE MyTable ALTER COLUMN MyColumn int NOT NULL
I would use a CURSOR instead of UPDATE. The cursor will update all matching records one by one -- it takes time but does not lock the table.
If you want to avoid lock contention, use a WAITFOR DELAY between the updates.
Also, I am not sure that a DEFAULT constraint by itself changes existing rows; it is probably the NOT NULL constraint used together with DEFAULT that causes the case described by the author. If it does change them, add it at the end.
So the pseudocode will look like this:
-- without the NOT NULL constraint -- we will add it at the end
ALTER TABLE table ADD new_column INT DEFAULT 0

DECLARE fillNullColumn CURSOR LOCAL FAST_FORWARD FOR
    SELECT
        key
    FROM
        table WITH (NOLOCK)
    WHERE
        new_column IS NULL

OPEN fillNullColumn

DECLARE @key INT

FETCH NEXT FROM fillNullColumn INTO @key

WHILE @@FETCH_STATUS = 0 BEGIN
    UPDATE
        table WITH (ROWLOCK)
    SET
        new_column = 0 -- default value
    WHERE
        key = @key

    WAITFOR DELAY '00:00:05' -- wait 5 seconds; keep in mind this means only 12 rows per minute

    FETCH NEXT FROM fillNullColumn INTO @key
END

CLOSE fillNullColumn
DEALLOCATE fillNullColumn

ALTER TABLE table ALTER COLUMN new_column INT NOT NULL
I am sure that there are some syntax errors, but I hope that this helps to solve your problem.
Good luck!
Vertically segment the table. This means you will have two tables, with the same primary key and exactly the same number of records... One will be the one you already have; the other will have just the key and the new non-null column (with its default value).
Modify all insert, update, and delete code so they keep the two tables in sync... If you want, you can create a view that "joins" the two tables together into a single logical combination that appears like a single table for client SELECT statements...
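A rough sketch of that layout, with hypothetical names (MyTable is the existing table keyed on Id; the backfill insert can itself be batched if needed):

-- Side table holding only the key and the new NOT NULL column
CREATE TABLE dbo.MyTable_Extra
(
    Id        int     NOT NULL PRIMARY KEY REFERENCES dbo.MyTable (Id),
    NewColumn char(1) NOT NULL CONSTRAINT DF_MyTable_Extra_NewColumn DEFAULT 'N'
);

-- One row per existing record; the default supplies NewColumn
INSERT INTO dbo.MyTable_Extra (Id)
SELECT Id FROM dbo.MyTable;
GO

-- Optional view so client SELECTs can keep treating it as one table
CREATE VIEW dbo.MyTable_Combined
AS
SELECT t.*, e.NewColumn
FROM dbo.MyTable AS t
JOIN dbo.MyTable_Extra AS e ON e.Id = t.Id;
GO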