Making primary key and identity column after data has been loaded


I have a quick question for you SQL gurus. I have existing tables without a primary key column, and IDENTITY is not set. Now I am trying to modify those tables by making an existing integer column the primary key and adding identity values to that column. My question is: should I first copy all the records from the table to a temp table before making those changes? Do I lose all the previous records if I run the T-SQL command to make the primary key and add the identity column on those tables? Which approach should I take, such as:
1) Create a temp table to hold all the records from the table to be modified
2) Load all the records into the temp table
3) Make the changes to the table schema
4) Finally, load the records from the temp table back into the original table.
Or
are there better ways than this? I really appreciate your help.
Thanks

Tools>Options>Designers>Table and Database Designers
Uncheck "Prevent saving changes that require table re-creation"
[Edit] I've tried this with populated tables and I didn't lose data, but I don't really know much about this.

Hopefully you don't have too many records in the table. When you use Management Studio to change an existing field to an identity, it creates another table with the identity field set, turns IDENTITY_INSERT on, inserts the records from the original table, and then turns IDENTITY_INSERT off. Then it drops the old table and renames the table it just created. This can take a long time if you have many records. If so, I would script this out and run it as a job during off hours, because the table will be completely locked while it runs.
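A minimal T-SQL sketch of that rebuild, assuming a hypothetical table dbo.Customer whose existing CustomerId column should become an IDENTITY primary key. The table and column names are made up for illustration, and a real table would also need its other indexes, constraints, triggers and permissions recreated.

```sql
-- Demo setup: hypothetical table with no primary key and no IDENTITY.
IF OBJECT_ID('dbo.Customer', 'U') IS NOT NULL DROP TABLE dbo.Customer;
CREATE TABLE dbo.Customer (CustomerId INT NOT NULL, Name NVARCHAR(100) NULL);
INSERT INTO dbo.Customer (CustomerId, Name) VALUES (1, N'Ann'), (2, N'Bob'), (5, N'Cy');

-- Roll everything back if any statement fails.
SET XACT_ABORT ON;
BEGIN TRANSACTION;

-- New table with the same columns, but CustomerId is now an IDENTITY column.
CREATE TABLE dbo.Tmp_Customer
(
    CustomerId INT IDENTITY(1, 1) NOT NULL,
    Name       NVARCHAR(100) NULL
);

-- Allow explicit values so the existing keys are kept as they are.
SET IDENTITY_INSERT dbo.Tmp_Customer ON;

INSERT INTO dbo.Tmp_Customer (CustomerId, Name)
SELECT CustomerId, Name
FROM dbo.Customer WITH (TABLOCKX);  -- exclusive lock on the source until COMMIT

SET IDENTITY_INSERT dbo.Tmp_Customer OFF;

-- Swap the tables.
DROP TABLE dbo.Customer;
EXEC sp_rename 'dbo.Tmp_Customer', 'Customer';

-- Make the column the primary key.
ALTER TABLE dbo.Customer
    ADD CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED (CustomerId);

COMMIT TRANSACTION;
```

Because the rows were inserted with explicit values while IDENTITY_INSERT was ON, SQL Server moves the current identity value up to the highest inserted key, so the next new row in this example gets CustomerId 6.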

Just make all of your changes in Management Studio and copy/paste the generated script into a file. DON'T SAVE CHANGES at this point. Look over the script and edit it as necessary. It will probably do almost exactly what you are thinking (it will drop the original table and rename the temp one to the original's name), but it handles all constraints and FKs as well.

If your existing integer column is unique, contains no NULLs and is otherwise suitable, there should be no problem converting it to a PK; adding the constraint keeps the existing rows.
Another alternative, if you don't want to use the existing column: add a new PK column to the main table, populate it and seed it, then run UPDATE statements to update all the other tables that reference it with the new PK.
Whatever way you do it, make sure you take a backup first!
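If you keep the existing column as the key, a sketch of the in-place route looks like this, assuming a hypothetical dbo.OrderHeader table with an OrderId column. This only adds the primary key; it does not make the column an IDENTITY, which still needs the table rebuild described above.

```sql
-- Demo setup: hypothetical table whose OrderId column should become the PK.
IF OBJECT_ID('dbo.OrderHeader', 'U') IS NOT NULL DROP TABLE dbo.OrderHeader;
CREATE TABLE dbo.OrderHeader (OrderId INT NULL, Customer NVARCHAR(50) NULL);
INSERT INTO dbo.OrderHeader (OrderId, Customer) VALUES (10, N'A'), (11, N'B'), (12, N'C');

-- 1) Check the candidate key: both queries should return no rows.
SELECT * FROM dbo.OrderHeader WHERE OrderId IS NULL;

SELECT OrderId, COUNT(*) AS Copies
FROM dbo.OrderHeader
GROUP BY OrderId
HAVING COUNT(*) > 1;

-- 2) A primary key column cannot be nullable, so tighten the column first.
ALTER TABLE dbo.OrderHeader ALTER COLUMN OrderId INT NOT NULL;

-- 3) Add the constraint; existing rows are validated, not removed.
ALTER TABLE dbo.OrderHeader
    ADD CONSTRAINT PK_OrderHeader PRIMARY KEY CLUSTERED (OrderId);
```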

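You can always add the IDENTITY column after you have finished copying your data around. You can then reset the IDENTITY seed to the current maximum value, so that the next value generated is max + 1. That should solve your problems.
DBCC CHECKIDENT ('MyTable', RESEED, n)
Where n is the new current identity value: if the table already contains rows, the next row inserted gets n plus the increment (n + 1 by default); if no rows have been inserted since the table was created, the first row gets n.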
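A short sketch of reseeding to the current maximum, using a hypothetical dbo.Product table; the first RESEED only simulates an identity value that has fallen behind the data.

```sql
-- Demo setup: hypothetical table whose identity value is now too low.
IF OBJECT_ID('dbo.Product', 'U') IS NOT NULL DROP TABLE dbo.Product;
CREATE TABLE dbo.Product (ProductId INT IDENTITY(1, 1) NOT NULL PRIMARY KEY, Name NVARCHAR(50) NOT NULL);
INSERT INTO dbo.Product (Name) VALUES (N'a'), (N'b'), (N'c');  -- ProductId 1..3
DBCC CHECKIDENT ('dbo.Product', RESEED, 1);                     -- simulate a bad seed

-- Fix: make the current identity value equal to the highest key in use.
DECLARE @maxId INT = (SELECT ISNULL(MAX(ProductId), 0) FROM dbo.Product);
DBCC CHECKIDENT ('dbo.Product', RESEED, @maxId);

-- The table already has rows, so the next value is @maxId + increment, i.e. 4.
INSERT INTO dbo.Product (Name) VALUES (N'd');
```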

Related

SSIS only extract Delta changes

I'm after some advice. I'm using SSIS\SQL Server 2014. I have a nightly SSIS package that pulls data from non-SQL Server databases into a single table (the SQL table is truncated beforehand each time), and I then extract from this table to create a daily CSV file.
Going forward, I only want to extract to csv on a daily basis the records that have changed i.e. the Deltas.
What is the best approach? I was thinking of using CDC in SSIS, but as I'm truncating the SQL table before the initial load each time, will this be the best method? Or will I need a master table in SQL with an initial load, then import into another table and extract only the rows that differ? For info, the table in SQL has a primary key.
I just want to double check as CDC assumes the tables are all in SQL Server, whereas my data is coming from outside SQL Server first.
Thanks for any help.

The primary key on that table is your saving grace here. Obviously enough, the SQL Server database that you're pulling the disparate data into won't know from one table flush to the next which records have changed, but if you add two additional tables, and modify the existing table with an additional column, it should be able to figure it out by leveraging HASHBYTES.
For this example, I'll call the new table SentRows, but you can use a more meaningful name in practice. We'll call the new column in the old table HashValue.
Add the column HashValue to your table as a NOT NULL varbinary column.
Create your SentRows table with columns for all the columns in the main table's primary key, plus the HashValue column.
Create a RowsToSend table that's structurally identical to your main table, including the HashValue.
Modify your queries to create the HashValue by applying HASHBYTES to all of the non-key columns in the table. (This will be horribly tedious. Sorry about that.)
Send out your full data set.
Now move all of the key values and HashValues to the SentRows table. Truncate your main table.
On the next pull, compare the key values and HashValues from SentRows to the new data in the main table.
Primary key match + hash match = Unchanged row
Primary key match + hash mismatch = Updated row
Primary key in incoming data but missing from existing data set = New row
Primary key not in incoming data but in existing data set = Deleted row
Pull out any changes you need to send to the RowsToSend table.
Send the changes from RowsToSend.
Move the key values and HashValues to your SentRows table. Update hashes for changed key values, insert new rows, and decide how you're going to handle deletes, if you have to deal with deletes.
Truncate the RowsToSend table to get ready for tomorrow.
If you'd like (and you'll thank yourself later if you do), add a datetime column to the SentRows table with a default of GETDATE(), which will tell you when the row was added.
And away you go. Nothing but deltas from now on.
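A sketch of the hashing and the four comparison rules, using temporary tables as stand-ins for MainTable and SentRows; the Name and Amount columns and the sample values are hypothetical. HASHBYTES with SHA2_256 and CONCAT are both available from SQL Server 2012, so they work on 2014 (where HASHBYTES input is limited to 8000 bytes).

```sql
-- Stand-ins for the permanent tables; Name and Amount are hypothetical non-key columns.
CREATE TABLE #MainTable (Id INT PRIMARY KEY, Name NVARCHAR(50) NULL,
                         Amount DECIMAL(10, 2) NULL, HashValue VARBINARY(32) NOT NULL);
CREATE TABLE #SentRows  (Id INT PRIMARY KEY, HashValue VARBINARY(32) NOT NULL);

-- Yesterday's sent rows: hash all non-key columns with a separator between them.
INSERT INTO #SentRows (Id, HashValue)
SELECT v.Id, HASHBYTES('SHA2_256', CONCAT(v.Name, N'|', v.Amount))
FROM (VALUES (1, N'Ann', 10.00), (2, N'Bob', 20.00), (3, N'Cy', 30.00)) AS v (Id, Name, Amount);

-- Today's pull: row 2 changed, row 3 disappeared, row 4 is new.
INSERT INTO #MainTable (Id, Name, Amount, HashValue)
SELECT v.Id, v.Name, v.Amount, HASHBYTES('SHA2_256', CONCAT(v.Name, N'|', v.Amount))
FROM (VALUES (1, N'Ann', 10.00), (2, N'Bob', 25.00), (4, N'Dee', 40.00)) AS v (Id, Name, Amount);

-- Classify every key using the four rules.
SELECT COALESCE(m.Id, s.Id) AS Id,
       CASE
           WHEN s.Id IS NULL               THEN 'New'
           WHEN m.Id IS NULL               THEN 'Deleted'
           WHEN m.HashValue <> s.HashValue THEN 'Updated'
           ELSE 'Unchanged'
       END AS ChangeType
FROM #MainTable AS m
FULL OUTER JOIN #SentRows AS s ON s.Id = m.Id
ORDER BY COALESCE(m.Id, s.Id);
```

With this sample data the query reports Id 1 as Unchanged, 2 as Updated, 3 as Deleted and 4 as New. CONCAT treats NULL as an empty string, so a column changing between NULL and '' will not be detected; decide whether that matters for your data.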
Edit 2019-10-31:
Step by step (or TL;DR):
1) Flush and Fill MainTable.
2) Compare keys and hashes on MainTable to keys and hashes on SentRows to identify new/changed rows.
3) Move new/changed rows to RowsToSend.
4) Send the rows that are in RowsToSend.
5) Move all the rows from RowsToSend to SentRows.
6) Truncate RowsToSend.
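The TL;DR steps can be sketched as a single script, again with temporary tables standing in for the permanent MainTable, RowsToSend and SentRows, and a hypothetical Payload column in place of the real data. Deletes are not handled here, since the answer leaves that decision open.

```sql
-- Stand-ins for the permanent tables (Payload is a hypothetical data column).
CREATE TABLE #MainTable  (Id INT PRIMARY KEY, Payload NVARCHAR(100) NULL, HashValue VARBINARY(32) NOT NULL);
CREATE TABLE #RowsToSend (Id INT PRIMARY KEY, Payload NVARCHAR(100) NULL, HashValue VARBINARY(32) NOT NULL);
CREATE TABLE #SentRows   (Id INT PRIMARY KEY, HashValue VARBINARY(32) NOT NULL,
                          SentAt DATETIME NOT NULL DEFAULT GETDATE());

-- State from the previous run: rows 1 and 2 were already sent.
INSERT INTO #SentRows (Id, HashValue)
VALUES (1, HASHBYTES('SHA2_256', N'one')), (2, HASHBYTES('SHA2_256', N'two'));

-- 1) Flush and fill MainTable (today row 2 changed and row 3 is new).
TRUNCATE TABLE #MainTable;
INSERT INTO #MainTable (Id, Payload, HashValue)
SELECT v.Id, v.Payload, HASHBYTES('SHA2_256', v.Payload)
FROM (VALUES (1, N'one'), (2, N'two (edited)'), (3, N'three')) AS v (Id, Payload);

-- 2) + 3) Compare keys and hashes; move new or changed rows to RowsToSend.
INSERT INTO #RowsToSend (Id, Payload, HashValue)
SELECT m.Id, m.Payload, m.HashValue
FROM #MainTable AS m
LEFT JOIN #SentRows AS s ON s.Id = m.Id
WHERE s.Id IS NULL OR s.HashValue <> m.HashValue;

-- 4) Send the rows in RowsToSend (in the package, the CSV export would read this table).
SELECT Id, Payload FROM #RowsToSend;

-- 5) Record what was sent: refresh hashes of changed keys, then add new keys.
UPDATE s
SET s.HashValue = r.HashValue, s.SentAt = GETDATE()
FROM #SentRows AS s
JOIN #RowsToSend AS r ON r.Id = s.Id;

INSERT INTO #SentRows (Id, HashValue)
SELECT r.Id, r.HashValue
FROM #RowsToSend AS r
WHERE NOT EXISTS (SELECT 1 FROM #SentRows AS s WHERE s.Id = r.Id);

-- 6) Empty RowsToSend for the next run.
TRUNCATE TABLE #RowsToSend;
```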

Behind the scene operations for ALTER COLUMN statement in SQL Server

I am altering the column datatype for a table with around 100 Million records using the below query:
ALTER TABLE dbo.TARGETTABLE
ALTER COLUMN XXX_DATE DATE
The column values are in the right date format as I inserted original date from a valid data source.
However, the query has been running for a long time, and even when I try to cancel it, cancelling seems to take forever.
Can anyone explain what happens behind the scenes in SQL Server when an ALTER TABLE statement is executed, and why it requires so many resources?

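There are a lot of variables that can make these ALTER statements make multiple passes through your table and make heavy use of TempDB, and depending on how efficient TempDB is, that can be very slow. Examples include whether or not the column you are changing is part of an index (especially the clustered index, since every nonclustered index also carries the clustering key). Changing to a data type with a different storage format also means every row has to be rewritten, and the whole change is logged as one transaction, which is why cancelling takes so long: all the work done so far has to be rolled back.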
Instead of altering the table, here is one simple example you can try:
Suppose your table is named tblTarget1.
Create another table (tblTarget2) with the same structure.
Change the data type of the column in tblTarget2.
Copy the data from tblTarget1 to tblTarget2 using an INSERT INTO ... SELECT query.
Drop the original table (tblTarget1).
Rename tblTarget2 to tblTarget1.
The main reason is that changing the data type in place involves moving a lot of data and rewriting data pages.
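A sketch of the copy-and-swap approach, assuming XXX_DATE is currently stored as yyyy-mm-dd text (the question doesn't say what the original type is) and that tblTarget1 has only the two hypothetical columns shown. Indexes, constraints, triggers and permissions on the original table would have to be recreated after the swap.

```sql
-- Demo setup: hypothetical tblTarget1 with the date stored as text.
IF OBJECT_ID('dbo.tblTarget1', 'U') IS NOT NULL DROP TABLE dbo.tblTarget1;
IF OBJECT_ID('dbo.tblTarget2', 'U') IS NOT NULL DROP TABLE dbo.tblTarget2;
CREATE TABLE dbo.tblTarget1 (Id INT NOT NULL PRIMARY KEY, XXX_DATE VARCHAR(10) NULL);
INSERT INTO dbo.tblTarget1 (Id, XXX_DATE) VALUES (1, '2019-10-31'), (2, NULL);

SET XACT_ABORT ON;
BEGIN TRANSACTION;

-- 1) Same structure, but XXX_DATE is now DATE.
CREATE TABLE dbo.tblTarget2 (Id INT NOT NULL PRIMARY KEY, XXX_DATE DATE NULL);

-- 2) Copy and convert (style 23 = yyyy-mm-dd). TABLOCK can allow minimal logging
--    under the SIMPLE or BULK_LOGGED recovery model; the exact conditions vary by version.
INSERT INTO dbo.tblTarget2 WITH (TABLOCK) (Id, XXX_DATE)
SELECT Id, CONVERT(DATE, XXX_DATE, 23)
FROM dbo.tblTarget1;

-- 3) Drop the original and give the copy its name.
DROP TABLE dbo.tblTarget1;
EXEC sp_rename 'dbo.tblTarget2', 'tblTarget1';

COMMIT TRANSACTION;
```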

Another approach is the following:
Add a new column to the table: [_date] date
Using a batch update, transfer the values from the old column to the new one without blocking the table for other users.
Then, in one transaction, do the following:
update any rows that were inserted or changed after the batch update finished
drop the old column
rename the new column
Note: if you have an index on this field, you need to drop it before dropping the old column and recreate it after renaming the new one.
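A sketch of the add/batch-update/swap approach against the TARGETTABLE from the question, assuming XXX_DATE currently holds yyyy-mm-dd text; the Id column, the sample rows and the batch size are hypothetical. GO is the batch separator used by SSMS and sqlcmd; it is needed here because SQL Server compiles a whole batch at once and would not yet see the new [_date] column.

```sql
-- Demo setup: hypothetical table with the date stored as text.
IF OBJECT_ID('dbo.TARGETTABLE', 'U') IS NOT NULL DROP TABLE dbo.TARGETTABLE;
CREATE TABLE dbo.TARGETTABLE (Id INT NOT NULL PRIMARY KEY, XXX_DATE VARCHAR(10) NULL);
INSERT INTO dbo.TARGETTABLE (Id, XXX_DATE) VALUES (1, '2019-10-31'), (2, '2020-01-15'), (3, NULL);

-- 1) Add the new column (nullable, so no existing rows have to be rewritten).
ALTER TABLE dbo.TARGETTABLE ADD [_date] DATE NULL;
GO

-- 2) Copy values in small batches so each statement holds its locks only briefly.
DECLARE @batch INT = 10000;
WHILE 1 = 1
BEGIN
    UPDATE TOP (@batch) dbo.TARGETTABLE
    SET [_date] = CONVERT(DATE, XXX_DATE, 23)
    WHERE [_date] IS NULL AND XXX_DATE IS NOT NULL;

    IF @@ROWCOUNT < @batch BREAK;
END;

-- 3) One short transaction: catch up rows changed meanwhile, then swap the columns.
SET XACT_ABORT ON;
BEGIN TRANSACTION;

-- NULL-safe "values differ" test: EXCEPT treats two NULLs as equal.
UPDATE dbo.TARGETTABLE WITH (TABLOCKX)
SET [_date] = CONVERT(DATE, XXX_DATE, 23)
WHERE EXISTS (SELECT [_date] EXCEPT SELECT CONVERT(DATE, XXX_DATE, 23));

-- Drop any index on XXX_DATE before this step and recreate it after the rename.
ALTER TABLE dbo.TARGETTABLE DROP COLUMN XXX_DATE;
EXEC sp_rename 'dbo.TARGETTABLE._date', 'XXX_DATE', 'COLUMN';

COMMIT TRANSACTION;
```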

Add column VS Recreating table

I was going through some SQL scripts a coworker had written to upgrade a table to a newer version. It was simply adding a new column to the table. However, instead of an
ALTER TABLE [Table] ADD [Column] [DataType]
statement, he instead made a copy of the table with the new column, repopulated it with the existing data, deleted the old table, renamed the new one to the old table, and then re-added all the indexes and relationships.
My question is, is there any benefit to doing this rather than the simple Alter statement or is there any difference?
I can't imagine all that work for practically no difference other than the column's ordinal position being in the desired place.

When you use the SSMS GUI, it will sometimes take this approach. One possible reason for doing it this way is "inserting" a column rather than "appending" one (if you want the column to appear before some already existing columns). In that case, ALTER TABLE ... ADD won't work, because it always adds the column at the end.
Essentially, this applies any time a simple addition of a new column to the end of the table isn't what you're looking for. But my guess is that he used the GUI to add the column and chose the "generate SQL script" option.
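A quick way to see the point about ordinal position, using a hypothetical dbo.Widget table: SQL Server's ALTER TABLE ... ADD has no clause for choosing where the column goes, so it is always appended, and putting it anywhere else means rebuilding the table the way the SSMS script does.

```sql
-- Demo setup: hypothetical two-column table with one row.
IF OBJECT_ID('dbo.Widget', 'U') IS NOT NULL DROP TABLE dbo.Widget;
CREATE TABLE dbo.Widget (WidgetId INT NOT NULL PRIMARY KEY, Name NVARCHAR(50) NOT NULL);
INSERT INTO dbo.Widget (WidgetId, Name) VALUES (1, N'bolt');

-- The new column is appended after the existing ones; the existing row gets NULL.
ALTER TABLE dbo.Widget ADD Colour NVARCHAR(20) NULL;

-- Ordinal positions: Colour is reported last.
SELECT name, column_id
FROM sys.columns
WHERE object_id = OBJECT_ID('dbo.Widget')
ORDER BY column_id;
```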

Here are a couple of differences between the two approaches. Hope this helps!
Adding new column to existing table:
Pros:
No need to recreate indices and constraints on the existing table.
Data in existing columns remains intact.
Cons:
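On a huge table with millions of records, adding the column may require updating every row (for example, a NOT NULL column with a default on some versions and editions); adding a nullable column without a default is a metadata-only change.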
Recreating a New Table:
Pros:
No need to worry about any kind of limits (number of columns, total size of the table) in the RDBMS
Cons:
Recreate indices and constraints
Reload data for all columns

Changing columns to identity (SQL Server)

My company has an application with a bunch of database tables that used to use a sequence table to determine the next value to use. Recently, we switched this to using an identity property. The problem is that in order to upgrade a client to the latest version of the software, we have to change about 150 tables to identity. To do this manually, you can right click on a table, choose design, change (Is Identity) to "Yes" and then save the table. From what I understand, in the background, SQL Server exports this to a temporary table, drops the table and then copies everything back into the new table. Clients may have their own unique indexes and possibly other things specific to the client, so making a generic script isn't really an option.
It would be really awesome if there was a stored procedure for scripting this task rather than doing it in the GUI (which takes FOREVER). We made a macro that can go through and do this, but even then, it takes a long time to run and is error prone. Something like: exec sp_change_to_identity 'table_name', 'column name'
Does something like this exist? If not, how would you handle this situation?
Update: This is SQL Server 2008 R2.

This is what SSMS seems to do:
1) Obtain and drop all the foreign keys pointing to the original table.
2) Obtain the indexes, triggers, foreign keys and statistics of the original table.
3) Create a temp_table with the same schema as the original table, with the identity field.
4) Insert into temp_table all the rows from the original table (IDENTITY_INSERT ON).
5) Drop the original table (this will drop its indexes, triggers, foreign keys and statistics).
6) Rename temp_table to the original table name.
7) Recreate the foreign keys obtained in (1).
8) Recreate the objects obtained in (2).
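A sketch of those steps for a single table, using a hypothetical dbo.Parent table referenced by a dbo.Child table. A generic script would read the definitions it needs (the second step) from catalog views such as sys.foreign_keys and sys.indexes; here they are simply written out. The primary key has to exist again before a foreign key can reference it, so the sketch recreates it before the foreign key.

```sql
-- Demo setup: hypothetical parent/child pair; ParentId is not yet an IDENTITY column.
IF OBJECT_ID('dbo.Child', 'U') IS NOT NULL DROP TABLE dbo.Child;
IF OBJECT_ID('dbo.Parent', 'U') IS NOT NULL DROP TABLE dbo.Parent;
CREATE TABLE dbo.Parent (ParentId INT NOT NULL CONSTRAINT PK_Parent PRIMARY KEY, Name NVARCHAR(50) NULL);
CREATE TABLE dbo.Child  (ChildId INT NOT NULL PRIMARY KEY,
                         ParentId INT NOT NULL CONSTRAINT FK_Child_Parent REFERENCES dbo.Parent (ParentId));
INSERT INTO dbo.Parent (ParentId, Name) VALUES (1, N'p1'), (2, N'p2');
INSERT INTO dbo.Child (ChildId, ParentId) VALUES (10, 1), (11, 2);

SET XACT_ABORT ON;
BEGIN TRANSACTION;

-- Drop the foreign keys that point at the table.
ALTER TABLE dbo.Child DROP CONSTRAINT FK_Child_Parent;

-- Temp table with the same schema, but with IDENTITY on ParentId.
CREATE TABLE dbo.Tmp_Parent (ParentId INT IDENTITY(1, 1) NOT NULL, Name NVARCHAR(50) NULL);

-- Copy the rows, keeping the existing key values.
SET IDENTITY_INSERT dbo.Tmp_Parent ON;
INSERT INTO dbo.Tmp_Parent (ParentId, Name)
SELECT ParentId, Name FROM dbo.Parent WITH (TABLOCKX);
SET IDENTITY_INSERT dbo.Tmp_Parent OFF;

-- Drop the original (taking its own indexes and constraints with it) and rename.
DROP TABLE dbo.Parent;
EXEC sp_rename 'dbo.Tmp_Parent', 'Parent';

-- Recreate the table's own objects first (here only the primary key) ...
ALTER TABLE dbo.Parent ADD CONSTRAINT PK_Parent PRIMARY KEY CLUSTERED (ParentId);

-- ... then the foreign keys dropped at the start.
ALTER TABLE dbo.Child ADD CONSTRAINT FK_Child_Parent
    FOREIGN KEY (ParentId) REFERENCES dbo.Parent (ParentId);

COMMIT TRANSACTION;
```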

Modify autoincrement column to global autoincrement column

Our company uses Sybase and we are planning on setting up a MobiLink system (data replication system). We therefore need to change from using autoincrement columns to global autoincrement columns.
My question is what steps I need to take to get this working properly. There are already thousands of rows of data that used the regular autoincrement default.
I'm thinking I need to create a new column with a default of global autoincrement, fill it with data (number(*)), switch the PK to it, drop the old FK's, drop the old column, rename the new column to the old one, then re-apply the FK's.
Is there an easier way to accomplish what I need here?
thanks!

That's generally the way to go about it. But some specific statements you make concern me, and so does the sequence. I am not sure of your experience level; the terms you use may or may not be accurate.
For each table ...
... switch the PK to it
What about the FK values in the child tables? Or do you mean you will change them as well?
... drop the old FK's
OK, that's the constraint.
... drop the old column, rename the new column to the old one, then re-apply the FK's.
What exactly do you mean by that? Add the FK constraint back in? That won't change the existing data; it will apply to any new rows added.
I hope you see what I mean when I say the sequence of your tasks is suspect. Before you drop the old_PK_column in the parent, you need to:
Add the dropped FK constraints in each child table.
For each child table: UPDATE all the FK values to the new_PK_column.
Then drop the old_PK_column.
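A sketch of the child-table UPDATE step, written as plain SQL with hypothetical table and column names (parent.old_id/new_id, child.parent_id/new_parent_id). The fixed new_id values stand in for what the global autoincrement default would generate. The correlated-subquery UPDATE is standard SQL, but this script has not been checked against SQL Anywhere specifically, and the constraint, drop and rename statements that follow are dialect-specific and not shown. Run it in a scratch database.

```sql
-- Demo setup (hypothetical names). In the real database new_id would be the
-- column with DEFAULT GLOBAL AUTOINCREMENT; fixed values stand in for it here.
CREATE TABLE parent (old_id INTEGER NOT NULL PRIMARY KEY, new_id INTEGER NOT NULL UNIQUE);
CREATE TABLE child  (child_id INTEGER NOT NULL PRIMARY KEY, parent_id INTEGER NOT NULL);
INSERT INTO parent (old_id, new_id) VALUES (1, 10000001);
INSERT INTO parent (old_id, new_id) VALUES (2, 10000002);
INSERT INTO child (child_id, parent_id) VALUES (100, 1);
INSERT INTO child (child_id, parent_id) VALUES (101, 2);

-- Add a column in the child table to hold the parent's new key value.
ALTER TABLE child ADD new_parent_id INTEGER NULL;

-- Re-point every child row at its parent's new key, before the old PK column is dropped.
UPDATE child
SET new_parent_id = (SELECT p.new_id FROM parent p WHERE p.old_id = child.parent_id);
```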

You're just changing the way PK values are generated, so it's enough to:
ALTER TABLE <table>
modify <column> default global autoincrement (1000000);
to use a partition size of 1,000,000.
Also make sure you set the global database identifier in each db, for example:
SET OPTION PUBLIC.global_database_id = 10;
So the next PK generated in that database will be 10,000,001; that database's range runs from 10,000,001 to 11,000,000.
