Re-insert records with IDENTITY Insert in SQL Server - sql-server

Some records in a table (with no primary key) were deleted, and I need to re-insert them. One of the columns is an IDENTITY column.
What should my approach be on a live database? Should I use SET IDENTITY_INSERT ON/OFF, or is there a better way?

If the re-inserted records must keep their original identity values (for example, so that existing references to them still resolve), then I would say yes, use IDENTITY_INSERT, even though it is risky: the table has no primary key, so nothing stops you from inserting duplicate identity values.
Otherwise, simply insert them again and let them receive new IDENTITY values.
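A minimal sketch of the IDENTITY_INSERT route, assuming the deleted rows are still available somewhere such as a restored backup. The names dbo.Orders, dbo.Orders_Backup, OrderId and CustomerName are hypothetical and do not come from the question.

```sql
-- Hypothetical tables: dbo.Orders (live, OrderId is the IDENTITY column)
-- and dbo.Orders_Backup (a copy that still holds the deleted rows).
BEGIN TRANSACTION;

-- Only one table per session can have IDENTITY_INSERT ON at a time.
SET IDENTITY_INSERT dbo.Orders ON;

-- An explicit column list is required while IDENTITY_INSERT is ON.
INSERT INTO dbo.Orders (OrderId, CustomerName)
SELECT b.OrderId, b.CustomerName
FROM dbo.Orders_Backup AS b
WHERE NOT EXISTS (              -- guard against duplicate identity values,
    SELECT 1                    -- since the table has no primary key
    FROM dbo.Orders AS o
    WHERE o.OrderId = b.OrderId
);

SET IDENTITY_INSERT dbo.Orders OFF;

COMMIT TRANSACTION;
```

The NOT EXISTS filter is there because, without a primary key or unique index, SQL Server would accept a second row with the same identity value.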

Related

SSIS only extract Delta changes

After some advice. I'm using SSIS/SQL Server 2014. I have a nightly SSIS package that pulls data from non-SQL Server databases into a single table (the SQL table is truncated beforehand each time), and I then extract from this table to create a daily CSV file.
Going forward, I only want to extract to CSV the records that have changed each day, i.e. the deltas.
What is the best approach? I was thinking of using CDC in SSIS, but as I'm truncating the SQL table before the load each time, will this be the best method? Or will I need a master table in SQL with an initial load, then import into another table and extract only the rows that differ? For info, the table in SQL contains a primary key.
I just want to double check as CDC assumes the tables are all in SQL Server, whereas my data is coming from outside SQL Server first.
Thanks for any help.
The primary key on that table is your saving grace here. Obviously enough, the SQL Server database that you're pulling the disparate data into won't know from one table flush to the next which records have changed, but if you add two additional tables, and modify the existing table with an additional column, it should be able to figure it out by leveraging HASHBYTES.
For this example, I'll call the new table SentRows, but you can use a more meaningful name in practice. We'll call the new column in the old table HashValue.
1) Add the column HashValue to your table as a varbinary data type, NOT NULL.
2) Create your SentRows table with all of the columns in the main table's primary key, plus the HashValue column.
3) Create a RowsToSend table that's structurally identical to your main table, including the HashValue.
4) Modify your queries to create the HashValue by applying HASHBYTES to all of the non-key columns in the table. (This will be horribly tedious. Sorry about that.)
5) Send out your full data set.
6) Now move all of the key values and HashValues to the SentRows table. Truncate your main table.
7) On the next pull, compare the key values and HashValues from SentRows to the new data in the main table:
   Primary key match + hash match = Unchanged row
   Primary key match + hash mismatch = Updated row
   Primary key in incoming data but missing from existing data set = New row
   Primary key not in incoming data but in existing data set = Deleted row
8) Pull out any changes you need to send to the RowsToSend table.
9) Send the changes from RowsToSend.
10) Move the key values and HashValues to your SentRows table. Update hashes for changed key values, insert new rows, and decide how you're going to handle deletes, if you have to deal with deletes.
11) Truncate the RowsToSend table to get ready for tomorrow. (SentRows has to be kept, since it is what the next pull is compared against.)
If you'd like (and you'll thank yourself later if you do), add a datetime column to the SentRows table with a default of GETDATE(), which will tell you when the row was added.
And away you go. Nothing but deltas from now on.
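As a rough illustration of the hashing step, here is one way the HashValue could be filled while loading the main table. It assumes the incoming data lands in a hypothetical staging table dbo.SourceStaging, that the key is Id, and that the non-key columns are Name, City and Amount; none of these names come from the question.

```sql
-- Hypothetical schema: Id is the primary key; Name, City and Amount are
-- the non-key columns; HashValue is VARBINARY(32) to hold a SHA2_256 hash.
INSERT INTO dbo.MainTable (Id, Name, City, Amount, HashValue)
SELECT
    s.Id,
    s.Name,
    s.City,
    s.Amount,
    -- CONCAT converts each argument to a string and treats NULL as ''.
    -- The '|' delimiter keeps ('ab', 'c') from hashing the same as ('a', 'bc').
    HASHBYTES('SHA2_256', CONCAT(s.Name, '|', s.City, '|', s.Amount))
FROM dbo.SourceStaging AS s;
```

Two things to watch for: NULL and the empty string produce the same hash with this approach, and before SQL Server 2016 the input to HASHBYTES is limited to 8000 bytes, which matters on SQL Server 2014 if the concatenated columns are wide.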
Edit 2019-10-31:
Step by step (or TL;DR):
1) Flush and Fill MainTable.
2) Compare keys and hashes on MainTable to keys and hashes on SentRows to identify new/changed rows.
3) Move new/changed rows to RowsToSend.
4) Send the rows that are in RowsToSend.
5) Move all the rows from RowsToSend to SentRows.
6) Truncate RowsToSend.
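Steps 2, 3, 5 and 6 might look roughly like the following. This is only a sketch under assumed names: a single-column key Id and non-key columns Name, City and Amount are hypothetical, and step 4 (writing the CSV) is left to the SSIS package.

```sql
-- 2) + 3) New rows have no match in SentRows; changed rows match on the key
--         but carry a different hash.
INSERT INTO dbo.RowsToSend (Id, Name, City, Amount, HashValue)
SELECT m.Id, m.Name, m.City, m.Amount, m.HashValue
FROM dbo.MainTable AS m
LEFT JOIN dbo.SentRows AS s
    ON s.Id = m.Id
WHERE s.Id IS NULL
   OR s.HashValue <> m.HashValue;

-- Optional: keys that were sent before but are no longer in the source (deletes).
SELECT s.Id
FROM dbo.SentRows AS s
WHERE NOT EXISTS (SELECT 1 FROM dbo.MainTable AS m WHERE m.Id = s.Id);

-- 5) After the file has been written, record what was sent.
--    Update the hashes of rows that were already known...
UPDATE s
SET s.HashValue = r.HashValue
FROM dbo.SentRows AS s
JOIN dbo.RowsToSend AS r
    ON r.Id = s.Id;

--    ...then add the keys that are new.
INSERT INTO dbo.SentRows (Id, HashValue)
SELECT r.Id, r.HashValue
FROM dbo.RowsToSend AS r
WHERE NOT EXISTS (SELECT 1 FROM dbo.SentRows AS s WHERE s.Id = r.Id);

-- 6) Clear the outbound table for the next run.
TRUNCATE TABLE dbo.RowsToSend;
```

The UPDATE runs before the INSERT so that the NOT EXISTS check only has to pick up genuinely new keys.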

table with only one column?

I need to create a table with a single column: email.
Can I create a table with a single column and add a clustered index on email? Or should I create an identity column and put a non-clustered index on email?
The table will hold around a million email addresses.
Developers will use this table, and I imagine they will just do where xxxx in (select email from table); as I see it, there is no other way of using this table.
I will run a merge once a week that will insert new emails. I'm not sure whether I should do a merge if the table is uniquely clustered on email. I can just insert, and hopefully a duplicated record would be skipped while the rest continue, right?
This is mostly a matter of personal choice. There are some performance benefits to having the identity column as your clustered index key when your code is inserting, updating, or deleting.
I would create the identity column as the clustered index key and keep the email as a separate column. Ideally, the clustered index key is an ever-increasing value.
What happens if you enter the same email twice? Should that be two separate rows or should that cause an error? These are things to think about when you design this table.
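One possible shape for this design, assuming duplicates should be rejected. The names dbo.EmailList and dbo.NewEmails (a hypothetical staging table for the weekly load) and the VARCHAR(320) length are illustrative choices, not something given in the question.

```sql
-- Identity column as the ever-increasing clustered key.
CREATE TABLE dbo.EmailList (
    EmailId INT IDENTITY(1,1) NOT NULL,
    Email   VARCHAR(320)      NOT NULL,
    CONSTRAINT PK_EmailList PRIMARY KEY CLUSTERED (EmailId)
);

-- Unique non-clustered index: supports the "IN (SELECT Email ...)" lookups
-- and makes a second copy of the same address an error.
CREATE UNIQUE NONCLUSTERED INDEX UX_EmailList_Email
    ON dbo.EmailList (Email);

-- Weekly load: insert only addresses that are not already present.
INSERT INTO dbo.EmailList (Email)
SELECT DISTINCT n.Email
FROM dbo.NewEmails AS n
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.EmailList AS e
    WHERE e.Email = n.Email
);
```

With a unique index in place, a plain INSERT that contains a duplicate fails as a whole by default rather than skipping the offending row, which is why the load filters with NOT EXISTS first.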

why sql server increment the Identity specification?

I am using SQL Server 2012. In my database I have set a primary key on userid, with Identity Specification = Yes, Is Identity = Yes, Identity Increment = 1, and Identity Seed = 1.
I inserted just 5 users, and their userids are 1, 2, 3, 4, 5. I am sure I haven't done any inserts since then, and no other stored procedure or trigger uses this table; it is a new table. Now when I tried to insert a 6th user, it was given userid 1001,
the 7th got 1002, and the 8th got 2002.
Why did the userid jump like this?
Usually gaps occur when:
1. Records are deleted.
2. An error occurs while attempting to insert a new record (e.g. a NOT NULL constraint violation); the identity value is consumed anyway and skipped.
3. Somebody has inserted a row with an explicit value (e.g. using the IDENTITY_INSERT option).
4. The increment value is more than 1.
The identity property on a column does not guarantee the following:
Uniqueness of the value – Uniqueness must be enforced by using a PRIMARY KEY or UNIQUE constraint or UNIQUE index.
Consecutive values within a transaction – A transaction inserting multiple rows is not guaranteed to get consecutive values for the rows because other concurrent inserts might occur on the table. If values must be consecutive then the transaction should use an exclusive lock on the table or use the SERIALIZABLE isolation level.
Consecutive values after server restart or other failures – SQL Server might cache identity values for performance reasons and some of the assigned values can be lost during a database failure or server restart. This can result in gaps in the identity value upon insert. If gaps are not acceptable then the application should use a sequence generator with the NOCACHE option or use their own mechanism to generate key values.
Reuse of values – For a given identity property with specific seed/increment, the identity values are not reused by the engine. If a particular insert statement fails or if the insert statement is rolled back then the consumed identity values are lost and will not be generated again. This can result in gaps when the subsequent identity values are generated.
Also,
If an identity column exists for a table with frequent deletions, gaps can occur between identity values. If this is a concern, do not use the IDENTITY property. However, to make sure that no gaps have been created or to fill an existing gap, evaluate the existing identity values before explicitly entering one with SET IDENTITY_INSERT ON.
Also, check the Identity Column Properties and the Identity Increment value. It should be 1.
Open your table in design view.
Now check that the Identity Seed and Identity Increment values are correct. If not, you must correct them.
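The same check can be done from a query window instead of the designer. A small sketch, assuming the table is called dbo.Users (a hypothetical name; the question does not give one):

```sql
-- Seed, increment and the last identity value generated for the table.
SELECT
    IDENT_SEED('dbo.Users')    AS IdentitySeed,
    IDENT_INCR('dbo.Users')    AS IdentityIncrement,
    IDENT_CURRENT('dbo.Users') AS LastIdentityValue;

-- Reports the current identity value and the current maximum value in the
-- column without changing anything.
DBCC CHECKIDENT ('dbo.Users', NORESEED);

-- If cache-related gaps are not acceptable, a sequence without caching is the
-- alternative mentioned in the quoted documentation (T-SQL spells it NO CACHE).
-- dbo.UserIdSeq is a hypothetical name.
CREATE SEQUENCE dbo.UserIdSeq
    AS INT
    START WITH 1
    INCREMENT BY 1
    NO CACHE;

SELECT NEXT VALUE FOR dbo.UserIdSeq AS NextUserId;
```

If the seed and increment both come back as 1, jumps of roughly 1000 on an int column are consistent with the cached identity values being lost at a restart, as described in the "Consecutive values after server restart" item above. Note that even a NO CACHE sequence can still leave gaps when a transaction rolls back.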

race condition in logic for database

I have a database server.
The application logic is that it will query to see if a particular row exists; if not, it will insert a new row. The query is done in Java with container-managed transactions.
So with 2 application servers running the same code, is it possible for both servers to check that the row doesn't exist, and both insert the row? (The insert will be successful due to another unique auto-number primary key column.)
How do we ensure there is one and only one unique row for that data?
Thanks
"the insert will be successful due to another unique auto-number primary key column"
Most DBMSes offer a way to create a "unique key" or "unique index" that enforces uniqueness of a given column (or set of columns) even if it's not the primary key. The second insert would then fail, just as if it had violated a primary-key constraint.
You haven't indicated what DBMS you're using, but most (all?) of the common ones have this feature; for example, PostgreSQL, MySQL, SQL Server, and Oracle all do.
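For illustration, this is what such a constraint might look like. The question does not name a DBMS, so SQL Server syntax is assumed here, and dbo.Items, ItemId and BusinessKey are hypothetical names.

```sql
-- The auto-number primary key stays as it is; the unique constraint protects
-- the column that actually identifies "that data".
CREATE TABLE dbo.Items (
    ItemId      INT IDENTITY(1,1) NOT NULL,
    BusinessKey VARCHAR(50)       NOT NULL,
    CONSTRAINT PK_Items PRIMARY KEY (ItemId),
    CONSTRAINT UQ_Items_BusinessKey UNIQUE (BusinessKey)
);

-- First application server: succeeds.
INSERT INTO dbo.Items (BusinessKey) VALUES ('ABC-1');

-- Second application server, same data: fails with a unique-constraint
-- violation (error 2627 in SQL Server) instead of creating a second row.
INSERT INTO dbo.Items (BusinessKey) VALUES ('ABC-1');
```

The application then treats that error as "the row already exists", so the check-then-insert race no longer produces duplicates even when both servers pass the existence check at the same time.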

Making primary key and identity column after data has been loaded

I have a quick question for you SQL gurus. I have existing tables without a primary key column, and Identity is not set. Now I am trying to modify those tables by making an existing integer column the primary key and adding identity values for that column. My question is: should I first copy all the records from the table to a temp table before making those changes? Do I lose all the previous records if I run the T-SQL command to make the primary key and add the identity column on those tables? What approach should I take, such as:
1) Create a temp table to hold all the records from the table to be modified
2) Load all the records into the temp table
3) Make the changes to the table schema
4) Finally, load the records from the temp table back into the original table.
Or are there better ways than this? I really appreciate your help.
Thanks
Tools>Options>Designers>Table and Database Designers
Uncheck "Prevent saving changes that require table re-creation"
[Edit] I've tried this with populated tables and I didn't lose data, but I don't really know much about this.
Hopefully you don't have too many records in the table. When you use Management Studio to change an existing field to identity, it creates another table with the identity field set, turns identity insert on, inserts the records from the original table, and then turns identity insert off. Then it drops the old table and renames the table it just created. This can be quite a lengthy process if you have many records. If so, I would script this out and run it in a job during off hours, because the table will be completely locked while it runs.
Just make all of your changes in Management Studio, then copy/paste the generated script into a file. DON'T SAVE CHANGES at this point. Look over and edit that script as necessary; it will probably do almost exactly what you are thinking (it will drop the original table and rename the temp one to the original's name), but it will handle all constraints and FKs as well.
If your existing integer column is unique and suitable, there should be no problem converting it to a PK.
Another alternative, if you don't want to use the existing column: add a new PK column to the main table, populate it and seed it, then run update statements to update all other tables with the new PK.
Whatever way you do it, make sure you do a back-up first!!
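You can always add the IDENTITY column after you have finished copying your data around. You can then reseed the IDENTITY to the current maximum value so that new rows continue from there. That should solve your problems.
DBCC CHECKIDENT ('MyTable', RESEED, n)
Where n is the current identity value to set. If the table already contains rows, the next inserted row gets n plus the increment, so pass the current maximum rather than the maximum + 1.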
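Putting the answers above together, a hand-written version of the rebuild that Management Studio generates could look roughly like this. It is a sketch under assumptions: dbo.MyTable is taken to have just two columns, Id (the existing unique integer) and Name, and no foreign keys or other constraints. Take a backup first, as advised above.

```sql
-- Hypothetical starting point: dbo.MyTable (Id INT NOT NULL, Name NVARCHAR(100) NULL),
-- with no primary key and no identity.
BEGIN TRANSACTION;

-- New table with the identity property and the primary key in place.
CREATE TABLE dbo.MyTable_New (
    Id   INT IDENTITY(1,1) NOT NULL,
    Name NVARCHAR(100)     NULL,
    CONSTRAINT PK_MyTable PRIMARY KEY (Id)
);

-- Copy the rows across, keeping their existing Id values.
SET IDENTITY_INSERT dbo.MyTable_New ON;

INSERT INTO dbo.MyTable_New (Id, Name)
SELECT Id, Name
FROM dbo.MyTable;

SET IDENTITY_INSERT dbo.MyTable_New OFF;

-- Swap the tables.
DROP TABLE dbo.MyTable;
EXEC sp_rename 'dbo.MyTable_New', 'MyTable';

COMMIT TRANSACTION;

-- Without an explicit value, RESEED raises the current identity value to the
-- column maximum if it is lower, so new rows continue after the copied ones.
DBCC CHECKIDENT ('dbo.MyTable', RESEED);
```

If the existing Id column contains duplicates, the INSERT fails on the primary key and the transaction can be rolled back with the original table untouched.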
