Creating a SQL Server trigger to transition from a natural key to a surrogate key - sql-server

Backstory
At work we're planning on deprecating a natural key column in one of our primary tables. The project consists of 100+ applications that link to this table/column; 400+ stored procedures that reference this column directly; and a vast array of common tables shared between these applications that also reference this column.
The Big Bang and Start from Scratch methods are out of the picture. We're going to deprecate this column one application at a time, certify the changes, and move on to the next... and we've got a lengthy target timeline to make this effort practical.
The problem I have is that a lot of these applications have shared stored procedures and tables. If I completely convert all of Application A's tables/stored procedures, Applications B and C will be broken until they are converted. These in turn may break applications D, E, F, etc. I've already got a strategy implemented for code classes and stored procedures; the part I'm stuck on is the transitional state of the database.
Here's a basic example of what we have:
Users
---------------------------
Code varchar(32) natural key
Access
---------------------------
UserCode varchar(32) foreign key
AccessLevel int
And for now we're aiming just for a transitional state like this:
Users
---------------------------
Code varchar(32)
Id int surrogate key
Access
---------------------------
UserCode varchar(32)
UserID int foreign key
AccessLevel int
The idea is that during the transitional phase, un-migrated applications and stored procedures will still be able to access all the appropriate data, and new ones can start pushing to the correct columns. Once the migration is complete for all stored procedures and applications, we can finally drop the extra columns.
I wanted to use SQL Server triggers to automatically intercept any new inserts/updates and do something like the following on each of the affected tables:
CREATE TRIGGER tr_Access_Sync
ON Access
INSTEAD OF INSERT, UPDATE
AS
BEGIN
DECLARE @code varchar(32)
DECLARE @id int
SET @code = (SELECT inserted.code FROM inserted)
SET @id = (SELECT inserted.id FROM inserted)
-- This is a migrated application; find the appropriate legacy key
IF @code IS NULL AND @id IS NOT NULL
SELECT @code = Code FROM Users WHERE Users.Id = @id
-- This is a legacy application; find the appropriate surrogate key
IF @id IS NULL AND @code IS NOT NULL
SELECT @id = Id FROM Users WHERE Users.Code = @code
-- Impossible code:
UPDATE inserted SET inserted.code = @code, inserted.id = @id
END
Question
The 2 huge problems I'm having so far are:
I can't do an "AFTER INSERT" because the NOT NULL constraints will make the insert fail.
The "impossible code" I mentioned is how I'd like to cleanly proxy the original query; If the original query has x, y, z columns in it or just x, I ideally would like the same trigger to do these. And if I add/delete another column, I'd like the trigger to remain functional.
Anyone have a code example where this could be possible, or even an alternate solution for keeping these columns properly filled even when only one of the values is passed to SQL?

Tricky business...
OK, first of all: this trigger will NOT work in many circumstances:
SET @code = (SELECT inserted.code FROM inserted)
SET @id = (SELECT inserted.id FROM inserted)
The trigger can be called with a set of rows in the Inserted pseudo-table - which one are you going to pick here?? You need to write your trigger in such a fashion that it will work even when you get 10 rows in the Inserted table. If a SQL statement inserts 10 rows, your trigger will not be fired ten times - one for each row - but only once for the whole batch - you need to take that into account!
Second point: I would try to make the IDs IDENTITY fields - then they'll always get a value - even for "legacy" apps. Those "old" apps should provide a legacy key instead - so you should be fine there. The only issue I see - and I don't know how you'd handle it - is inserts from an already converted app: do they provide an "old-style" legacy key as well? If not, how quickly do you need to have such a key?
What I'm thinking about would be a "cleanup job" that would run over the table and get all the rows with a NULL legacy key and then provide some meaningful value for it. Make this a regular stored procedure and execute it every e.g. day, four hours, 30 minutes - whatever suits your needs. Then you don't have to deal with triggers and all the limitations they have.
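A minimal sketch of such a cleanup procedure, assuming the Users/Access columns from the question and that the key columns being backfilled stay nullable during the transition (the procedure name is made up; it could be scheduled with a SQL Server Agent job):
-- Hypothetical cleanup procedure: backfills whichever key column is missing.
-- Table and column names are taken from the question; adjust to the real schema.
CREATE PROCEDURE dbo.usp_SyncAccessKeys
AS
BEGIN
    SET NOCOUNT ON;

    -- Legacy rows: fill in the new surrogate key from the natural key
    UPDATE a
    SET a.UserID = u.Id
    FROM Access AS a
    INNER JOIN Users AS u ON u.Code = a.UserCode
    WHERE a.UserID IS NULL;

    -- Migrated rows: fill in the legacy natural key from the surrogate key
    UPDATE a
    SET a.UserCode = u.Code
    FROM Access AS a
    INNER JOIN Users AS u ON u.Id = a.UserID
    WHERE a.UserCode IS NULL;
END;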

Wouldn't it be possible to make the schema changes 'bigbang' but create views over the top of those tables that 'hide' the change?
I think you might find you are simply putting off the breakages to a later point in time: "We're going to deprecate this column one application at a time" - it might be my naivety but I can't see how that's ever going to work.
Surely, a worse mess can occur when different applications are doing things differently?
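To illustrate the view idea above, here is a hypothetical sketch, assuming the physical Access table were renamed to Access_New as part of the big-bang change:
-- Hypothetical: the physical table is renamed to Access_New and gains UserID;
-- a view with the old name and old column list keeps legacy readers working.
CREATE VIEW dbo.Access
AS
SELECT u.Code AS UserCode,
       a.AccessLevel
FROM dbo.Access_New AS a
INNER JOIN dbo.Users AS u ON u.Id = a.UserID;
Writes through such a view would still need INSTEAD OF triggers or changed procedures, so this mainly keeps legacy read paths working.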

After sleeping on the problem, this seems to be the most generic/re-usable solution I could come up with within the SQL syntax. It works fine even if both columns have a NOT NULL constraint, and even if you don't reference the "other" column at all in your insert.
CREATE TRIGGER tr_Access_Sync
ON Access
INSTEAD OF INSERT
AS
BEGIN
-- Create a temporary table to modify, because "inserted" is read-only
SELECT * INTO #temp FROM inserted
-- If for whatever reason the secondary table has its own identity column,
-- we need to get rid of it from our #temp table to do the INSERT later with the identity on
ALTER TABLE #temp DROP COLUMN oneToManyIdentity
UPDATE #temp
SET
UserCode = ISNULL(UserCode, (SELECT UserCode FROM Users U WHERE U.UserID = #temp.UserID)),
UserID = ISNULL(UserID, (SELECT UserID FROM Users U WHERE U.UserCode = #temp.UserCode))
INSERT INTO Access SELECT * FROM #temp
END

Related

Creating a history table without using triggers

I have a table A with 3000 records and 25 columns. I want to have a history table called Table A History holding all the changes, updates and deletes so I can look anything up on any given day. I usually use cursors. This time I thought about using triggers, which I was asked not to use. Do you have any other suggestions? Many thanks!
If you're using T-SQL / SQL Server and you can't use triggers (which are the only sure way to get every change), maybe use a stored procedure scheduled in a job to run every x amount of time, with the stored procedure using a MERGE statement between the two tables to pick up new records or changes. I would not suggest this if you need every single change without question.
CREATE TABLE dbo.TableA (id INT, Column1 nvarchar(30))
CREATE TABLE dbo.TableA_History (id INT, Column1 nvarchar(30), TimeStamp DateTime)
(this code isn't production, just the general idea)
Put the following code inside a stored procedure and use a Sql Server Job with a schedule on it.
MERGE INTO dbo.TableA_History
USING dbo.TableA
ON TableA_History.id = TableA.id AND TableA_History.Column1 = TableA.Column1
WHEN NOT MATCHED BY TARGET THEN
INSERT (id, Column1, TimeStamp) VALUES (TableA.id, TableA.Column1, GETDATE());
So basically if the record either doesn't exist or doesn't match meaning a column changed, insert the record into the history table.
It is possible to create history without triggers in some cases, even if you are not using SQL Server 2016 and system-versioned tables are not available.
In some cases, when you can identify for sure which routines are modifying your table, you can create history using the OUTPUT ... INTO clause.
For example,
INSERT INTO [dbo].[MainTable]
OUTPUT inserted.[]
,...
,'I'
,GETUTCDATE()
,@CurrentUserID
INTO [dbo].[HistoryTable]
SELECT *
FROM ... ;
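Filled out with hypothetical table and column names (none of these come from the original), the pattern might look like this:
-- Hypothetical example only: MainTable(ID identity, Name), and HistoryTable
-- mirrors it plus an action flag, a UTC timestamp and the acting user.
DECLARE @CurrentUserID int = 42;

INSERT INTO [dbo].[MainTable] ([Name])
OUTPUT inserted.[ID]
     , inserted.[Name]
     , 'I'
     , GETUTCDATE()
     , @CurrentUserID
INTO [dbo].[HistoryTable] ([ID], [Name], [Action], [ChangedOnUTC], [ChangedByUserID])
SELECT [Name]
FROM [dbo].[StagingTable];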
In routines, when you are using MERGE, I like that we can use $action:
Is available only for the MERGE statement. Specifies a column of type
nvarchar(10) in the OUTPUT clause in a MERGE statement that returns
one of three values for each row: 'INSERT', 'UPDATE', or 'DELETE',
according to the action that was performed on that row.
It's very handy that we can add the user who is modifying the table. With triggers you need to use session context or a session variable to pass the user. With system-versioned tables you need to add an additional column to the main table in order to log the user, as versioning only logs the current table columns (at least for now).
So, basically it depends on your data and application. If you have many sources of CRUD over the table, the trigger is the most secure way. If your table is very big and heavily used, using MERGE is not good as it may cause blocking and harm performance.
In our databases we are using all of the methods depending on the situation:
triggers for legacy
system-versioning for new development
direct OUTPUT in the history, when sure that data is modified only by given set of routines
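For illustration, here is a hedged sketch of the MERGE-plus-$action variant mentioned above; the table and column names are hypothetical:
-- Hypothetical sketch: MERGE keeps MainTable in sync with SourceTable and
-- writes its own history rows via $action in the OUTPUT ... INTO clause.
DECLARE @CurrentUserID int = 42;

MERGE INTO [dbo].[MainTable] AS t
USING [dbo].[SourceTable] AS s
    ON t.[ID] = s.[ID]
WHEN MATCHED THEN
    UPDATE SET t.[Name] = s.[Name]
WHEN NOT MATCHED BY TARGET THEN
    INSERT ([ID], [Name]) VALUES (s.[ID], s.[Name])
OUTPUT $action
     , inserted.[ID]
     , inserted.[Name]
     , GETUTCDATE()
     , @CurrentUserID
INTO [dbo].[HistoryTable] ([Action], [ID], [Name], [ChangedOnUTC], [ChangedByUserID]);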

Setting up a relational DB for event logging

I am a bit rusty with my SQL since I have not worked with it beyond basic querying of existing databases that were already set up.
I am trying to create an event logging database, and want to take an "extreme" approach to normalization. I would have a main table comprised of mostly 'smallint' fields that point to child tables which contain strings.
Example:
I have an external system that I would like to enable some logging in via SQL: the user fills in some key parameters, which build an insert/update statement that gets pushed to the logging tables, so the values can be viewed at a later time if someone needs to know what XYZ value was at runtime, or sometime in the past.
I have a main table which consists of:
SELECT [log_id] - bigint (auto-increment) PK
,[date_time] - smalldatetime
,[cust_id] - smallint FK
,[recloc] - char(8)
,[alert_level] - smallint FK
,[header] - varchar(100)
,[body] - varchar(1000)
,[process_id] - smallint FK
,[routine_id] - smallint FK
,[workflow_id] - smallint FK
FROM [EventLogs].[dbo].[eventLogs]
All of the 'smallint' fields point to a child table which contains the expanded data:
Example:
SELECT [routine_id] PK/FK
,[routine_name]
,[description]
FROM [EventLogs].[dbo].[cpRoutine]
SELECT [process_id] PK/FK
,[process_name]
,[description]
FROM [EventLogs].[dbo].[cpProcess]
My goal here is to have the external system do an update/insert statement that reaches all these tables. I currently have all the 'smallint' fields linked up as FKs.
How do I go about crafting the update/insert statements that touch all these tables? If a child table already contains a key-value pair, I do not want to touch it. The idea of the child tables is to house repetitive data there and assign it a key in the main logging table to keep size down. Do I need to check for the existence of records in the child tables, save the index numbers, then build my insert statement for the main table? Trying to be as efficient as possible here.
Example:
I want to log the following from the external system:
- date_time - GETDATE()
- customer_number - '0123456789'
- recloc - 'ABC123'
- alert_level - 'info'
- header - 'this is a header'
- body - 'this is a body'
- process_name - 'the process'
- routine_name - 'the routine'
- workflow_name - 'the workflow'
Do I need to create my insert statement for the main table (eventLogs) but check each child table first and add missing values, then save the id for my insert statement in the main table?
Select process_id, process_name From cpProcess where process_name = 'the process'
If no values returned, do an insert statement with the process_name
Now query the table again to get the ID so I can build the "main insert statement" that feeds the master log table
Repeat for all other child tables
final insert statement looks something like:
SQL code:
INSERT INTO eventLogs (date_time, cust_id, recloc, alert_level, header, body, process_id, routine_id, workflow_id)
VALUES('2017-12-31', '1', 'ABC123', '3', 'this is a header', 'this is a body', '13', '19', '12')
It just seems like I am doing too much back and forth with the server, checking for values in the child tables, just to do my insert...
The end goal here is to create a friendly view that pulls in all the data assigned to the 'smallint' keys.
You're close:
Select process_id from cpProcess where process_name = 'the process'
If no values are returned, do an insert statement with the process_name and get the ID through IDENT_CURRENT, SCOPE_IDENTITY, or @@IDENTITY (or use a subordinate "load" procedure and get the ID from an output parameter).
Repeat for each child table until you get the values required to do your final insert into [eventLogs].
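For example, one such subordinate "load" procedure might look roughly like this; the procedure name is made up and it assumes process_id is an IDENTITY column:
-- Hypothetical helper: returns the id for a process name, inserting it first
-- if it does not exist yet. Assumes process_id is an IDENTITY column.
CREATE PROCEDURE dbo.load_cpProcess
    @process_name varchar(100),
    @process_id   smallint OUTPUT
AS
BEGIN
    SET NOCOUNT ON;

    SELECT @process_id = process_id
    FROM dbo.cpProcess
    WHERE process_name = @process_name;

    IF @process_id IS NULL
    BEGIN
        INSERT INTO dbo.cpProcess (process_name)
        VALUES (@process_name);

        SET @process_id = SCOPE_IDENTITY();
    END;
END;
Under heavy concurrency this would need a locking hint or duplicate-key handling, but it shows the lookup-then-insert flow the main logging insert can call once per child table.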
This works fine if it is a relatively low speed process. As you increase the speed you can have issues, but if you are doing INSERT only, as you should, it still isn't terrible. I've used SQL Server Service Broker in the past to decouple processes such as these to improve performance, but that obviously adds complexity.
Depending on the load you might also decide to build aggregate tables in a fact/dimension star so that the INSERT OLTP process is segregated from the SELECT OLAP process.
What you're seeing is the complexity involved in building a normalized data structure. Your approach of taking an "extreme" approach to normalization is often bypassed because it's "too hard". That doesn't mean you shouldn't do it, but you should weigh the ROI. In the past, where there were only ever going to be perhaps fewer than ten thousand records at any given time, I have made the decision to just dump everything into a log table like the one below. You just have to look at the requirements and make the best choice.
CREATE TABLE [log].[data]
(
[id] INT IDENTITY(1, 1)
, [timestamp] DATETIME DEFAULT sysdatetime()
, [entry] XML NOT NULL
);
One option that I frequently use during the build out phase of a design is to build placeholders behind adapters as shown below. Use the getter and setter methods ALWAYS and later, when you need better performance or data storage, you can refactor the underlying data structure as required, modify the adapters to the new data structures, and you've saved yourself some time. Otherwise you can end up chasing a lot of rabbits down holes early in the project. Often you'll find that your design for the underlying structures changes based on requirements as the project moves forward and you'd have spent a lot of time on changes. Using this approach you get a working mechanism in place immediately.
Later on if you need to collapse this structure to provide better performance it will be trivial compared to constantly changing the structure during design (in my opinion).
Oh, and yes, you could use a standard relational table. I use a lot of XML in applications and event logging because it allows ad hoc structured data. The concept is the same. You could use your top level table, just with the [process_name], etc. columns directly in the table and no child columns for now.
Just remember you should NOT allow access to the underlying tables directly! One way to prevent this is to actually put them in a dedicated schema such as [log_secure], and secure that schema to all but admin and the accessor/mutator methods.
IF schema_id(N'log') IS NULL
EXECUTE (N'CREATE SCHEMA log');
go
IF object_id(N'[log].[data]', N'U') IS NOT NULL
DROP TABLE [log].[data];
go
CREATE TABLE [log].[data]
(
[id] BIGINT IDENTITY(1, 1)
, [timestamp] DATETIMEOFFSET NOT NULL -- DATETIME if timezone isn't needed
CONSTRAINT [log__data__timestamp__df] DEFAULT sysdatetimeoffset()
, [entry] XML NOT NULL,
CONSTRAINT [log__data__id__pk] PRIMARY KEY CLUSTERED ([id])
);
IF object_id(N'[log].[get_entry]', N'P') IS NOT NULL
DROP PROCEDURE [log].[get_entry];
go
CREATE PROCEDURE [log].[get_entry] @id BIGINT
, @entry XML output
, @begin DATETIMEOFFSET
, @end DATETIMEOFFSET
AS
BEGIN
SELECT @entry = [entry]
FROM [log].[data]
WHERE [id] = @id;
END;
go
IF object_id(N'[log].[set_entry]', N'P') IS NOT NULL
DROP PROCEDURE [log].[set_entry];
go
CREATE PROCEDURE [log].[set_entry] @entry XML
, @timestamp DATETIMEOFFSET = NULL
, @id BIGINT output
AS
BEGIN
INSERT INTO [log].[data]
([timestamp]
, [entry])
VALUES ( COALESCE(@timestamp, sysdatetimeoffset()), @entry );
SET @id = SCOPE_IDENTITY();
END;
go

How to emulate a BEFORE INSERT trigger in T-SQL / SQL Server for super/subtype (Inheritance) entities? [duplicate]

This question already has answers here:
How can I do a BEFORE UPDATED trigger with sql server?
(9 answers)
Closed 2 years ago.
This is on Azure.
I have a supertype entity and several subtype entities, the latter of which need to obtain their foreign keys from the primary key of the supertype entity on each insert. In Oracle, I use a BEFORE INSERT trigger to accomplish this. How would one accomplish this in SQL Server / T-SQL?
DDL
CREATE TABLE super (
super_id int IDENTITY(1,1)
,subtype_discriminator char(4) CHECK (subtype_discriminator IN ('SUB1', 'SUB2'))
,CONSTRAINT super_id_pk PRIMARY KEY (super_id)
);
CREATE TABLE sub1 (
sub_id int IDENTITY(1,1)
,super_id int NOT NULL
,CONSTRAINT sub_id_pk PRIMARY KEY (sub_id)
,CONSTRAINT sub_super_id_fk FOREIGN KEY (super_id) REFERENCES super (super_id)
);
I wish for an insert into sub1 to fire a trigger that actually inserts a value into super and uses the super_id generated to put into sub1.
In Oracle, this would be accomplished by the following:
CREATE TRIGGER sub_trg
BEFORE INSERT ON sub1
FOR EACH ROW
DECLARE
v_super_id int; -- Ignore the fact that I could have used super_id_seq.CURRVAL
BEGIN
INSERT INTO super (super_id, subtype_discriminator)
VALUES (super_id_seq.NEXTVAL, 'SUB1')
RETURNING super_id INTO v_super_id;
:NEW.super_id := v_super_id;
END;
Please advise on how I would simulate this in T-SQL, given that T-SQL lacks the BEFORE INSERT capability?
Sometimes a BEFORE trigger can be replaced with an AFTER one, but this doesn't appear to be the case in your situation, for you clearly need to provide a value before the insert takes place. So, for that purpose, the closest functionality would seem to be the INSTEAD OF trigger one, as #marc_s has suggested in his comment.
Note, however, that, as the names of these two trigger types suggest, there's a fundamental difference between a BEFORE trigger and an INSTEAD OF one. While in both cases the trigger is executed at the time when the action determined by the statement that's invoked the trigger hasn't taken place, in case of the INSTEAD OF trigger the action is never supposed to take place at all. The real action that you need to be done must be done by the trigger itself. This is very unlike the BEFORE trigger functionality, where the statement is always due to execute, unless, of course, you explicitly roll it back.
But there's one other issue to address actually. As your Oracle script reveals, the trigger you need to convert uses another feature unsupported by SQL Server, which is that of FOR EACH ROW. There are no per-row triggers in SQL Server either, only per-statement ones. That means that you need to always keep in mind that the inserted data are a row set, not just a single row. That adds more complexity, although that'll probably conclude the list of things you need to account for.
So, it's really two things to solve then:
replace the BEFORE functionality;
replace the FOR EACH ROW functionality.
My attempt at solving these is below:
CREATE TRIGGER sub_trg
ON sub1
INSTEAD OF INSERT
AS
BEGIN
DECLARE @new_super TABLE (
super_id int
);
INSERT INTO super (subtype_discriminator)
OUTPUT INSERTED.super_id INTO @new_super (super_id)
SELECT 'SUB1' FROM INSERTED;
INSERT INTO sub1 (super_id)
SELECT super_id FROM @new_super;
END;
This is how the above works:
The same number of rows as being inserted into sub1 is first added to super. The generated super_id values are stored in a temporary storage (a table variable called @new_super).
The newly inserted super_ids are now inserted into sub1.
Nothing too difficult really, but the above will only work if you have no other columns in sub1 than those you've specified in your question. If there are other columns, the above trigger will need to be a bit more complex.
The problem is to assign the new super_ids to every inserted row individually. One way to implement the mapping could be like below:
CREATE TRIGGER sub_trg
ON sub1
INSTEAD OF INSERT
AS
BEGIN
DECLARE @new_super TABLE (
rownum int IDENTITY (1, 1),
super_id int
);
INSERT INTO super (subtype_discriminator)
OUTPUT INSERTED.super_id INTO @new_super (super_id)
SELECT 'SUB1' FROM INSERTED;
WITH enumerated AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS rownum
FROM inserted
)
INSERT INTO sub1 (super_id, other columns)
SELECT n.super_id, i.other columns
FROM enumerated AS i
INNER JOIN @new_super AS n
ON i.rownum = n.rownum;
END;
As you can see, an IDENTITY(1,1) column is added to @new_super, so the temporarily inserted super_id values will additionally be enumerated starting from 1. To provide the mapping between the new super_ids and the new data rows, the ROW_NUMBER function is used to enumerate the INSERTED rows as well. As a result, every row in the INSERTED set can now be linked to a single super_id and thus complemented to a full data row to be inserted into sub1.
Note that the order in which the new super_ids are inserted may not match the order in which they are assigned. I considered that a no-issue. All the new super rows generated are identical save for the IDs. So, all you need here is just to take one new super_id per new sub1 row.
If, however, the logic of inserting into super is more complex and for some reason you need to remember precisely which new super_id has been generated for which new sub row, you'll probably want to consider the mapping method discussed in this Stack Overflow question:
Using merge..output to get mapping between source.id and target.id
While Andriy's proposal will work well for INSERTs of a small number of records, full table scans will be done on the final join as both 'enumerated' and '@new_super' are not indexed, resulting in poor performance for large inserts.
This can be resolved by specifying a primary key on the #new_super table, as follows:
DECLARE @new_super TABLE (
row_num INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
super_id int
);
This will result in the SQL optimizer scanning through the 'enumerated' table but doing an indexed join on @new_super to get the new key.

INSERT INTO vs SELECT INTO

What is the difference between using
SELECT ... INTO MyTable FROM...
and
INSERT INTO MyTable (...)
SELECT ... FROM ....
?
From BOL [ INSERT, SELECT...INTO ], I know that using SELECT...INTO will create the insertion table on the default file group if it doesn't already exist, and that the logging for this statement depends on the recovery model of the database.
Which statement is preferable?
Are there other performance implications?
What is a good use case for SELECT...INTO over INSERT INTO ...?
Edit: I already stated that I know that SELECT INTO... creates a table where it doesn't exist. What I want to know is: SQL includes this statement for a reason - what is it? Is it doing something different behind the scenes for inserting rows, or is it just syntactic sugar on top of a CREATE TABLE and INSERT INTO?
They do different things. Use INSERT when the table exists. Use SELECT INTO when it does not.
Yes. INSERT with no table hints is normally logged. SELECT INTO is minimally logged assuming proper trace flags are set.
In my experience SELECT INTO is most commonly used with intermediate data sets, like #temp tables, or to copy out an entire table like for a backup. INSERT INTO is used when you insert into an existing table with a known structure.
EDIT
To address your edit, they do different things. If you are making a table and want to define the structure use CREATE TABLE and INSERT. Example of an issue that can be created: You have a small table with a varchar field. The largest string in your table now is 12 bytes. Your real data set will need up to 200 bytes. If you do SELECT INTO from your small table to make a new one, the later INSERT will fail with a truncation error because your fields are too small.
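A quick, hypothetical illustration of that failure mode:
-- Hypothetical tables: the new table inherits varchar(12) from the source,
-- so longer "real" data later fails with a truncation error.
CREATE TABLE dbo.SmallSource (val varchar(12));
INSERT INTO dbo.SmallSource VALUES ('short string');

SELECT val INTO dbo.NewTable FROM dbo.SmallSource;   -- NewTable.val is varchar(12)

INSERT INTO dbo.NewTable (val)
VALUES (REPLICATE('x', 200));                        -- fails: data would be truncated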
Which statement is preferable? Depends on what you are doing.
Are there other performance implications? If the table is a permanent table, you can create indexes at the time of table creation, which has implications for performance both negatively and positively. SELECT INTO does not recreate indexes that exist on the current table, and thus subsequent use of the new table may be slower than it needs to be.
What is a good use case for SELECT...INTO over INSERT INTO ...? SELECT INTO is used if you may not know the table structure in advance. It is faster to write than CREATE TABLE and an insert statement, so it is used to speed up development at times. It is often faster to use when you are creating a quick temp table to test things or a backup table of a specific query (maybe records you are going to delete). It should be rare to see it used in production code that will run multiple times (except for temp tables) because it will fail if the table already exists.
It is sometimes used inappropriately by people who don't know what they are doing, and they can cause havoc in the db as a result. I strongly feel it is inappropriate to use SELECT INTO for anything other than a throwaway table (a temporary backup, a temp table that will go away at the end of the stored proc, etc.). Permanent tables need real thought as to their design, and SELECT INTO makes it easy to avoid thinking about anything even as basic as what columns and what datatypes.
In general, I prefer the use of the CREATE TABLE and INSERT statements - you have more control and it is better for repeatable processes. Further, if the table is a permanent table, it should be created from a separate CREATE TABLE script (one that is in source control), as the creation of permanent objects should not, in general, live in the same code that inserts/deletes/updates or selects from a table. Object changes should be handled separately from data changes because objects have implications beyond the needs of a specific insert/update/select/delete. You need to consider the best data types, think about FK constraints, PKs and other constraints, consider auditing requirements, think about indexing, etc.
Each statement has a distinct use case. They are not interchangeable.
SELECT...INTO MyTable... creates a new MyTable where one did not exist before.
INSERT INTO MyTable...SELECT... is used when MyTable already exists.
The primary difference is that SELECT INTO MyTable will create a new table called MyTable with the results, while INSERT INTO requires that MyTable already exists.
You would use SELECT INTO only in the case where the table didn't exist and you wanted to create it based on the results of your query. As such, these two statements really are not comparable. They do very different things.
In general, SELECT INTO is used more often for one off tasks, while INSERT INTO is used regularly to add rows to tables.
EDIT:
While you can use CREATE TABLE and INSERT INTO to accomplish what SELECT INTO does, with SELECT INTO you do not have to know the table definition beforehand. SELECT INTO is probably included in SQL because it makes tasks like ad hoc reporting or copying tables much easier.
Actually SELECT ... INTO not only creates the table, it will fail if the table already exists, so basically the only time you would use it is when the table you are inserting into does not exist.
In regards to your EDIT:
I personally mainly use SELECT ... INTO when I am creating a temp table. That to me is the main use. However, I also use it when creating new tables with many columns that have structures similar to other tables, and then edit them, in order to save time.
I only want to cover the second point of the question, the one related to performance, because nobody else has covered this. SELECT INTO is a lot faster than INSERT INTO when it comes to tables with large datasets. I prefer SELECT INTO when I have to read a very large table: INSERT INTO for a table with 10 million rows may take hours, while SELECT INTO will do this in minutes, and as far as losing the indexes on the new table is concerned, you can recreate the indexes afterwards and still save a lot of time compared to INSERT INTO.
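As a rough illustration of that approach (the table, column, and index names are made up):
-- Illustrative only: copy the big table with a minimally logged SELECT INTO,
-- then rebuild the needed index afterwards instead of paying the fully
-- logged INSERT cost up front.
SELECT *
INTO dbo.BigTable_Copy
FROM dbo.BigTable;

CREATE CLUSTERED INDEX IX_BigTable_Copy_Id
    ON dbo.BigTable_Copy (Id);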
SELECT INTO is typically used to generate temp tables or to copy another table (data and/or structure).
In day to day code you use INSERT because your tables should already exist to be read, UPDATEd, DELETEd, JOINed etc. Note: the INTO keyword is optional with INSERT
That is, applications won't normally create and drop tables as part of normal operations unless it is a temporary table for some scope limited and specific usage.
A table created by SELECT INTO will have no keys or indexes or constraints unlike a real, persisted, already existing table
The 2 aren't directly comparable because they have almost no overlap in usage
SELECT INTO creates a new table for you at the time and then inserts records into it from the source table. The newly created table has the same structure as the source table. If you try to use SELECT INTO with an existing table it will produce an error, because it will try to create a new table with the same name.
INSERT INTO requires the table to exist in your database before you insert rows into it.
The simple difference between SELECT INTO and INSERT INTO is:
--> SELECT INTO doesn't need an existing table. If you want to copy table A's data, you just type SELECT * INTO [tablename] FROM A. Here, tablename must not already exist; a new table with the same structure as table A will be created.
--> INSERT INTO does need an existing table: INSERT INTO [tablename] SELECT * FROM A;.
Here tablename is an existing table.
SELECT INTO is usually more popular for copying data, especially backup data.
You can use either as per your requirement; it is totally the developer's choice which should be used in their scenario.
Performance-wise, INSERT INTO is fast.
References :
https://www.w3schools.com/sql/sql_insert_into_select.asp
https://www.w3schools.com/sql/sql_select_into.asp
The other answers are all great/correct (the main difference is whether the DestTable exists already (INSERT), or doesn't exist yet (SELECT ... INTO))
You may prefer to use INSERT (instead of SELECT ... INTO), if you want to be able to COUNT(*) the rows that have been inserted so far.
Using SELECT COUNT(*) ... WITH NOLOCK is a simple/crude technique that may help you check the "progress" of the INSERT; helpful if it's a long-running insert, as seen in this answer.
[If you use...]
INSERT DestTable SELECT ... FROM SrcTable
...then your SELECT COUNT(*) from DestTable WITH (NOLOCK) query would work.
SELECT INTO for large datasets may only be good for a single user using one single connection to the database doing a bulk operation task. I do not recommend using
SELECT * INTO table
as this creates one big transaction and takes a schema lock to create the object, preventing other users from creating objects or accessing system objects until the SELECT INTO operation completes.
As a proof of concept, open 2 sessions; in the first session try to use
select into temp table from a huge table
and in the second session try to
create a temp table
and check the locks, the blocking, and how long the second session takes to create its temp table object. My recommendation: it is always good practice to use CREATE TABLE and an INSERT statement, and if minimal logging is needed, use trace flag 610.

SQL server trigger question

I am by no means a SQL programmer and I am trying to accomplish something that I am pretty sure has been done a million times before.
I am trying to auto-generate a customer number in SQL every time a new customer is inserted, but the trigger (or SP?) will only work if at least the first name, last name and another value called case number are entered. If any of these fields are missing, the system generates an error. If the criteria are met, the system generates and assigns a unique ID to that customer that begins with the letters GL- and then uses a 5-digit number, so a customer John Doe would be GL-00001 and Jane Doe would be GL-00002.
I am sorry if I am asking too much but I am basically a select insert update guy and nothing more so thanks in advance for any help.
If I were in this situation, I would:
--Alter the table(s) so that first name, last name and case number are required (NOT NULL) columns. Handle your checks for required fields on the application side before submitting the record to the database.
--If it doesn't already exist, add an identity column to the customer table.
--Add a persisted computed column to the customer table that will format the identity column into the desired GL-00000 format.
/* Demo computed column for customer number */
create table #test (
id int identity,
customer_number as 'GL-' + left('00000', 5-len(cast(id as varchar(5)))) + cast(id as varchar(5)) persisted,
name char(20)
)
insert into #test (name) values ('Joe')
insert into #test (name) values ('BobbyS')
select * from #test
drop table #test
This should satisfy your requirements without the need to introduce the overhead of a trigger.
So what do you want to do? Generate a customer number even when these fields aren't populated?
Have you looked at the SQL for the trigger? You can do this in SSMS (SQL Server Management Studio) by going to the table in question in the Object Explorer, expanding the table and then expanding Triggers.
If you open up the trigger you'll see what it does to generate the customer number. If you are unsure how this code works, then post the code for the trigger.
If you are making changes to an existing system, I'd advise you to find out any implications of changing the way data is inputted.
For example, other parts of the application may depend on all of the initial values being populated, so after changing the trigger to allow incomplete data to be added, you may in turn break something else.
You probably have a unique constraint and/or NOT NULL constraints set on the table.
Remove/disable these (for example with SQL Server Management Studio in Design mode) and then try again to insert the data. Keep in mind that you will probably not be able to re-enable the constraints after your insert, since your data then violates the conditions. Only disable or remove the constraints if you are absolutely sure that they are unnecessary.
Here's example syntax (you need to know the constraint names):
--disable
ALTER TABLE customer NOCHECK CONSTRAINT your_constraint_name
--enable
ALTER TABLE customer CHECK CONSTRAINT your_constraint_name
Caution: If I were you, I'd rather try to insert dummy values for the not null columns like this:
insert into customers select afield , 1 as dummyvalue, 2 as dummyvalue from your datasource
A very easy way to do this would be to create a table of this sort of structure:
CustomerID of type int that is a primary key, set as identity
CustomerIDPrefix of type varchar(3) which stores 'GL-' as a default value.
Then add your other fields and set them to NOT NULL.
If that way is not acceptable and you do need to write a trigger, check out these two articles:
http://msdn.microsoft.com/en-us/library/aa258254(SQL.80).aspx
http://www.kodyaz.com/articles/sql-trigger-example-in-sql-server-2008.aspx
Basically it is all about getting the logic right to check if the fields are blank. Experiment with a test database on your local machine. This will help you get it right.
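As a starting point, here is a hypothetical sketch of that kind of check in a trigger; the table and column names are assumptions, not the actual schema:
-- Hypothetical validation trigger: rejects inserts where any required field
-- is NULL or blank. Customer number generation itself could stay in a
-- computed column or in the existing trigger logic.
CREATE TRIGGER tr_Customers_RequireFields
ON dbo.Customers
AFTER INSERT
AS
BEGIN
    IF EXISTS (
        SELECT 1
        FROM inserted
        WHERE NULLIF(LTRIM(RTRIM(FirstName)), '') IS NULL
           OR NULLIF(LTRIM(RTRIM(LastName)), '') IS NULL
           OR NULLIF(LTRIM(RTRIM(CaseNumber)), '') IS NULL
    )
    BEGIN
        RAISERROR('First name, last name and case number are required.', 16, 1);
        ROLLBACK TRANSACTION;
    END;
END;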
