INSERT INTO vs SELECT INTO - sql-server

What is the difference between using
SELECT ... INTO MyTable FROM...
and
INSERT INTO MyTable (...)
SELECT ... FROM ....
?
From BOL [ INSERT, SELECT...INTO ], I know that using SELECT...INTO will create the insertion table on the default file group if it doesn't already exist, and that the logging for this statement depends on the recovery model of the database.
Which statement is preferable?
Are there other performance implications?
What is a good use case for SELECT...INTO over INSERT INTO ...?
Edit: I already stated that I know SELECT...INTO creates a table where one doesn't exist. What I want to know is: SQL includes this statement for a reason, what is it? Is it doing something different behind the scenes when inserting rows, or is it just syntactic sugar on top of a CREATE TABLE and INSERT INTO?

They do different things. Use INSERT when the table exists. Use SELECT INTO when it does not.
Yes. INSERT with no table hints is fully logged. SELECT INTO is minimally logged, assuming the recovery model and trace flags allow it.
In my experience SELECT INTO is most commonly used with intermediate data sets, like #temp tables, or to copy out an entire table like for a backup. INSERT INTO is used when you insert into an existing table with a known structure.
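As a rough illustration of the logging difference (names are hypothetical; minimal logging also depends on the recovery model, and on recent versions INSERT...SELECT needs a TABLOCK hint rather than trace flag 610):
-- SELECT INTO: creates dbo.Target and can be minimally logged
-- under the SIMPLE or BULK_LOGGED recovery model
SELECT * INTO dbo.Target FROM dbo.Source;
-- INSERT ... SELECT: fully logged by default; adding TABLOCK
-- makes it eligible for minimal logging on supported versions
INSERT INTO dbo.Target WITH (TABLOCK)
SELECT * FROM dbo.Source;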
EDIT
To address your edit: they do different things. If you are making a table and want to define the structure, use CREATE TABLE and INSERT. An example of the kind of issue SELECT INTO can create: you have a small table with a varchar field sized for the data it currently holds, say the longest string is 12 characters, but your real data set will need up to 200. If you do SELECT INTO from your small table to make a new one, the new table inherits the narrow column, and the later INSERT will fail with a truncation error because the field is too small.
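For illustration, here's a minimal sketch of that pitfall (table and column names are hypothetical):
CREATE TABLE dbo.SmallTable (Name varchar(12));   -- sized for today's data
INSERT INTO dbo.SmallTable VALUES ('short value');
SELECT * INTO dbo.NewTable FROM dbo.SmallTable;   -- NewTable.Name inherits varchar(12)
-- Later, real data arrives and the inherited column is too narrow:
INSERT INTO dbo.NewTable VALUES (REPLICATE('x', 200));
-- Fails with a "string or binary data would be truncated" error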

Which statement is preferable? Depends on what you are doing.
Are there other performance implications? If the table is a permanent table, you can create indexes at the time of table creation, which has implications for performance, both negative and positive. SELECT INTO does not recreate the indexes that exist on the current table, so subsequent use of the new table may be slower than it needs to be.
What is a good use case for SELECT...INTO over INSERT INTO ...? SELECT INTO is used when you may not know the table structure in advance. It is faster to write than a CREATE TABLE plus an INSERT statement, so it is sometimes used to speed up development. It is often faster when you are creating a quick temp table to test things, or a backup table of a specific query (say, records you are about to delete). It should be rare to see it in production code that runs repeatedly (except for temp tables), because it will fail if the table already exists.
It is sometimes used inappropriately by people who don't know what they are doing, and they can cause havoc in the db as a result. I strongly feel it is inappropriate to use SELECT INTO for anything other than a throwaway table (a temporary backup, a temp table that will go away at the end of the stored proc, etc.). Permanent tables need real thought as to their design, and SELECT INTO makes it easy to avoid thinking about anything, even something as basic as which columns and which datatypes.
In general, I prefer the use of CREATE TABLE and an INSERT statement - you have more control, and it is better for repeatable processes. Further, if the table is a permanent table, it should be created from a separate CREATE TABLE script (one that is in source control), as the creation of permanent objects should not, in general, sit in code that inserts, deletes, updates, or selects from a table. Object changes should be handled separately from data changes, because objects have implications beyond the needs of a specific insert/update/select/delete. You need to consider the best data types, think about FK constraints, PKs and other constraints, consider auditing requirements, think about indexing, etc.
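A minimal sketch of that preferred pattern (names, types, and constraints are hypothetical; the real design should get the thought described above):
-- Define the structure deliberately, in a script kept in source control
CREATE TABLE dbo.CustomerArchive
(
    CustomerID   int          NOT NULL CONSTRAINT PK_CustomerArchive PRIMARY KEY,
    CustomerName varchar(200) NOT NULL,
    ArchivedAt   datetime2    NOT NULL CONSTRAINT DF_CustomerArchive_ArchivedAt DEFAULT SYSUTCDATETIME()
);
-- Load it in a separate, repeatable step
INSERT INTO dbo.CustomerArchive (CustomerID, CustomerName)
SELECT CustomerID, CustomerName
FROM dbo.Customer;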

Each statement has a distinct use case. They are not interchangeable.
SELECT...INTO MyTable... creates a new MyTable where one did not exist before.
INSERT INTO MyTable...SELECT... is used when MyTable already exists.

The primary difference is that SELECT INTO MyTable will create a new table called MyTable with the results, while INSERT INTO requires that MyTable already exists.
You would use SELECT INTO only in the case where the table didn't exist and you wanted to create it based on the results of your query. As such, these two statements really are not comparable. They do very different things.
In general, SELECT INTO is used more often for one off tasks, while INSERT INTO is used regularly to add rows to tables.
EDIT:
While you can use CREATE TABLE and INSERT INTO to accomplish what SELECT INTO does, with SELECT INTO you do not have to know the table definition beforehand. SELECT INTO is probably included in SQL because it makes tasks like ad hoc reporting or copying tables much easier.

Actually, SELECT ... INTO not only creates the table, it will fail if the table already exists, so basically the only time you would use it is when the table you are inserting into does not exist.
In regards to your EDIT:
I personally mainly use SELECT ... INTO when I am creating a temp table; that, to me, is the main use. I also use it when creating new tables with many columns that have structures similar to other tables, and then edit the result to save time.

I only want to cover the second point of the question, the one about performance, because nobody else has covered it. SELECT INTO is a lot faster than INSERT INTO when it comes to tables with large data sets. I prefer SELECT INTO when I have to read a very large table: INSERT INTO for a table with 10 million rows may take hours, while SELECT INTO will do it in minutes. As for losing the indexes on the new table, you can recreate them afterwards and still save a lot of time compared to INSERT INTO.
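A sketch of that workflow (hypothetical names; the speedup assumes SELECT INTO gets minimally logged):
-- Copy the large table quickly
SELECT *
INTO dbo.BigTableCopy
FROM dbo.BigTable;
-- Recreate the indexes the copy needs afterwards
CREATE CLUSTERED INDEX IX_BigTableCopy_ID ON dbo.BigTableCopy (ID);
CREATE NONCLUSTERED INDEX IX_BigTableCopy_CreatedDate ON dbo.BigTableCopy (CreatedDate);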

SELECT INTO is typically used to generate temp tables or to copy another table (data and/or structure).
In day to day code you use INSERT, because your tables should already exist to be read, UPDATEd, DELETEd, JOINed, etc. Note: the INTO keyword is optional with INSERT.
That is, applications won't normally create and drop tables as part of normal operations unless it is a temporary table for some scope limited and specific usage.
A table created by SELECT INTO will have no keys, indexes, or constraints, unlike a real, persisted, already existing table.
The two aren't directly comparable because they have almost no overlap in usage.

SELECT INTO creates a new table for you at the time it runs, and then inserts the records into it from the source table. The newly created table has the same structure as the source table. If you try to use SELECT INTO with an existing table, it will produce an error, because it will try to create a new table with the same name.
INSERT INTO requires the table to exist in your database before you insert rows into it.

The simple difference between SELECT INTO and INSERT INTO is:
--> SELECT INTO doesn't need an existing table. If you want to copy table A's data, you just type SELECT * INTO [tablename] FROM A. Here, tablename must be a new table; it is created with the same structure as table A, and the statement fails if a table with that name already exists.
--> INSERT INTO does need an existing table: INSERT INTO [tablename] SELECT * FROM A;.
Here tablename is an existing table.
SELECT INTO is usually more popular for copying data, especially backup data.
Use whichever fits your requirement; it is entirely the developer's choice for the scenario.
Performance-wise, the two can differ: SELECT INTO can be minimally logged, which often makes it faster for large copies.
References :
https://www.w3schools.com/sql/sql_insert_into_select.asp
https://www.w3schools.com/sql/sql_select_into.asp

The other answers are all great/correct (the main difference is whether DestTable already exists, for INSERT, or doesn't exist yet, for SELECT ... INTO).
You may prefer to use INSERT (instead of SELECT ... INTO), if you want to be able to COUNT(*) the rows that have been inserted so far.
Using SELECT COUNT(*) ... WITH NOLOCK is a simple/crude technique that may help you check the "progress" of the INSERT; helpful if it's a long-running insert, as seen in this answer.
[If you use...]
INSERT DestTable SELECT ... FROM SrcTable
...then your SELECT COUNT(*) from DestTable WITH (NOLOCK) query would work.

SELECT INTO for large data sets may be good only for a single user, on a single connection to the database, doing a bulk operation task. I do not recommend using
SELECT * INTO table
as this creates one big transaction and takes a schema lock to create the object, preventing other users from creating objects or accessing system objects until the SELECT INTO operation completes.
As a proof of concept, open two sessions. In the first session, do a
select into a temp table from a huge table
and in the second session try to
create a temp table
and check the locks, the blocking, and how long it takes the second session to create its temp table object. My recommendation: it is always good practice to use CREATE TABLE and INSERT statements, and if minimal logging is needed, use trace flag 610.
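A sketch of that proof of concept (object names are hypothetical):
-- Session 1: long-running SELECT INTO against a huge table
SELECT *
INTO #big_copy
FROM dbo.HugeTable;
-- Session 2 (run while session 1 is still going): time this,
-- and inspect blocking via sys.dm_exec_requests
CREATE TABLE #probe (id int);
SELECT session_id, blocking_session_id, wait_type
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0;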

Related

Creating a history table without using triggers

I have a table, TableA, with 3000 records and 25 columns. I want to have a history table called TableA_History holding all the changes (updates and deletes) so I can look anything up on any day. I usually use cursors. I thought about using triggers, but I was asked not to use them. Do you have any other suggestions? Many thanks!
If you're using T-SQL / SQL Server and you can't use triggers, which are the only sure way to catch every change, maybe use a stored procedure, scheduled in a job to run every X amount of time, that uses a MERGE statement between the two tables to pick up new records and changes. I would not suggest this if you need every single change without question.
CREATE TABLE dbo.TableA (id INT, Column1 nvarchar(30))
CREATE TABLE dbo.TableA_History (id INT, Column1 nvarchar(30), TimeStamp DateTime)
(this code isn't production, just the general idea)
Put the following code inside a stored procedure and use a Sql Server Job with a schedule on it.
MERGE INTO dbo.TableA_History
USING dbo.TableA
    ON TableA_History.id = TableA.id
   AND TableA_History.Column1 = TableA.Column1
WHEN NOT MATCHED BY TARGET THEN
    INSERT (id, Column1, TimeStamp)
    VALUES (TableA.id, TableA.Column1, GETDATE());
So basically: if the record either doesn't exist or doesn't match (meaning a column changed), insert the record into the history table.
It is possible to create history without triggers in some cases, even if you are not using SQL Server 2016 and system-versioned tables are not available.
In some cases, when you can identify for sure which routines are modifying your table, you can create history using OUTPUT INTO clause.
For example,
INSERT INTO [dbo].[MainTable]
OUTPUT inserted.[Column1]  -- hypothetical column name; list each of the table's columns here
      ,...
      ,'I'                 -- action flag for "insert"
      ,GETUTCDATE()
      ,@CurrentUserID
INTO [dbo].[HistoryTable]
SELECT *
FROM ... ;
In routines, when you are using MERGE, I like that we can use $action:
"Is available only for the MERGE statement. Specifies a column of type nvarchar(10) in the OUTPUT clause in a MERGE statement that returns one of three values for each row: 'INSERT', 'UPDATE', or 'DELETE', according to the action that was performed on that row."
It's very handy that we can record the user who is modifying the table. With triggers you need to use session context or a session variable to pass the user in. With a system-versioned table you need to add an additional column to the main table in order to log the user, as it only logs the current table columns (at least for now).
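A rough sketch of that pattern (table and column names are hypothetical):
DECLARE @CurrentUserID int = 42;  -- passed into the routine
MERGE INTO dbo.MainTable AS t
USING dbo.StagingTable AS s
    ON t.ID = s.ID
WHEN MATCHED THEN
    UPDATE SET t.Column1 = s.Column1
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ID, Column1) VALUES (s.ID, s.Column1)
OUTPUT $action,           -- 'INSERT', 'UPDATE' or 'DELETE'
       inserted.ID,
       inserted.Column1,
       GETUTCDATE(),
       @CurrentUserID
INTO dbo.HistoryTable (Action, ID, Column1, ModifiedAtUtc, ModifiedByUserID);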
So, basically, it depends on your data and application. If you have many sources of CRUD against the table, a trigger is the most secure way. If your table is very big and heavily used, using MERGE is not good, as it may cause blocking and harm performance.
In our databases we are using all of the methods depending on the situation:
triggers for legacy tables
system-versioning for new development
direct OUTPUT into the history table, when we are sure the data is modified only by a given set of routines

Is it bad to use ALTER TABLE to resize a varchar column to a larger size?

I need a simple resize of a column from VARCHAR(36) to VARCHAR(40).
If you try to use SQL Server Enterprise Manager, the script it generates is effectively creating a new table with the new structure, inserting all of the data from the existing table into it, dropping the existing table, renaming the new table, and recreating any indexes.
If you read the documentation (and many online resources including SO), you can use an ALTER statement for the resize.
Does the ALTER affect the way the data is stored in any way? Indexes? Statistics? I want to avoid performance hits because of this modification due to the fact that the table can get large.
Just use ALTER TABLE. SSMS is a bit, er, stupid sometimes
You'll need to drop and recreate dependent constraints (FK, unique, index, check etc)
However, this is only a metadata change and will be very quick for a table of any size (unless you also change NOT NULL to NULL, or varchar to nvarchar, or such).
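A minimal sketch (hypothetical names; note that ALTER COLUMN should restate the existing nullability, because omitting it can silently change the column's NULL setting):
-- Widening varchar(36) to varchar(40) is a metadata-only change
ALTER TABLE dbo.MyTable
    ALTER COLUMN MyColumn varchar(40) NOT NULL;  -- keep the original NULL/NOT NULL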
No, ALTER TABLE (http://msdn.microsoft.com/de-de/library/ms190273.aspx) is how Microsoft intended this kind of change to be made.
And if you do not add extra options to your command, no indexes or statistics should be harmed.
There is also no possibility of data loss, because you are just making the column bigger.
Everything should be fine.
Changes to database structure should NEVER be made using SSMS on a production environment, for just the reason you brought up: it can destroy performance in a large table. ALTER TABLE is the preferred method; it is faster, and it can be stored in source control as a change to push to prod after testing.
The following should be a better way to handle it:
IF EXISTS (SELECT 1
           FROM INFORMATION_SCHEMA.COLUMNS
           WHERE TABLE_NAME = '<tablename>'
             AND COLUMN_NAME = '<field>')
BEGIN
    ALTER TABLE <tablename> ALTER COLUMN [<field>] varchar(xxxx) NULL
END

Is there something like a "column symlink" in Oracle?

I would like to have a column in my DB accessible via two column names temporarily.
Why? The column name was badly chosen, I would like to refactor it. As I want my webapp to remain stable while changing the column name, it would be good to
have a (let's call it) symlink named better_column_name pointing to the column bad_column_name
change the webapplication to use better_column_name
drop the symlink and rename column to better_column_name
"Refactoring Databases" suggests to actually add a second column which is synchronized on commit in order to achieve this. I am just hoping that there might be an easier way with Oracle, with less work and less overhead.
As long as you have code that uses both column names, I don't see a way to get around the fact that you'll have two (real) columns in that table.
I would add the new column with the correct name and then create a trigger that checks which column has been modified and updates the "other" column correspondingly. So whatever is being updated, the value is synch'ed with the other column.
Once all the code that uses the old column has been migrated, remove the trigger and drop the old column.
Edit
The trigger would do something like this:
CREATE OR REPLACE TRIGGER ...
...
UPDATE OF bad_column_name, better_column_name ON the_table
...
BEGIN
IF UPDATING ('BAD_COLUMN_NAME') THEN
:new.better_column_name := :new.bad_column_name;
END IF;
IF UPDATING ('BETTER_COLUMN_NAME') THEN
:new.bad_column_name := :new.better_column_name;
END IF;
END;
The order of the IF statements controls which change has a "higher priority" in case someone updated both columns at the same time.
Rename the table:
alter table mytable rename to mytable_old;
Create a view with the original tablename with both bad_column_name and better_column_name that point to the same column (and of course all the other columns):
create or replace view mytable as
select column1
, column2
, ...
, bad_column_name
, bad_column_name better_column_name
from mytable_old
;
Since this view is updatable by default (I assume here that mytable has a primary key), you can insert/update/delete from the view and it doesn't matter if you use bad_column_name or better_column_name.
After the refactoring, drop the view and rename the table and column:
drop view mytable;
alter table mytable_old rename column bad_column_name to better_column_name;
alter table mytable_old rename to mytable;
The best solution to this is only available in Oracle 11g Release 2: Edition-based Redefinition. This really cool feature allows us to maintain different versions of database tables and PL/SQL code, using special triggers and views. Find out more.
Essentially this is Oracle's built-in implementation of @AHorseWithNoName's suggestion.
You can create a view for the table and port your application to use that view instead of the table.
create table t (bad_name varchar2(10), c2 varchar2(10));
create view vt as select bad_name AS good_name, c2 from t;
insert into vt (good_name, c2) values ('blub', 'blob');
select * from t;
select * from vt;
If you're on 11g you could look at using a virtual column. I'd probably be tempted to change the order slightly; rename the real column and create the virtual one using the old (bad) name, which can then be dropped at leisure. You may be restricted, of course, and there may be implications on other objects being invalidated that make this order less suitable for you.
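A rough sketch of that idea, assuming Oracle 11g+ and hypothetical names (virtual columns are read-only, so this only helps code that reads the old name, not code that writes it):
-- Give the column its good name for real
ALTER TABLE mytable RENAME COLUMN bad_column_name TO better_column_name;
-- Re-expose the old name as a read-only virtual column
ALTER TABLE mytable ADD (bad_column_name AS (better_column_name));
-- Later, once nothing references the old name any more
ALTER TABLE mytable DROP COLUMN bad_column_name;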

Creating a SQL Server trigger to transition from a natural key to a surrogate key

Backstory
At work we're planning on deprecating a natural key column in one of our primary tables. The project consists of 100+ applications that link to this table/column; 400+ stored procedures that reference this column directly; and a vast array of common tables between these applications that also reference this column.
The Big Bang and Start from Scratch methods are out of the picture. We're going to deprecate this column one application at a time, certify the changes, and move on to the next... and we've got a lengthy target goal to make this effort practical.
The problem I have is that a lot of these applications have shared stored procedures and tables. If I completely convert all of Application A's tables/stored procedures, Applications B and C will be broken until converted. These in turn may break applications D, E, F... etc. I've already got a strategy implemented for code classes and stored procedures; the part I'm stuck on is the transitional state of the database.
Here's a basic example of what we have:
Users
---------------------------
Code varchar(32) natural key
Access
---------------------------
UserCode varchar(32) foreign key
AccessLevel int
And for now we're aiming for a transitional state like this:
Users
---------------------------
Code varchar(32)
Id int surrogate key
Access
---------------------------
UserCode varchar(32)
UserID int foreign key
AccessLevel int
The idea being during the transitional phase un-migrated applications and stored procedures will still be able to access all the appropriate data and new ones can start pushing to the correct columns -- Once the migration is complete for all stored procedures and applications we can finally drop the extra columns.
I wanted to use SQL Server's triggers to automatically intercept any new Insert/Update's and do something like the following on each of the affected tables:
CREATE TRIGGER tr_Access_Sync
ON Access
INSTEAD OF INSERT /* (, UPDATE) */
AS
BEGIN
DECLARE @code varchar(32)
DECLARE @id int
SET @code = (SELECT inserted.code FROM inserted)
SET @id = (SELECT inserted.id FROM inserted)
-- This is a migrated application; find the appropriate legacy key
IF @code IS NULL AND @id IS NOT NULL
    SET @code = (SELECT Code FROM Users WHERE Users.id = @id)
-- This is a legacy application; find the appropriate surrogate key
IF @id IS NULL AND @code IS NOT NULL
    SET @id = (SELECT id FROM Users WHERE Users.Code = @code)
-- Impossible code:
UPDATE inserted SET inserted.code = @code, inserted.id = @id
END
Question
The 2 huge problems I'm having so far are:
I can't do an "AFTER INSERT" because NULL constraints will make the insert fail.
The "impossible code" I mentioned is how I'd like to cleanly proxy the original query; If the original query has x, y, z columns in it or just x, I ideally would like the same trigger to do these. And if I add/delete another column, I'd like the trigger to remain functional.
Anyone have a code example where this could be possible, or even an alternate solution for keeping these columns properly filled even when only one of values is passed to SQL?
Tricky business...
OK, first of all: this trigger will NOT work in many circumstances:
SET @code = (SELECT inserted.code FROM inserted)
SET @id = (SELECT inserted.id FROM inserted)
The trigger can be called with a set of rows in the Inserted pseudo-table - which one are you going to pick here?? You need to write your trigger in such a fashion that it will work even when you get 10 rows in the Inserted table. If a SQL statement inserts 10 rows, your trigger will not be fired ten times - once for each row - but only once for the whole batch; you need to take that into account!
Second point: I would try to make the IDs IDENTITY fields - then they'll always get a value, even for "legacy" apps. Those "old" apps should provide a legacy key instead, so you should be fine there. The only issue I see, and don't know how you'd handle, is inserts from an already converted app - do they provide an "old-style" legacy key as well? If not, how quickly do you need to have such a key?
What I'm thinking of is a "cleanup job" that would run over the table, get all the rows with a NULL legacy key, and provide some meaningful value for them. Make this a regular stored procedure and execute it every day, four hours, 30 minutes - whatever suits your needs. Then you don't have to deal with triggers and all the limitations they have.
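A minimal sketch of such a cleanup job (names and the key-generation rule are hypothetical):
CREATE PROCEDURE dbo.FillMissingLegacyKeys
AS
BEGIN
    -- Backfill a meaningful legacy key for rows that only have the surrogate
    UPDATE dbo.Users
    SET    Code = 'USR-' + CAST(Id AS varchar(10))  -- hypothetical rule
    WHERE  Code IS NULL;
END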
Wouldn't it be possible to make the schema changes "big bang" but create views over the top of those tables that "hide" the change?
I think you might find you are simply putting off the breakages to a later point in time: "We're going to deprecate this column one application at a time" - it might be my naivety but I can't see how that's ever going to work.
Surely, a worse mess can occur when different applications are doing things differently?
After sleeping on the problem, this seems to be the most generic/re-usable solution I could come up with within the SQL syntax. It works fine even if both columns have a NOT NULL constraint, and even if you don't reference the "other" column at all in your insert.
CREATE TRIGGER tr_Access_Sync
ON Access
INSTEAD OF INSERT
AS
BEGIN
-- Create a temporary table to modify, because "inserted" is read-only
SELECT * INTO #temp FROM inserted
-- If for whatever reason the secondary table has its own identity column,
-- we need to drop it from #temp to do the later INSERT with identities on
ALTER TABLE #temp DROP COLUMN oneToManyIdentity
-- Fill in whichever key is missing by looking up the other one
UPDATE #temp
SET
UserCode = ISNULL(UserCode, (SELECT UserCode FROM Users U WHERE U.UserID = #temp.UserID)),
UserID = ISNULL(UserID, (SELECT UserID FROM Users U WHERE U.UserCode = #temp.UserCode))
INSERT INTO Access SELECT * FROM #temp
END

In Oracle, is it possible to "insert" a column into a table?

When adding a column to an existing table, Oracle always puts the column at the end of the table. Is it possible to tell Oracle where it should appear in the table? If so, how?
The location of the column in the table should be unimportant (unless there are "page sizes" to consider, or whatever Oracle uses to actually store the data). What is more important to the consumer is how the results are called, i.e. the Select statement.
rename YOUR_ORIGINAL_TABLE to YOUR_NEW_TABLE;
create table YOUR_ORIGINAL_TABLE nologging /* or unrecoverable */
as
select Column1, Column2, NULL as NEW_COLUMN, Column3 /* define the new column's value here */
from YOUR_NEW_TABLE;
drop table YOUR_NEW_TABLE;
select * from YOUR_ORIGINAL_TABLE; -- now you will see the new column in the middle of the table
But why would you want to? It seems illogical. You should never assume column ordering; just use a named column list if column order is important.
Why does the order of the columns matter? You can always alter it in your select statement?
There's an advantage to adding new columns at the end of the table. If there's code that naively does a "SELECT *" and then parses the fields in order, you won't be breaking old code by adding new columns at the end. If you add new columns in the middle of the table, then old code may be broken.
At one job, I had a DBA who was super-anal about "Never do 'SELECT *'". He insisted that you always write out the specific fields.
What I normally do is:
Rename the old table.
Create the new table with columns in the right order.
Create the constraints for that new table.
Populate with data: INSERT INTO new_table SELECT * FROM renamed_table.
I don't think that this can be done without saving the data to a temporary table, dropping the table, and recreating it. On the other hand, it really shouldn't matter where the column is. As long as you specify the columns you are retrieving in your select statement, you can order them however you want.
Bear in mind that, under the covers, all the data in the table's records are glued together. Adding a column to the end of a table (if it is nullable or, in later versions, NOT NULL with a default) just means a change to the table's metadata.
Adding a column in the middle would require re-writing every record in that table to add the appropriate value (or markers) for that column. In some cases, that might mean the records take up more room on the blocks and some records need to be migrated.
In short, it's a VAST amount of I/O effort for a table of any real size.
You can always create a view over the table that has the columns in the preferred order, and use that view in DML statements just as you would the table.
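A minimal sketch of that approach (hypothetical names; a simple single-table view like this is updatable in Oracle):
CREATE OR REPLACE VIEW mytable_v AS
SELECT column1, new_column, column2, column3
FROM mytable;
-- DML through the view works just as it would on the table
INSERT INTO mytable_v (column1, new_column, column2, column3)
VALUES ('a', 'b', 'c', 'd');  -- assuming character columns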
I don't believe so - SQL Server doesn't allow this either. The method I always have to use is:
Create a new table that looks right (including the additional column)
Begin a transaction
Select all data from the old table into the new one
Drop the old table
Rename the new table
Commit the transaction.
Not exactly pretty, but it gets the job done.
No, it's not possible via an "ALTER TABLE" statement. However, you could create a new table with the same definition as your current one, albeit with a different name, with the columns in the order you want them. Copy the data into the new table. Drop the old table. Rename the new table to match the old table name.
Tom Kyte has an article on this on AskTom
Apparently there's a trick involving marking the "after" columns INVISIBLE; when restored, they end up at the back.
CREATE TABLE yourtable (one NUMBER(5, 0), two NUMBER(5, 0), three NUMBER(5, 0), four NUMBER(5, 0));
ALTER TABLE yourtable ADD twopointfive NUMBER(5, 0);
ALTER TABLE yourtable MODIFY (three INVISIBLE, four INVISIBLE);
ALTER TABLE yourtable MODIFY (three VISIBLE, four VISIBLE);
https://oracle-base.com/articles/12c/invisible-columns-12cr1#invisible-columns-and-column-ordering
1) OK, so you can't do it directly. We don't need post after post saying the same thing, do we?
2) OK, so the order of columns in a table doesn't technically matter. But that's not the point; the original question simply asked whether it could be done or not. Don't presume that you know everybody else's requirements. Maybe they have a table with 100 columns that is currently queried using "SELECT * ..." inside some monstrously hacked-together query that they would just prefer not to untangle, let alone replace "*" with 100 column names. Or maybe they are just particular about the order of things and like to have related fields next to each other when browsing the schema with, say, SQL Developer. Maybe they are dealing with non-technical staff who won't know to look at the end of a list of 100 columns when, logically, it should be somewhere near the beginning.
Nothing is more irritating than asking an honest question and getting an answer that says: "you shouldn't be doing that". It's MY job, not YOURS! Please don't tell me how to do my job. Just help if you can. Thanks!
OK... sorry for the rant. Now... www.orafaq.com suggests this workaround.
First suppose you have already run:
CREATE TABLE tab1 ( col1 NUMBER );
Now say you want to add a column named "col2", but you want them ordered "col2", "col1" when doing a "SELECT * FROM tbl1;"
The suggestion is to run:
ALTER TABLE tab1 ADD (col2 DATE);
RENAME tab1 TO tab1_old;
CREATE TABLE tab1 AS SELECT 0 AS col1, col1 AS col2 FROM tab1_old;
I found this to be incredibly misleading. First of all, you're filling "col1" with zeros, so if you had any data, you're losing it by doing this. Secondly, it actually renames "col1" to "col2" and fails to mention this. So here's my example; hopefully it's a little clearer:
Suppose you have a table that was created with the following statement:
CREATE TABLE users (first_name varchar(25), last_name varchar(25));
Now say you want to insert middle_name in between first_name and last_name. Here's one way:
ALTER TABLE users ADD middle_name varchar(25);
RENAME users TO users_tmp;
CREATE TABLE users AS SELECT first_name, middle_name, last_name FROM users_tmp;
/* and for good measure... */
DROP TABLE users_tmp;
Note that middle_name will default to NULL (implied by the ALTER TABLE statement). You can alternatively set a different default value in the CREATE TABLE statement like so:
CREATE TABLE users AS SELECT first_name, 'some default value' AS middle_name, last_name FROM users_tmp;
This trick could come in handy if you're adding a date field with a default of sysdate, but you want all of the existing records to have some other (e.g. earlier) date value.
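A sketch of that date-field case (hypothetical names; existing rows get the earlier date, while the default applies only to future inserts):
-- Existing records get a fixed earlier date...
CREATE TABLE users AS
SELECT first_name, DATE '2000-01-01' AS created_at, last_name
FROM users_tmp;
-- ...while new records default to SYSDATE
ALTER TABLE users MODIFY (created_at DEFAULT SYSDATE);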
