How to Bulk Insert from a select query? - sql-server

I want to bulk insert the result of this query into a table. What should I do?
Here is my query:
select *
from table1 t1
inner join table2 t2 on t1.code = t2.stucode

bulk insert is not the term you are looking for: bulk insert imports a data file, and you are not importing a data file.
You want to create a table based on the results of a query, which is easily done with select ... into.
select *
into t3
from table1 t1
inner join table2 t2 on t1.code=t2.stucode
Alternatively, you can create the table first, and use the minimally logged bulk-load behavior of insert ... select (trace flag 610 for clustered indexes; no trace flag required for heaps). You can find more about the limitations here:
The Data Loading Performance Guide
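As a sketch of that second approach (the target table t3 here is an assumption; create it first with columns matching the select list, as a heap):

```sql
-- With a TABLOCK hint on a heap, INSERT ... SELECT can be minimally logged
-- (clustered-index targets additionally need trace flag 610 on older versions).
INSERT INTO t3 WITH (TABLOCK)
SELECT t1.*, t2.*
FROM table1 t1
INNER JOIN table2 t2 ON t1.code = t2.stucode;
```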

Related

Does MS SQL allow using multiple queries for different columns in an INSERT statement?

Is it possible to insert values using multiple queries for different columns?
MS SQL allows you to write:
INSERT INTO TABLE1(COL1)
SELECT COL1 FROM TABLE2
But what if I want to insert data from one table into one column, and data from another table into another column? Does MS SQL allow it? If so, what is the syntax?
You can use any SELECT query, for example:
INSERT INTO TABLE1 (COLUMN1, COLUMN2)
SELECT TABLE2.VAL1, TABLE3.VAL2 FROM TABLE2 INNER JOIN TABLE3 ON TABLE2.ID = TABLE3.FID

TSQL - subquery inside Begin End

Consider the following query:
begin
;with
t1 as (
select top(10) x from tableX
),
t2 as (
select * from t1
),
t3 as (
select * from t1
)
-- --------------------------
select *
from t2
join t3 on t3.x=t2.x
end
go
I was wondering: is t1 evaluated twice, hence tableX being read twice (meaning t1 acts like a view)?
Or just once, with its rows saved in t1 for the whole query (like a variable in a programming language)?
I am trying to figure out how the T-SQL engine optimises this. It matters because if t1 has millions of rows and is referenced many times in the query, producing the same result each time, there should be a better way to do it.
Just create the table:
CREATE TABLE tableX
(
x int PRIMARY KEY
);
INSERT INTO tableX
VALUES (1)
,(2)
Turn on execution plan generation, execute the query, and inspect the resulting plan.
So, yes, the table is queried two times. If you are using complex common table expressions and working with huge amounts of data, I would advise storing the result in a temporary table.
Sometimes I get very bad execution plans for complex CTEs that used to work nicely. Also, you can define indexes on temporary tables and improve performance further.
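A minimal sketch of that temp-table approach, reusing the names from the question (the index name is illustrative):

```sql
-- Materialize the expensive expression once into a temp table.
SELECT TOP (10) x
INTO #t1
FROM tableX;

-- An index can speed up the later self-join (not possible on a CTE).
CREATE CLUSTERED INDEX IX_t1_x ON #t1 (x);

-- tableX is now read only once, no matter how often #t1 is referenced.
SELECT t2.x
FROM #t1 t2
JOIN #t1 t3 ON t3.x = t2.x;

DROP TABLE #t1;
```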
To be honest, there is no single answer... The only answer is: Race your horses (Eric Lippert).
The way you write your query does not determine how the engine will execute it. That depends on many, many influences...
You tell the engine what you want to get, and the engine decides how to get it.
This may even differ between identical calls, depending on statistics, currently running queries, existing cached results, etc.
Just as a hint, try this:
USE master;
GO
CREATE DATABASE testDB;
GO
USE testDB;
GO
--I create a physical test table with 1,000,000 rows
CREATE TABLE testTbl(ID INT IDENTITY PRIMARY KEY, SomeValue VARCHAR(100));
WITH MioRows(Nr) AS (SELECT TOP 1000000 ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values v1 CROSS JOIN master..spt_values v2 CROSS JOIN master..spt_values v3)
INSERT INTO testTbl(SomeValue)
SELECT CONCAT('Test',Nr)
FROM MioRows;
--Now we can start to test this
GO
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS;
GO
DECLARE @dt DATETIME2 = SYSUTCDATETIME();
--Your approach with CTEs
;with t1 as (select * from testTbl)
,t2 as (select * from t1)
,t3 as (select * from t1)
select t2.ID AS t2_ID,t2.SomeValue AS t2_SomeValue,t3.ID AS t3_ID,t3.SomeValue AS t3_SomeValue INTO target1
from t2
join t3 on t3.ID=t2.ID;
SELECT 'Final CTE',DATEDIFF(MILLISECOND,@dt,SYSUTCDATETIME());
GO
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS;
GO
DECLARE @dt DATETIME2 = SYSUTCDATETIME();
--Writing the intermediate result into a physical table
SELECT * INTO test1 FROM testTbl;
SELECT 'Write into test1',DATEDIFF(MILLISECOND,@dt,SYSUTCDATETIME());
select t2.ID AS t2_ID,t2.SomeValue AS t2_SomeValue,t3.ID AS t3_ID,t3.SomeValue AS t3_SomeValue INTO target2
from test1 t2
join test1 t3 on t3.ID=t2.ID;
SELECT 'Final physical table',DATEDIFF(MILLISECOND,@dt,SYSUTCDATETIME());
GO
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS;
GO
DECLARE @dt DATETIME2 = SYSUTCDATETIME();
--Same as before, but with a primary key on the intermediate table
SELECT * INTO test2 FROM testTbl;
SELECT 'Write into test2',DATEDIFF(MILLISECOND,@dt,SYSUTCDATETIME());
ALTER TABLE test2 ADD PRIMARY KEY (ID);
SELECT 'Add PK',DATEDIFF(MILLISECOND,@dt,SYSUTCDATETIME());
select t2.ID AS t2_ID,t2.SomeValue AS t2_SomeValue,t3.ID AS t3_ID,t3.SomeValue AS t3_SomeValue INTO target3
from test2 t2
join test2 t3 on t3.ID=t2.ID;
SELECT 'Final physical table with PK',DATEDIFF(MILLISECOND,@dt,SYSUTCDATETIME());
--Clean up (Careful with real data!!!)
GO
USE master;
GO
--DROP DATABASE testDB;
GO
On my system the
first takes 674 ms, the
second 1,205 ms (297 ms for writing into test1) and the
third 1,727 ms (285 ms for writing into test2 and ~650 ms for creating the index).
Although the query is performed twice, the engine can take advantage of cached results.
Conclusion
The engine is really smart... Don't try to be smarter...
If the table covered a lot of columns and much more data per row, the whole test might turn out differently...
If your CTEs (sub-queries) involve much more complex data with joins, views, functions and so on, the engine might get into trouble finding the best approach.
If performance matters, you can race your horses to test it out. One hint: I have sometimes used a table hint quite successfully: FORCE ORDER. This will perform joins in the order specified in the query.
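For illustration, one way to apply that hint is as a query-level option (table names reuse the test script above; note that FORCE ORDER is specified in the OPTION clause, not per table):

```sql
-- Forces the optimizer to join tables in the order they are written.
SELECT t2.ID, t3.SomeValue
FROM test2 t2
JOIN test2 t3 ON t3.ID = t2.ID
OPTION (FORCE ORDER);
```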
Here is a simple example to test the theories:
First, via a table variable, which evaluates the expression only once.
declare @r1 table (id int, v uniqueidentifier);
insert into @r1
SELECT * FROM
(
select id=1, NewId() as 'v' union
select id=2, NewId()
) t
-- -----------
begin
;with
t1 as (
select * from @r1
),
t2 as (
select * from t1
),
t3 as (
select * from t1
)
-- ----------------
select * from t2
union all select * from t3
end
go
On the other hand, if we put the expression inside t1 instead of the table variable, it gets evaluated twice.
t1 as (
select id=1, NewId() as 'v' union
select id=2, NewId()
)
Hence, my conclusion is to use a temporary table and not rely on cached results.
Also, I have applied this to a large query that evaluated the expression only twice; after moving it into a temporary table, the execution time was cut straight in half!

How to use INSERT INTO with a LEFT JOIN?

I would appreciate it if someone could help out.
There are two tables that I need to combine. So far I have been doing this every time:
SELECT * INTO TABLE_3
FROM TABLE_1 LEFT JOIN TABLE_2
ON TABLE_1.[DATE] = TABLE_2.[DATE1]
However, I would like to skip the part of creating a new table and insert the columns I need directly into the existing table.
I tried doing this,
INSERT INTO [TABLE_1] (USD,EUR,RUR)
SELECT USD,EUR,RUR
FROM TABLE_1 AS T1 LEFT JOIN TABLE_2 AS T2
ON T1.[DATE] = T2.[DATE1]
but got an error saying that my column names are ambiguous.
I use SQL Server 2014.
Instead of giving the column name directly, qualify it with a table alias to say which table the column should come from. Most likely both tables have a column with the same name that you are trying to select, so you must specify the exact table:
INSERT INTO [TABLE_1] (USD,EUR,RUR)
SELECT [T1/T2].USD,[T1/T2].EUR,[T1/T2].RUR
FROM TABLE_1 AS T1 LEFT JOIN TABLE_2 AS T2
ON T1.[DATE] = T2.[DATE1]
You can specify either T1 or T2 for each column, as your business logic requires. This will solve the problem.
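For example, taking all three values from TABLE_2 (pick the alias that matches where your data actually lives):

```sql
INSERT INTO TABLE_1 (USD, EUR, RUR)
SELECT T2.USD, T2.EUR, T2.RUR
FROM TABLE_1 AS T1
LEFT JOIN TABLE_2 AS T2 ON T1.[DATE] = T2.[DATE1];
```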

SQL Server 2012: Running multiple UPDATE FROM SELECT on the same table simultaneously

Situation:
1) There is a big TABLE1 (9 GB data, 20 GB index space, 12M rows)
2) There are several UPDATE and UPDATE/SELECT statements on TABLE1 which are run one by one
3) Each UPDATE statement updates different columns
4) None of them use a previously updated column to calculate a newly updated column
5) It takes a while to complete them all
Issue:
I want to run those UPDATEs at the same time, but I'm concerned about deadlocks. How do I avoid them? Will SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED help?
The UPDATEs look like:
update TABLE1 set col1 = subs.col2
from (select ID, col2 from TABLE2) subs
where TABLE1.ID = subs.ID
update TABLE1 set col10 = col2+col3+col4
update TABLE1 set col100 = col2 + subs.col4
from (
select
b.ID, a.col4
from
TABLE3 a
join TABLE1 b on b.ID2 = a.ID2
) subs
where TABLE1.ID = subs.ID
update TABLE1 set col1000 = col2+col3+col4
from TABLE1
join TABLE4 on TABLE4.date = TABLE1.date
join TABLE5 on TABLE5.ID3 = TABLE1.ID
Dirty reads with READ UNCOMMITTED might work if the same columns are not updated and not used in other clauses, but I'm afraid this is a fragile solution.
For a more consistent solution you can mix ROWLOCK/UPDLOCK/NOLOCK hints depending on the operation. For example:
UPDATE
TABLE1 WITH (ROWLOCK)
SET
col1 = TABLE2.col2
FROM
TABLE1 WITH (ROWLOCK, UPDLOCK)
INNER JOIN TABLE2 WITH (NOLOCK) ON (TABLE1.ID = TABLE2.ID)
If your statements update mostly different rows, then ROWLOCK can be omitted.
In rare cases lock escalation might happen, but it can be limited by:
ALTER TABLE TABLE1 SET (LOCK_ESCALATION = DISABLE)
BTW, what is the purpose of your solution? I don't think you'll win much performance; small partial updates can be handled faster than large updates run in parallel.
(1) Avoid sub-queries while updating. Multiple sub-queries can quickly lead to lock escalation and cause deadlocks.
(2) Check out the discussion at TABLOCK vs TABLOCKX.
(3) For current blocking and locking, check out the discussion at How to find out what table a page lock belongs to.
Another strategy: create a temp table holding the IDs of the rows you want to update, along with each column's new value.
CREATE TABLE #tmp (
RowID int,
NewCol1Value ...,
NewCol2Value ...,
NewCol3Value ...
)
-- Insert into the tmp table
...
UPDATE Table1
SET Col1 = ISNULL(NewCol1Value, Col1),
Col2 = ISNULL(NewCol2Value, Col2),
...
FROM Table1 INNER JOIN #tmp ON Table1.RowID = #tmp.RowID

Updating table based on Select query in stored procedure / ColdFusion

I am using ColdFusion for a project and I have written a query which I think could be faster as a stored procedure, but I am not a T-SQL person, so I am not sure how to write one to compare.
I am running an initial query which selects a number of fields from a table based on a dynamically built cfquery. I think I know how to convert this query into a SQL Server stored procedure.
However, directly after that, I take all of the primary key IDs from that query and run another query against a separate table that "locks" records with those IDs. The lock is a bit field (a flag) in the second table that tells the system that the record is "checked out". I have wrapped both queries in a cftransaction so that they execute as a unit.
Code Overview:
<cftransaction>
<cfquery name="selectQuery">
SELECT id, field2, field3
FROM table1
WHERE (bunch of conditions here)
</cfquery>
<cfquery name="updateQuery">
UPDATE table2
SET lockField = 1
WHERE table2.id IN (#ValueList(selectQuery.id)#)
</cfquery>
</cftransaction>
I then return the selectQuery result set to my app, which uses it to output some data. How would I accomplish the same thing in a single SQL Server 2008 stored procedure that I could call using cfstoredproc?
Again, I am thinking that the native CF way (with cfquery) is not as efficient as a stored procedure, since I have to retrieve the result set back to CF and then issue another query back to the DB. A single stored procedure does everything in the DB and then returns the original query's result set for use.
Any ideas?
You could add an OUTPUT clause to the UPDATE statement to capture the IDs of the updated records and insert them into a table variable or temp table. Then JOIN back to table1 to return the result set.
DECLARE @UpdatedRecords TABLE ( ID INT )
UPDATE t2
SET t2.lockField = 1
OUTPUT Inserted.ID INTO @UpdatedRecords ( ID )
FROM table2 t2 INNER JOIN table1 t1 ON t2.id = t1.id
WHERE (bunch of conditions for table1 here)
SELECT t1.id, t1.field2, t1.field3
FROM table1 t1 INNER JOIN @UpdatedRecords u ON t1.id = u.id
Keep in mind that if table1 is in constant flux, the other values ("field2" and "field3") are not guaranteed to be what they were when the UPDATE occurred. But I think your current method is susceptible to that issue as well.
Your problem is "bunch of conditions here". Are those conditions always static? Is it ALWAYS (FOO = #x AND BAR = #y)? Or is it conditional, where sometimes FOO is not present as a condition at all?
If FOO is not always present, then you have a problem with the stored proc. T-SQL cannot do dynamic query building directly; in fact, even allowing it would somewhat negate the point of the proc, which is to compile and pre-optimize the SQL. You CAN do it, of course, but you end up building a SQL string inside the proc body and executing it at the end. You are much better off using cfquery with cfqueryparam. Actually, have you considered doing this instead?
<cfquery name="updateQuery">
UPDATE table2
SET lockField = 1
WHERE table2.id IN (SELECT id
FROM table1
WHERE (bunch of conditions here))
</cfquery>
You could do your update in one statement by making your first query a subquery, and then use a separate statement to return your results. The whole thing can be a single stored procedure:
CREATE PROCEDURE myUpdate
@Variable [datatype], etc...
AS
BEGIN
UPDATE table2
SET lockField = 1
WHERE table2.id IN (
SELECT id
FROM table1
WHERE (bunch of conditions here)
)
SELECT id, field2, field3
FROM table1
WHERE (bunch of conditions here)
END
You'll probably have to pass some parameters in, but that's the basic structure of a stored procedure. Then you can call it from ColdFusion like so:
<cfstoredproc procedure="myUpdate">
<cfprocparam type="[CF SQL Type]" value="[CF Variable]">
etc...
<cfprocresult name="selectQuery" resultSet="1">
</cfstoredproc>
You could use those query results just like you were using them before.
No need for a SPROC.
UPDATE table2
SET table2.lockField = 1
FROM table1
WHERE table1.id = table2.id
AND table1.field2 = <cfqueryparam ....>
AND table1.field3 = <cfqueryparam ....>