I have two tables, Table1 and Table2, and I am trying to insert all data from Table1 into Table2. To do this I have an INSERT INTO statement that inserts in batches, as shown below:
CREATE PROCEDURE insert_table_data @tp_int INT AS
DECLARE @rc INT
SET @rc = 1
WHILE @rc > 0
BEGIN
    BEGIN TRANSACTION
    INSERT INTO table_2 (
        col1,
        col2,
        time_period
    )
    SELECT TOP (500) col1, col2, @tp_int FROM table_1

    DELETE TOP (500) FROM table_1
    SET @rc = @@ROWCOUNT
    COMMIT TRANSACTION;
END
I would ideally like to do this without using a DELETE statement. When I take out the DELETE statement, the stored procedure gets stuck in a loop. I am guessing this is because it keeps picking the TOP (500) from table_1 without progressing further down the records. Any ideas on how to modify the stored procedure?
If you don't want to delete along the way, I would consider using an OFFSET/FETCH type of query. Clearly performance will start to drop the farther you have to read into your table, but I would test it out and consider how to use an index to help out.
And if you are moving millions of rows, I would step that batch size up a little bit.
DECLARE @batchsize INT
DECLARE @start INT
DECLARE @numberofrows INT

SELECT @numberofrows = COUNT(*) FROM table_1
SET @batchsize = 500
SET @start = 0    -- OFFSET is zero-based; starting at 1 would skip the first row

WHILE @start < @numberofrows
BEGIN
    BEGIN TRANSACTION
    INSERT INTO table_2 (col1, col2, time_period)
    SELECT col1, col2, @tp_int    -- @tp_int is the procedure parameter from the question
    FROM table_1
    ORDER BY col1, col2           -- order by a stable key that exists in table_1
    OFFSET @start ROWS
    FETCH NEXT @batchsize ROWS ONLY

    SET @start += @batchsize      -- adding @batchsize + 1 would skip a row every batch
    COMMIT TRANSACTION;
END
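If OFFSET gets too slow as @start grows, a seek-based ("keyset") variant is worth testing. This is only a sketch: it assumes table_1 has a unique, indexed id column, which the question does not show, and it reuses @tp_int from the procedure parameter.
DECLARE @batchsize INT = 5000
DECLARE @lastId INT = 0                            -- assumes a unique, indexed id on table_1
DECLARE @rc INT = 1
DECLARE @batch TABLE (id INT, col1 INT, col2 INT)  -- illustrative column types

WHILE @rc > 0
BEGIN
    DELETE FROM @batch

    -- seek to the next slice instead of scanning past @start rows
    INSERT INTO @batch (id, col1, col2)
    SELECT TOP (@batchsize) id, col1, col2
    FROM table_1
    WHERE id > @lastId
    ORDER BY id

    SET @rc = @@ROWCOUNT

    IF @rc > 0
    BEGIN
        INSERT INTO table_2 (col1, col2, time_period)
        SELECT col1, col2, @tp_int
        FROM @batch

        SELECT @lastId = MAX(id) FROM @batch       -- bookmark for the next pass
    END
END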
I want to update a table in SQL Server by setting a FLAG column to 1 for all values since the beginning of the year:
TABLE:

DATE        ID  FLAG  (more columns...)
2016/01/01  1   0     ...
2016/01/01  2   0     ...
2016/01/02  3   0     ...
2016/01/02  4   0     ...
(etc)
Problem is that this table contains hundreds of millions of records and I've been advised to chunk the updates 100,000 rows at a time to avoid blocking other processes.
I need to remember which rows I update because there are background processes which immediately flip the FLAG back to 0 once they're done processing it.
Does anyone have suggestions on how I can do this?
Each day's worth of data has over a million records, so I can't simply loop using the DATE as a counter. I am thinking of using the ID instead.
Assuming the date column and the ID column are sequential, you could do a simple loop. By this I mean that if there is a record with id=1 and date=2016-01-01, then a record with id=2 and date=2015-12-31 could not exist. If you are worried about locks/exceptions, you should add a transaction in the WHILE block and commit, or roll back on failure.
Change the @batchSize to whatever you feel is right after some experimentation.
DECLARE @currentId int, @maxId int, @batchSize int = 10000

SELECT @currentId = MIN(ID), @maxId = MAX(ID) FROM YOURTABLE WHERE DATE >= '2016-01-01'

WHILE @currentId <= @maxId
BEGIN
    UPDATE YOURTABLE SET FLAG = 1
    WHERE ID BETWEEN @currentId AND (@currentId + @batchSize - 1)  -- -1 keeps batches from overlapping

    SET @currentId = @currentId + @batchSize
END
As the update will never flag the same record twice, I do not see a need to track which records were touched unless you are going to manually stop the process partway through.
You should also ensure that the ID column has an index on it so the retrieval is fast in each update statement.
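As a sketch of the transaction idea mentioned above (same placeholder table/column names, with TRY/CATCH added so a failed batch rolls back cleanly):
DECLARE @currentId int, @maxId int, @batchSize int = 10000

SELECT @currentId = MIN(ID), @maxId = MAX(ID) FROM YOURTABLE WHERE DATE >= '2016-01-01'

WHILE @currentId <= @maxId
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION

        UPDATE YOURTABLE SET FLAG = 1
        WHERE ID BETWEEN @currentId AND (@currentId + @batchSize - 1)

        COMMIT TRANSACTION
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
        THROW;  -- re-raise so the failure is visible to the caller
    END CATCH

    SET @currentId = @currentId + @batchSize
END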
Looks like a simple question or maybe I'm missing something.
You can create a temp/permanent table to keep track of updated rows.
create table tbl (Id int) -- or a temp table, based on your case
insert into tbl values (0)

declare @lastId int = (select Id from tbl)

;with cte as (
    select top (100000) Id, Flag
    from YourMainTable
    where Id > @lastId
    order by Id
)
update cte
set Flag = 1

update tbl set Id = @lastId + 100000
You can do this process in a loop (except the table creation part), for example:
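A sketch of that loop, reusing the tbl and YourMainTable names from above; the OUTPUT clause is an addition here so the bookmark reflects the rows actually touched rather than assuming contiguous Ids:
declare @lastId int, @rows int = 1;
declare @done table (Id int);

while @rows > 0
begin
    select @lastId = Id from tbl;

    delete from @done;  -- clear bookmarks from the previous pass

    with cte as (
        select top (100000) Id, Flag
        from YourMainTable
        where Id > @lastId
        order by Id
    )
    update cte
    set Flag = 1
    output inserted.Id into @done;

    set @rows = @@ROWCOUNT;

    -- advance the bookmark to the highest Id actually updated
    if @rows > 0
        update tbl set Id = (select max(Id) from @done);
end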
create table #tmp_table
(
    id int,
    row_number int
)

insert into #tmp_table
(
    id,
    row_number
)
-- logic to load records from base table
select
    bt.id,
    row_number() over (order by bt.id) as row_number  -- partitioning by id would restart numbering at 1 for every row
from
    dbo.base_table bt
where
    -- your logic to limit the records
declare @batch_size int = 100000;
declare @start_row_number int, @end_row_number int;

select
    @start_row_number = min(row_number),
    @end_row_number = max(row_number)
from
    #tmp_table;

while (@start_row_number <= @end_row_number)
begin
    update bt
    set bt.flag = 1
    from
        dbo.base_table bt
        inner join #tmp_table tt on tt.id = bt.id
    where
        tt.row_number between @start_row_number and (@start_row_number + @batch_size - 1);

    set @start_row_number = @start_row_number + @batch_size;
end
I have a SQL query (a normal one). I have to run this query 4 times in a row (like a for loop in programming). How can I have something like an array and repeat the query execution?
SQL Server
Update:
I am updating some data based on a column TargetLocation. This target location has values from 1 to 5. For each value I need to update the records that have the same target location.
If you are running the query in SQL Server Management Studio, then you can use GO N to run a query N times. For example:
insert into MyTable (MyCol) select 'NewRow'
go 4
This will insert 4 rows into MyTable with the text 'NewRow' in them.
If you really need to loop over something in another application, then I recommend using the while loop as suggested by Peter Tirrell.
Note that loops are usually unnecessary in SQL. They may indicate code that is written with procedural logic instead of set-based logic.
Something like a simple SQL WHILE loop?
declare @counter int
set @counter = 0
while @counter < 10
begin
    select 'foo'
    set @counter = @counter + 1
end
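Applied to the update described in the question, the same loop shape might look like this; MyTable and the SET line are placeholders, since the question doesn't show the actual update:
declare @location int
set @location = 1

while @location <= 5
begin
    update MyTable                 -- placeholder table name
    set SomeColumn = 'some value'  -- whatever the per-location change is
    where TargetLocation = @location

    set @location = @location + 1
end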
I think you want a join in your UPDATE, as in:
--create two sample tables that we can work on
declare @tabletoupdate table(ID int, TARGETLOCATION int);
declare @sourcetable table(ID int, SOURCELOCATION int);

--drop in sample data
insert into @tabletoupdate select 1,10 union select 2,20 union select 3,30;
insert into @sourcetable select 1,100 union select 2,200 union select 3,300;

--see the 'before'
select * from @tabletoupdate;
select * from @sourcetable;

--make target look like source (update the alias, since the table is aliased in FROM)
update t
set t.targetlocation = s.sourcelocation
from @tabletoupdate t
inner join @sourcetable s on s.id = t.id;
--show 'after'
select * from @tabletoupdate;
select * from @sourcetable;

/*
--if you really insist on doing it with a loop
--bad because it's
--1) slower
--2) less readable
--3) less reliable when other users are accessing the data
declare @currentID int = 0;
declare @maxID int = (select max(id) from @sourcetable);

while @currentID < @maxID
begin
    set @currentID = @currentID + 1;

    declare @newval int = (select sourcelocation
                           from @sourcetable
                           where id = @currentID);

    if @newval is not null
    begin
        update @tabletoupdate
        set TARGETLOCATION = @newval
        where id = @currentID;
    end
end
--*/
I've been doing some SQL Server procedure optimization lately and was looking for a testing pattern (time- and result-wise). I've come up with this solution so far:
SET NOCOUNT ON;
----------------------------------------------------------------------------------------------------------------
-- Procedures data and performance testing pattern
----------------------------------------------------------------------------------------------------------------
-- Prepare test queries (most likely taken from Logs.ProcedureTraceData on the DATAUK/DATAUS servers)
-- Procedures should insert records into Temporary table, so we can compare their results using EXCEPT
-- If result set columns are fixed (i.e. no Dynamic SQL is used), we can create Temporary tables inside script
-- and insert records in them to do comparison and just TRUNCATE them at the end of the loop.
-- example here: http://stackoverflow.com/a/654418/3680098
-- If there are any data discrepancies or the record counts differ, it will be shown in the TraceLog table
----------------------------------------------------------------------------------------------------------------
-- Create your own TraceLog table to keep records
----------------------------------------------------------------------------------------------------------------
/*
CREATE TABLE Temporary._EB_TraceLog
(
ID INT NOT NULL IDENTITY(1, 1) CONSTRAINT PK_Temporary_EB_TraceLog_ID PRIMARY KEY
, CurrentExecutionTime INT
, TempExecutionTime INT
, CurrentExecutionResultsCount INT
, TempExecutionResultsCount INT
, IsDifferent BIT CONSTRAINT DF_Temporary_EB_TraceLog_IsDifferent DEFAULT 0 NOT NULL
, TimeDiff AS CurrentExecutionTime - TempExecutionTime
, PercentageDiff AS CAST(((CAST(CurrentExecutionTime AS DECIMAL)/ CAST(TempExecutionTime AS DECIMAL)) * 100 - 100) AS DECIMAL(10, 2))
, TextData NVARCHAR(MAX)
);
SELECT *
FROM Temporary._EB_TraceLog;
TRUNCATE TABLE Temporary._EB_TraceLog;
*/
INSERT INTO Temporary._EB_TraceLog (TextData)
SELECT TextData
FROM Temporary._EB_GetData_Timeouts
EXCEPT
SELECT TextData
FROM Temporary._EB_TraceLog;
DECLARE @Counter INT;
SELECT @Counter = MIN(ID)
FROM Temporary._EB_TraceLog
WHERE CurrentExecutionTime IS NULL
OR TempExecutionTime IS NULL
OR CurrentExecutionResultsCount IS NULL
OR TempExecutionResultsCount IS NULL;
WHILE @Counter <= (SELECT MAX(ID) FROM Temporary._EB_TraceLog)
BEGIN
DECLARE @SQLStringCurr NVARCHAR(MAX);
DECLARE @SQLStringTemp NVARCHAR(MAX);
DECLARE @StartTime DATETIME2;
SELECT @SQLStringCurr = REPLACE(TextData, 'dbo.GetData', 'Temporary._EB_GetData_Orig')
, @SQLStringTemp = REPLACE(TextData, 'dbo.GetData', 'Temporary._EB_GetData_Mod')
FROM Temporary._EB_TraceLog
WHERE ID = @Counter;
----------------------------------------------------------------------------------------------------------------
-- Drop temporary tables in script, so these numbers don't figure in SP execution time
----------------------------------------------------------------------------------------------------------------
IF OBJECT_ID(N'Temporary._EB_Test_Orig') IS NOT NULL
DROP TABLE Temporary._EB_Test_Orig;
IF OBJECT_ID(N'Temporary._EB_Test_Mod') IS NOT NULL
DROP TABLE Temporary._EB_Test_Mod;
----------------------------------------------------------------------------------------------------------------
-- Actual testing
----------------------------------------------------------------------------------------------------------------
-- Take time snapshot and execute original procedure, which inserts records into Temporary table
-- When done - measurements will be updated on TraceLog table
----------------------------------------------------------------------------------------------------------------
SELECT @StartTime = CURRENT_TIMESTAMP;
EXECUTE sp_executesql @SQLStringCurr;
UPDATE T
SET T.CurrentExecutionTime = DATEDIFF(MILLISECOND, @StartTime, CURRENT_TIMESTAMP)
FROM Temporary._EB_TraceLog AS T
WHERE T.ID = @Counter;
----------------------------------------------------------------------------------------------------------------
-- Take time snapshot and execute optimized procedure, which inserts records into Temporary table
-- When done - measurements will be updated on TraceLog table
----------------------------------------------------------------------------------------------------------------
SELECT @StartTime = CURRENT_TIMESTAMP;
EXECUTE sp_executesql @SQLStringTemp;
UPDATE T
SET T.TempExecutionTime = DATEDIFF(MILLISECOND, @StartTime, CURRENT_TIMESTAMP)
FROM Temporary._EB_TraceLog AS T
WHERE T.ID = @Counter;
----------------------------------------------------------------------------------------------------------------
-- Check if there are any data discrepancies
-- If there are any, set IsDifferent to 1, so we can find the root cause
----------------------------------------------------------------------------------------------------------------
IF EXISTS (SELECT * FROM Temporary._EB_Test_Mod EXCEPT SELECT * FROM Temporary._EB_Test_Orig)
OR EXISTS (SELECT * FROM Temporary._EB_Test_Orig EXCEPT SELECT * FROM Temporary._EB_Test_Mod)
BEGIN
UPDATE T
SET T.IsDifferent = 1
FROM Temporary._EB_TraceLog AS T
WHERE T.ID = @Counter;
END
----------------------------------------------------------------------------------------------------------------
-- Update record counts for each execution
-- We can check if there aren't any different record counts even tho results are same
-- EXCEPT clause removes duplicates when doing checks
----------------------------------------------------------------------------------------------------------------
UPDATE T
SET T.CurrentExecutionResultsCount = (SELECT COUNT(*) FROM Temporary._EB_Test_Orig)
, T.TempExecutionResultsCount = (SELECT COUNT(*) FROM Temporary._EB_Test_Mod)
FROM Temporary._EB_TraceLog AS T
WHERE T.ID = @Counter;
----------------------------------------------------------------------------------------------------------------
-- Print iteration number and proceed on next one
----------------------------------------------------------------------------------------------------------------
PRINT @Counter;
SET @Counter += 1;
END
SELECT *
FROM Temporary._EB_TraceLog;
This works quite well so far, but I would like to include IO and TIME statistics in each iteration. Is that possible?
I know I can do it using:
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
But is there a way to grab summed up values and put them in my TraceLog table?
And on top of that, is there anything that doesn't make sense in this piece of code?
Thanks
You can use this query:
SELECT total_elapsed_time
FROM sys.dm_exec_query_stats
WHERE sql_handle IN (SELECT most_recent_sql_handle
                     FROM sys.dm_exec_connections
                     CROSS APPLY sys.dm_exec_sql_text(most_recent_sql_handle)
                     WHERE session_id = @@SPID)
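To fold that into the pattern above, one possibility is to read the last run's figures right after each sp_executesql call and store them. This is a sketch with caveats: sys.dm_exec_query_stats only covers cached plans, and matching on the statement text is fragile:
DECLARE @elapsed_ms INT, @reads BIGINT;

SELECT TOP (1)
       @elapsed_ms = qs.last_elapsed_time / 1000,  -- microseconds to milliseconds
       @reads      = qs.last_logical_reads
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
WHERE st.text = @SQLStringCurr                     -- the batch just executed
ORDER BY qs.last_execution_time DESC;

-- store in the trace table (a LogicalReads column would need to be added for @reads)
UPDATE T
SET T.CurrentExecutionTime = @elapsed_ms
FROM Temporary._EB_TraceLog AS T
WHERE T.ID = @Counter;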
I'm having a little trouble with my stored procedure.
I've written a stored procedure that uses a cursor.
Everything works fine until the point where I insert values from the cursor into the temp table.
Here is the error:
Msg 156, Level 15, State 1, Procedure ors_DailyReportMessageStatus, Line 36
Incorrect syntax near the keyword 'where'.
And here is the code:
ALTER PROCEDURE [dbo].[ors_DailyReportMessageStatus]
    -- Add the parameters for the stored procedure here
    @startDate datetime, @endDate datetime
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    DECLARE @num int;
    DECLARE @stat varchar(20);
    DECLARE @statusCursor CURSOR
    SET @statusCursor = CURSOR FOR
        select count(ms.status) as message, ms.status
        from message_status ms
        join [message] m on m.id = ms.message_id
        where (m.ADDED_ON >= @startDate AND m.ADDED_ON < @endDate)
        group by ms.status

    SET NOCOUNT ON;

    if object_id('tempdb..#tempdailystatus') is not null
    begin
        drop table #tempdailystatus
    end

    CREATE TABLE #tempdailystatus (id int identity, total int, [status] varchar(20));

    insert into #tempdailystatus ([status])
    select distinct [status] from message_status;

    open @statusCursor
    fetch next from @statusCursor into @num, @stat;
    while @@FETCH_STATUS = 0
    begin
        -- this is where the error is
        insert into #tempdailystatus (total) values (@num) where id = (select id from #tempdailystatus where [status] = @stat)
        -- these were just to check that the cursor is OK, and it is
        --print @stat
        --print @num;
        fetch next from @statusCursor into @num, @stat
    end
    close @statusCursor
    deallocate @statusCursor

    -- Insert statements for procedure here
    -- SELECT * from #tempdailystatus
    drop table #tempdailystatus
END
Am I possibly overlooking something? It feels like I'm forgetting something.
Thank you for reading this and for any suggestions. I will appreciate it ^_^
Try replacing:
insert into #tempdailystatus (total) values (@num) where id = (select id from #tempdailystatus where [status] = @stat)
with:
If Exists (Select 1 From #tempdailystatus where [status] = @stat)
Begin
    insert into #tempdailystatus (total) Values (@num)
End
I made some assumptions about your logic, so you may need to adjust accordingly.
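That said, since the temp table is pre-populated with one row per status, the loop body may actually want an UPDATE rather than an INSERT; a possible reading of the intent:
-- assumes the goal is to store the count on the existing row for that status
update #tempdailystatus
set total = @num
where [status] = @stat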
Consider the following SQL:
CREATE TABLE Foo
(
ID int IDENTITY(1,1),
Data nvarchar(max)
)
INSERT INTO Foo (Data)
SELECT TOP 1000 Data
FROM SomeOtherTable
WHERE SomeColumn = @SomeParameter

DECLARE @LastID int
SET @LastID = SCOPE_IDENTITY()
I would like to know if I can depend on the 1000 rows that I inserted into table Foo having contiguous identity values. In other words, if this SQL block produces a @LastID of 2000, can I know for certain that the ID of the first record I inserted was 1001? I am mainly curious about multiple statements inserting records into table Foo concurrently.
I know that I could add a serializable transaction around my insert statement to ensure the behavior that I want, but do I really need to? I'm worried that introducing a serializable transaction will degrade performance, but if SQL Server won't allow other statements to insert into table Foo while this statement is running, then I don't have to worry about it.
I disagree with the accepted answer. This can easily be tested and disproved by running the following.
Setup
USE tempdb
CREATE TABLE Foo
(
ID int IDENTITY(1,1),
Data nvarchar(max)
)
Connection 1
USE tempdb
SET NOCOUNT ON
WHILE NOT EXISTS(SELECT * FROM master..sysprocesses WHERE context_info = CAST('stop' AS VARBINARY(128) ))
BEGIN
INSERT INTO Foo (Data)
VALUES ('blah')
END
Connection 2
USE tempdb
SET NOCOUNT ON
SET CONTEXT_INFO 0x
DECLARE @Output TABLE(ID INT)
WHILE 1 = 1
BEGIN
    /*Clear out table variable from previous loop*/
    DELETE FROM @Output

    /*Insert 1000 records*/
    INSERT INTO Foo (Data)
    OUTPUT inserted.ID INTO @Output
    SELECT TOP 1000 NEWID()
    FROM sys.all_columns

    /*SELECT 1 here: with HAVING and no GROUP BY, plain columns aren't allowed in the select list*/
    IF EXISTS(SELECT 1 FROM @Output HAVING MAX(ID) - MIN(ID) <> 999)
    BEGIN
        /*Set Context Info so the other connection inserting
          a single record in a loop terminates itself*/
        DECLARE @stop VARBINARY(128)
        SET @stop = CAST('stop' AS VARBINARY(128))
        SET CONTEXT_INFO @stop

        /*Return results for inspection*/
        SELECT ID, DENSE_RANK() OVER (ORDER BY Grp) AS ContigSection
        FROM (SELECT ID, ID - ROW_NUMBER() OVER (ORDER BY ID) AS Grp
              FROM @Output) O
        ORDER BY ID

        RETURN
    END
END
Yes, they will be contiguous because the INSERT is atomic: complete success or full rollback. It is also performed as a single unit of work: you won't get any "interleaving" with other processes.
However (or to put your mind at rest!), consider the OUTPUT clause:
DECLARE @KeyStore TABLE (ID int NOT NULL)

INSERT INTO Foo (Data)
OUTPUT INSERTED.ID INTO @KeyStore (ID) --this line
SELECT TOP 1000 Data
FROM SomeOtherTable
WHERE SomeColumn = @SomeParameter
If you want the identity values for multiple rows, use OUTPUT:
DECLARE @NewIDs table (PKColumn int)

INSERT INTO Foo (Data)
OUTPUT INSERTED.ID          -- Foo's identity column is ID; OUTPUT must reference it
INTO @NewIDs (PKColumn)
SELECT TOP 1000 Data
FROM SomeOtherTable
WHERE SomeColumn = @SomeParameter
You now have the entire set of values in the @NewIDs table. You can add any columns from the Foo table into the @NewIDs table and insert those columns as well.
It is not good practice to attach any sort of meaning whatsoever to identity values. You should assume that they are nothing more than integers guaranteed to be unique within the scope of your table.
Try adding the following:
option(maxdop 1)
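Presumably the hint is meant to go on the INSERT from the question, as a query-level option at the end of the statement, to keep the plan serial:
INSERT INTO Foo (Data)
SELECT TOP 1000 Data
FROM SomeOtherTable
WHERE SomeColumn = @SomeParameter
OPTION (MAXDOP 1)  -- query hint: run this statement on a single scheduler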