Using a WHILE loop in SQL - sql-server

I am trying to loop through my records with a WHERE clause.
I want to fetch only the top 100 rows first and then the next 100 (I apply some logic to what the SELECT returns).
But if the first 100 rows return no result, the loop never moves on to the second set of 100.
My query is :
DECLARE @BatchSize INT = 100
DECLARE @Counter INT = 0
DECLARE @TableCount INT = 0
set @TableCount = (select count(*) from Table1) -- @TableCount = 10000
while @Counter < @TableCount/@BatchSize -- @Counter < 100
BEGIN
SET @Counter = @Counter + 1
INSERT INTO Table4
SELECT TOP (@BatchSize) * FROM Table2
WHERE NOT EXISTS (SELECT * FROM Table3) and some condition
Here, if I don't get data for the first 100 rows, it won't go on to the next set of 100.
What should I do?

Instead of TOP (@BatchSize) in the SELECT clause, try OFFSET @BatchSize * @Counter ROWS FETCH NEXT @BatchSize ROWS ONLY after the WHERE clause (note that OFFSET/FETCH requires an ORDER BY).
Based on comments, you may also want to look into a SELECT INTO query, as well as the NOLOCK query hint.
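A minimal sketch of that change, keeping the question's table and variable names (an id column is assumed for the ORDER BY, which OFFSET/FETCH requires):
DECLARE @BatchSize INT = 100
DECLARE @Counter INT = 0
DECLARE @TableCount INT = 0
SET @TableCount = (SELECT COUNT(*) FROM Table2)
WHILE @Counter < @TableCount / @BatchSize
BEGIN
    INSERT INTO Table4
    SELECT * FROM Table2
    -- the question's filter (NOT EXISTS ... and some condition) goes here
    ORDER BY id -- OFFSET/FETCH requires ORDER BY; an id column is assumed
    OFFSET @BatchSize * @Counter ROWS
    FETCH NEXT @BatchSize ROWS ONLY
    SET @Counter = @Counter + 1
END
Because OFFSET pages over the filtered, ordered result, an empty batch no longer stalls the walk through the table.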

Related

Insert Into statement stuck in loop

I have two tables Table1 and Table2, and I am trying to insert all data from Table1 into Table2. To do this, I have an Insert Into statement, which inserts in batches, as shown below
CREATE PROCEDURE insert_table_data @tp_int INT AS
DECLARE @rc INT
SET @rc = 1
WHILE @rc > 0
BEGIN
BEGIN TRANSACTION
INSERT INTO table_2(
col1,
col2,
time_period
)
SELECT TOP (500) col1, col2, @tp_int FROM table_1
DELETE TOP (500) FROM table_1
SET @rc = @@ROWCOUNT -- row count of the DELETE; the loop ends when table_1 is empty
COMMIT TRANSACTION;
END
I would ideally like to do this without using a DELETE statement. When I take out the DELETE, the stored procedure gets stuck in a loop. I am guessing this is because it keeps picking the TOP (500) from table_1 without progressing further down the records. Any ideas on how to modify the stored procedure?
If you don't want to delete along the way, I would consider using an OFFSET/FETCH type of query. Performance will drop the farther you have to read into your table, so test it out and consider how an index could help.
And if you are moving millions of rows, I would step that batch size up a bit.
DECLARE @batchsize INT
DECLARE @start INT
DECLARE @numberofrows INT
SELECT @numberofrows = COUNT(*) FROM table_1
SET @batchsize = 500
SET @start = 0 -- OFFSET 0 so the first batch includes the first row
WHILE @start < @numberofrows
BEGIN
BEGIN TRANSACTION
INSERT INTO table_2 (col1, col2, time_period)
SELECT col1, col2, @tp_int -- @tp_int is the procedure's parameter
FROM table_1
ORDER BY time_period, col1, col2
OFFSET @start ROWS
FETCH NEXT @batchsize ROWS ONLY
SET @start += @batchsize -- advance by exactly one batch; adding 1 here would skip rows
COMMIT TRANSACTION;
END
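If table_1 has a unique, indexed key (an id column is assumed below, along with the @tp_int parameter from the procedure), a keyset loop avoids the increasingly expensive OFFSET scans by seeking on a key range instead. A sketch under those assumptions:
-- walk the key range in fixed steps; each iteration is an index seek,
-- and gaps in the id sequence simply make some batches smaller
DECLARE @batchsize INT = 500
DECLARE @lastId INT = 0
DECLARE @maxId INT
SELECT @maxId = MAX(id) FROM table_1
WHILE @lastId < @maxId
BEGIN
    BEGIN TRANSACTION
    INSERT INTO table_2 (col1, col2, time_period)
    SELECT col1, col2, @tp_int -- @tp_int comes from the procedure's parameter
    FROM table_1
    WHERE id > @lastId AND id <= @lastId + @batchsize
    SET @lastId = @lastId + @batchsize
    COMMIT TRANSACTION;
END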

Loop insert SQL query

I have the below query, which returns 400 million rows. I want to run it in a loop that inserts 1 million records at a time. Could someone show me the loop query?
insert into AST (DataAreaId, Name)
select f.DataAreaId, its.Name....... etc
from Transform.InventFin f
inner join Staging.INVENTSETTLEMENT its
on f.ITSRECID=its.RECID
and f.DataAreaId=its.DATAAREAID
Try something like the following (assuming DataAreaId is unique; if it is not, you need to include the other key columns in the NOT EXISTS).
declare @Count int
set @Count = 1
while @Count > 0
begin
insert into AST (DataAreaId, Name)
select TOP (1000000) f.DataAreaId, its.Name....... etc
from Transform.InventFin f
inner join Staging.INVENTSETTLEMENT its
on f.ITSRECID = its.RECID
and f.DataAreaId = its.DATAAREAID
WHERE NOT EXISTS
(
SELECT 1 FROM AST a WHERE a.DataAreaId = f.DataAreaId
)
set @Count = @@ROWCOUNT -- stop once no new rows qualify
end

How to chunk updates to SQL Server?

I want to update a table in SQL Server by setting a FLAG column to 1 for all values since the beginning of the year:
TABLE
DATE        ID  FLAG  (more columns...)
2016/01/01  1   0     ...
2016/01/01  2   0     ...
2016/01/02  3   0     ...
2016/01/02  4   0     ...
(etc)
The problem is that this table contains hundreds of millions of records, and I've been advised to chunk the updates, 100,000 rows at a time, to avoid blocking other processes.
I need to remember which rows I update because there are background processes which immediately flip the FLAG back to 0 once they're done processing it.
Does anyone have suggestions on how I can do this?
Each day's worth of data has over a million records, so I can't simply loop using the DATE as a counter. I am thinking of using the ID.
Assuming the date column and the ID column are sequential, you could do a simple loop. By this I mean that if there is a record with id=1 and date=2016-01-01, then a record with id=2 and date=2015-12-31 could not exist. If you are worried about locks/exceptions, add a transaction in the WHILE block and commit, or roll back on failure.
Change the #batchSize to whatever you feel is right after some experimentation.
DECLARE @currentId int, @maxId int, @batchSize int = 10000
SELECT @currentId = MIN(ID), @maxId = MAX(ID) FROM YOURTABLE WHERE DATE >= '2016-01-01'
WHILE @currentId <= @maxId
BEGIN
UPDATE YOURTABLE SET FLAG = 1 WHERE ID BETWEEN @currentId AND (@currentId + @batchSize - 1) -- the -1 keeps the ranges from overlapping at the edges
SET @currentId = @currentId + @batchSize
END
As the update will never flag the same record twice, I do not see a need to track which records were touched, unless you are going to manually stop the process partway through.
You should also ensure that the ID column has an index on it so the retrieval is fast in each UPDATE statement.
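For example (the table and index names here are placeholders):
-- a nonclustered index on ID lets each ranged UPDATE seek instead of scan
CREATE NONCLUSTERED INDEX IX_YOURTABLE_ID ON YOURTABLE (ID)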
Looks like a simple question, or maybe I'm missing something.
You can create a temp or permanent table to keep track of updated rows.
create table tbl (Id int) -- or a temp table, depending on your case
insert into tbl values (0)
declare @lastId int = (select Id from tbl)
;with cte as (
select top (100000) *
from YourMainTable
where Id > @lastId
ORDER BY Id
)
update cte
set Flag = 1
update tbl set Id = @lastId + 100000
You can do this process in a loop (except the table creation part), for example:
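A minimal sketch of that loop, assuming the tracking table tbl from above already exists and YourMainTable has reasonably dense, increasing Id values:
declare @lastId int
declare @rows int = 1
while @rows > 0
begin
    select @lastId = Id from tbl
    ;with cte as (
        select top (100000) *
        from YourMainTable
        where Id > @lastId
        order by Id
    )
    update cte
    set Flag = 1
    set @rows = @@ROWCOUNT -- stop once no rows remain past @lastId
    update tbl set Id = @lastId + 100000
end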
create table #tmp_table
(
id int,
row_number int
)
insert into #tmp_table
(
id,
row_number
)
--logic to load records from the base table
select
bt.id,
row_number() over (order by bt.id) as row_number -- partitioning by id would make every row_number 1
from
dbo.base_table bt
where
--your logic to limit the records
declare @batch_size int = 100000;
declare @start_row_number int, @end_row_number int;
select
@start_row_number = min(row_number),
@end_row_number = max(row_number)
from
#tmp_table
while (@start_row_number < @end_row_number)
begin
update bt
set
bt.flag = 1
from
dbo.base_table bt
inner join #tmp_table tt on
tt.id = bt.id
where
tt.row_number between @start_row_number and (@start_row_number + @batch_size - 1) -- row_number lives in #tmp_table, not the base table
set @start_row_number = @start_row_number + @batch_size
end

Query inserting only top 100 rows

I am using the following query.
But it just transfers the top 1000 rows, that's it, even though I have more rows.
If I remove the WHERE NOT EXISTS clause, I get the full data. Can you tell me where I am going wrong?
DECLARE @BatchSize INT = 1000
DECLARE @Counter INT = 0
DECLARE @TableCount INT = 0
set @TableCount = (select count(*) from Table2)
while @Counter < (@TableCount/@BatchSize + 1)
BEGIN
INSERT INTO Table1
SELECT * FROM Table2 MH
inner join Table3 M
on MH.Mid = M.Mid
WHERE NOT EXISTS (
SELECT * FROM Table1
where MH.otherid = M.otherid
)
order by id OFFSET (@BatchSize * @Counter) ROWS FETCH NEXT @BatchSize ROWS ONLY;
SET @Counter = @Counter + 1
END
Why is it inserting just the top 1000 rows?
WHERE NOT EXISTS (
SELECT 1 FROM Table1
)
will only evaluate to true for the first batch of inserts. After that, there are records in the target table, so the WHERE clause evaluates to false and no further inserts happen.
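A correlated version would be checked per source row instead; a sketch of the clause using the question's column names:
WHERE NOT EXISTS (
    SELECT 1 FROM Table1 T1
    WHERE T1.otherid = MH.otherid -- correlate the target table to the row being inserted
)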
How many rows are in Table2? On the second iteration your WHILE condition becomes (1 < @TableCount/@BatchSize + 1); with integer division, if your table count is under 1000 rows that is false, and it will jump out after the first loop.
DECLARE @BatchSize INT = 1000
DECLARE @Counter INT = 0
--DECLARE @TableCount INT = 0
--set @TableCount = (select count(*) from Table2)
declare @rows int = 1
while @rows > 0 --@Counter < (@TableCount/@BatchSize + 1)
BEGIN
INSERT INTO Table1
SELECT * FROM Table2 MH
inner join Table3 M
on MH.Mid = M.Mid
--if you want non-existing
left join Table1 t on t.field = M.field
where t.field is null
-- end if you want
--WHERE NOT EXISTS ( --not exists WHAT?
-- SELECT 1 FROM Table1
--)
order by id OFFSET (@BatchSize * @Counter) ROWS FETCH NEXT @BatchSize ROWS ONLY;
--SET @Counter = @Counter + 1
select @rows = @@ROWCOUNT, @Counter = @Counter + 1
END

Using Between with Max?

Is it possible to use BETWEEN with MAX, like this:
SELECT * FROM TABLE WHERE ID BETWEEN 100 AND MAX
Or is there another way to go to the end?
What do you mean by Max? The maximum value of the data type? The maximum value in the column?
In any case, you just need:
SELECT * FROM TABLE WHERE ID >= 100
The below will work:
SELECT * FROM tblName WHERE id BETWEEN 100 and (SELECT MAX(id) from tblName)
I can't see why you wouldn't just use a greater-than-or-equal-to condition, but if you really insist on doing it this way:
SELECT * FROM TABLE WHERE ID BETWEEN 100 AND (SELECT MAX(ID) FROM TABLE)
As pointed out, you can use a nested SELECT to get the MAX value for the end of your range.
Here is a code sample to test out the theory:
create table #TempTable (id int)
declare @Counter int
set @Counter = 1
while (@Counter < 1000)
begin
insert into #TempTable (id) values (@Counter)
set @Counter = @Counter + 1
end
select * from #TempTable where id between 800 and (select MAX(id) from #TempTable)
drop table #TempTable
