I need help inserting 2 million rows into a table. The table I am inserting into has 4 billion rows, and the table I am inserting from has 2 million. The insert rate is around 190 rows per minute.
DECLARE @BatchSize INT = 5000

WHILE 1 = 1
BEGIN
    INSERT INTO [dbo].[a] ([a].[col1], [a].[col2], [a].[adate], [a].[importdate])
    SELECT TOP (@BatchSize)
        b.col1,
        b.col2,
        b.adate,
        b.importdate
    FROM
        b
    WHERE
        NOT EXISTS (SELECT 1
                    FROM dbo.[a]
                    WHERE [a].col1 = b.col1
                      AND [a].col2 = b.col2
                      AND [a].adate = b.adate)
                      --AND [sent].aDate > getdate()-10)

    IF @@ROWCOUNT < @BatchSize BREAK
END;
In the above query, col1, col2, and col3 form the (non-clustered) primary key of table a. I want to insert every record from table b into table a ...
Table a has 3 indexes: one on col1, col2; a second on col1, col2, col3; and a third on col1 only ...
Can anyone offer any ideas about making it faster?
I have 128 GB of RAM on SQL Server 2008 R2.
Thanks
Since you want all of the rows in B inserted into A, there should be no need to use an EXISTS. The problem becomes one of tracking the rows already transferred in prior batches. The following example generates a row number and uses it to group rows into batches. If the row number is ordered by an existing index, there should be no sort pass required on the select side.
-- Sample data.
declare @A as Table ( Col1 Int, Col2 Int );
declare @B as Table ( Col1 Int, Col2 Int );
insert into @B ( Col1, Col2 ) values
  ( 1, 1 ), ( 1, 2 ), ( 1, 3 ), ( 1, 4 ), ( 1, 5 ),
  ( 2, 1 ), ( 2, 2 ), ( 2, 3 ), ( 2, 4 ), ( 2, 5 );

-- Rows to transfer in each batch.
declare @BatchSize as Int = 5;
-- First row to transfer in the current batch.
declare @BatchMark as Int = 1;
-- Count of rows processed.
declare @RowsProcessed as Int = 1;

-- Process the batches.
while @RowsProcessed > 0
begin
  insert into @A ( Col1, Col2 )
    select Col1, Col2
    from ( select Col1, Col2, Row_Number() over ( order by Col1, Col2 ) as RN from @B ) as PH
    where @BatchMark <= RN and RN < @BatchMark + @BatchSize;
  select @RowsProcessed = @@RowCount, @BatchMark += @BatchSize;
  select * from @A; -- Show progress.
end;
Alternatives would include adding a flag column to the B table to mark processed rows, using an existing id in the B table to track the maximum value already processed, using an additional table to track the index values of processed rows, deleting processed rows from B, ...
An OUTPUT clause may prove useful for some of the alternatives; the delete-as-you-go variant is sketched below.
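A minimal sketch of the delete-as-you-go variant, assuming the rows in B may be consumed and that A has no enabled triggers or foreign keys (OUTPUT ... INTO requires that of its target table):

declare @BatchSize int = 5000;
while 1 = 1
begin
  -- Move one batch: delete from the source and write the deleted rows to the target.
  delete top (@BatchSize) from dbo.B
  output deleted.col1, deleted.col2, deleted.adate, deleted.importdate
  into dbo.A ( col1, col2, adate, importdate );
  if @@RowCount = 0 break;
end;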
Rebuilding the indexes with a suitable fill factor before transferring the data may also help, though choosing a value depends on knowledge of the index contents that is not available in your question.
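For example, a hypothetical rebuild of one of those indexes (the index name here is made up):

alter index IX_a_col1_col2 on dbo.a rebuild with ( fillfactor = 90 );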
I have a stored procedure that should insert some random rows into a table depending on the amount values:
@amount1 INT --EligibilityID = 1
@amount2 INT --EligibilityID = 2
@amount3 INT --EligibilityID = 3
Maybe the obvious way is to use TOP(@amount), but there are a lot of amount values and the actual select is much larger. So I was looking for a way to do it in a single statement if possible.
INSERT INTO [dbo].[CaseInfo]
SELECT [EligibilityID],[CaseNumber],[CaseMonth]
FROM (
    SELECT TOP(@amount1) [EligibilityID],[CaseNumber],[CaseMonth]
    FROM [dbo].[tempCases]
    WHERE [EligibilityID] = 1
) AS t1

INSERT INTO [dbo].[CaseInfo]
SELECT [EligibilityID],[CaseNumber],[CaseMonth]
FROM (
    SELECT TOP(@amount2) [EligibilityID],[CaseNumber],[CaseMonth]
    FROM [dbo].[tempCases]
    WHERE [EligibilityID] = 2
) AS t2

INSERT INTO [dbo].[CaseInfo]
SELECT [EligibilityID],[CaseNumber],[CaseMonth]
FROM (
    SELECT TOP(@amount3) [EligibilityID],[CaseNumber],[CaseMonth]
    FROM [dbo].[tempCases]
    WHERE [EligibilityID] = 3
) AS t3
I would recommend using ROW_NUMBER, partitioned by EligibilityID, and then comparing it with a CASE expression to select the correct variable each time:
INSERT INTO [dbo].[CaseInfo]
SELECT [EligibilityID],[CaseNumber],[CaseMonth]
FROM (
    SELECT [EligibilityID],[CaseNumber],[CaseMonth]
        ,row_number() over (partition by EligibilityID order by CaseNumber) as rn -- you haven't mentioned an ORDER BY; change it here
    FROM [dbo].[tempCases]
) as table1
where rn <= case
    when EligibilityID = 1 then @amount1
    when EligibilityID = 2 then @amount2
    when EligibilityID = 3 then @amount3
end
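Since the question asks for random rows, a variation (a sketch, untested against your data) is to order the ROW_NUMBER by NEWID() so each EligibilityID partition yields a random sample; only the derived table changes:

SELECT [EligibilityID],[CaseNumber],[CaseMonth]
    ,row_number() over (partition by EligibilityID order by NEWID()) as rn -- random order per group
FROM [dbo].[tempCases]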
Say I have part of a large query, as below, that returns a resultset with multiple rows of the same key information (PolNum) with different value information (PolPremium) in a random order.
Would it be possible to select the first matching row for each PolNum and sum up the PolPremium? In this case I know that there are 2 PolNums used, so given the screenshot of the resultset (yes, I know it starts at row 14 for illustration purposes), return the first values and sum the result.
First match for PolNum 000035789547
(ROW 14) PolPremium - 32.00
First match for PolNum 000071709897
(ROW 16) PolPremium - 706043.00
Total summed should be 32.00 + 706043.00 = 706075.00
Query
OUTER APPLY
(
SELECT PolNum, PolPremium
FROM PN20
WHERE PolNum IN(SELECT PolNum FROM SvcPlanPolicyView
WHERE SvcPlanPolicyView.ControlNum IN (SELECT val AS ServedCoverages FROM ufn_SplitMax(
(SELECT TOP 1 ServicedCoverages FROM SV91 WHERE SV91.AccountKey = 3113413), ';')))
ORDER BY PN20.PolEffDate DESC
)
Resultset (screenshot)
Suppose that pic is the final result your query produces. Then you can do something like:
DECLARE @t TABLE
(
    PolNum VARCHAR(20) ,
    PolPremium MONEY
)

INSERT INTO @t
VALUES ( '000035789547', 32 ),
       ( '000035789547', 76 ),
       ( '000071709897', 706043.00 ),
       ( '000071709897', 1706043.00 )

SELECT t.PolNum ,
       SUM(PolPremium) AS PolPremium
FROM ( SELECT * ,
              ROW_NUMBER() OVER ( PARTITION BY PolNum ORDER BY PolPremium ) AS rn
       FROM @t
     ) t
WHERE rn = 1
GROUP BY GROUPING SETS(t.PolNum, ( ))
Output:
PolNum PolPremium
000035789547 32.00
000071709897 706043.00
NULL 706075.00
Just replace @t with your query. Also, I assume the row with the minimum premium is the first. You could probably filter to the top row in the OUTER APPLY part (sketched below), but it is not really clear to me what is going on there without some sample data.
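For what it's worth, a hypothetical sketch of that idea, assuming the outer query exposes some key (called outerAlias.SomeKey here) to correlate on:

OUTER APPLY
(
    SELECT TOP (1) p.PolNum, p.PolPremium
    FROM PN20 p
    WHERE p.PolNum = outerAlias.SomeKey
    ORDER BY p.PolEffDate DESC
) AS firstPremium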
Hi, I have a Split function that returns rows like below:

declare @a nvarchar(50)= '1,2,3,4,5,6'
select Item from dbo.Split(@a,',')
Result :
Item
--------
1
2
3
4
5
6
Now I want to create a table and insert into two fields from my split function, like below:
declare @a nvarchar(50)= '1,2,3,4,5,6'
declare @b nvarchar(50)= '10,20,30,40,50,60'

declare @tblCare table
(
    id int ,
    count int
)

insert into @tblCare (id,count)
values
(
    (select Item from dbo.Split(@a,',')),
    (select Item from dbo.Split(@b,','))
)

select * from @tblCare
and I get this
Error : Msg 512, Level 16, State 1, Line 10 Subquery returned more
than 1 value. This is not permitted when the subquery follows =, !=,
<, <= , >, >= or when the subquery is used as an expression. The
statement has been terminated.
id count
----------- -----------
(0 row(s) affected)
and this is my expected result:
id count
---------------
1 10
2 20
3 30
4 40
5 50
6 60
You can do it like this:
declare @t1 table (ID bigint identity(1, 1), Item nvarchar(max))
declare @t2 table (ID bigint identity(1, 1), Item nvarchar(max))

insert into @t1
select item from dbo.Split(@a,',')

insert into @t2
select item from dbo.Split(@b,',')

insert into @tblCare (id,count)
select T1.Item, T2.Item
from @t1 as T1
inner join @t2 as T2 on T1.ID = T2.ID
Here I first create table variables with an identity column to enumerate the rows of your split data, and then join the two results on those row numbers and insert them.
Your dbo.Split function should return a serial number on which we can join the two splits. I am using DelimitedSplit8K by Jeff Moden, which is one of the fastest splitters there is, but you can update your own split function to include a serial number using ROW_NUMBER().
declare @a nvarchar(50)= '1,2,3,4,5,6'
declare @b nvarchar(50)= '10,20,30,40,50,60'

insert into @tblCare (id,count)
SELECT a.item, b.item
FROM [DelimitedSplit8K](@a,',') a
INNER JOIN [DelimitedSplit8K](@b,',') b
    ON a.itemnumber = b.itemnumber
Output
1 10
2 20
3 30
4 40
5 50
6 60
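If you would rather keep your own dbo.Split, a sketch of numbering its output on the fly (note: without a position column returned by Split itself, ROW_NUMBER ordered by (select null) does not guarantee the original string order):

declare @a nvarchar(50) = '1,2,3,4,5,6'

select Item,
       row_number() over ( order by (select null) ) as ItemNumber
from dbo.Split(@a, ',')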
Don't use a subquery; use the insert syntax form:
insert into table ...
select ...
where the number and type of the select columns match the inserted columns.
I have assumed you want a count, across all calls to Split, of the items returned:
insert into @tblCare (id, count)
select item, count(*) from
    (select item from dbo.Split(@a,',')
     union all
     select item from dbo.Split(@b,',')) x
group by item
I have a SELECT that can return hundreds of rows from a table (the table can be ~50000 rows). My app is interested in knowing the number of rows returned - it means something important to me - but it actually uses only the top 5 of those hundreds of rows. What I want to do is limit the SELECT query to return only 5 rows, but also tell my app how many it would have returned (the hundreds). This is the original query:
SELECT id, a, b, c FROM table WHERE a < 2
Here is what I came up with - a CTE - but I don't feel comfortable with the total row count appearing in every row. Ideally I would want a result set of the TOP 5 and a returned parameter for the total row count.
WITH Everything AS
(
    SELECT id, a, b, c FROM table WHERE a < 2
),
DetermineCount AS
(
    SELECT COUNT(*) AS Total FROM Everything
)
SELECT TOP (5) id, a, b, c, Total
FROM Everything
CROSS JOIN DetermineCount;
Can you think of a better way?
Is there a way in T-SQL to return the affected row count of a select top query before the top was applied? @@ROWCOUNT would return 5, but I wonder if there is a @@ROWCOUNTBEFORETOP sort of thing.
Thanks in advance for your help.
** Update **
This is what I'm doing now, and I kind of like it over the CTE, although CTEs are so elegant.
-- @count is passed in as an out param to the stored procedure
CREATE TABLE #everything (id int, a int, b int, c int);

INSERT INTO #everything
SELECT id, a, b, c FROM table WHERE a < 2;
SET @count = @@ROWCOUNT;

SELECT TOP (5) id FROM #everything;
DROP TABLE #everything;
Here's a relatively efficient way to get 5 random rows and include the total count. The random element will introduce a full sort no matter where you put it.
SELECT TOP (5) id,a,b,c,total = COUNT(*) OVER()
FROM dbo.mytable
ORDER BY NEWID();
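If you don't actually need randomness, the same COUNT(*) OVER() trick works with your original filter and a deterministic order; a sketch using the question's table and columns:

SELECT TOP (5) id, a, b, c, total = COUNT(*) OVER()
FROM dbo.mytable
WHERE a < 2
ORDER BY id;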
Assuming you want the top 5 ordering by id ascending, this will do it with a single pass through your table.
; WITH Everything AS
(
SELECT id
, a
, b
, c
, ROW_NUMBER() OVER (ORDER BY id ASC) AS rn_asc
, ROW_NUMBER() OVER (ORDER BY id DESC) AS rn_desc
FROM <table>
)
SELECT id
, a
, b
, c
, rn_asc + rn_desc - 1 AS total_rows
FROM Everything
WHERE rn_asc <= 5
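(This works because for any row in the CTE, rn_asc + rn_desc equals the total row count plus one, so the total can be recovered from any of the surviving rows. If you need the WHERE a < 2 filter, it goes inside the CTE so the counts line up.)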
Technologies: SQL Server 2008
So I've tried a few options that I've found on SO, but nothing really provided me with a definitive answer.
I have a table with two columns, (Transaction ID, GroupID) where neither has unique values. For example:
TransID | GroupID
-----------------
23 | 4001
99 | 4001
63 | 4001
123 | 4001
77 | 2113
2645 | 2113
123 | 2113
99 | 2113
Originally, the GroupID was just chosen at random by the user, but now we're automating it. Thing is, we're keeping the existing DB without any changes to the existing data (too much work for too little gain).
Is there a way to query "GroupID" on table "GroupTransactions" for the next available value of GroupID > 2000?
I think from the question that you're after the next available value, although that may not be the same as max+1, right? In that case:
Start with a list of integers and look for those that aren't present in the GroupID column, for example:
;WITH CTE_Numbers AS (
    SELECT n = 2001
    UNION ALL
    SELECT n + 1 FROM CTE_Numbers WHERE n < 4000
)
SELECT TOP 1 n
FROM CTE_Numbers num
WHERE NOT EXISTS (SELECT 1 FROM MyTable tab WHERE num.n = tab.groupid)
ORDER BY n
OPTION (MAXRECURSION 2000) -- needed: this recursion runs deeper than the default limit of 100
Note: you need to tweak the 2001/4000 values in the CTE to allow for the range you want. I assumed the name of your table to be MyTable.
select max(groupid) + 1 from GroupTransactions
The following will find the next gap above 2000:
SELECT MIN(t.GroupID)+1 AS NextID
FROM GroupTransactions t WITH (UPDLOCK)
WHERE NOT EXISTS
    (SELECT NULL FROM GroupTransactions n WHERE n.GroupID=t.GroupID+1 AND n.GroupID>2000)
  AND t.GroupID>2000
There are always many ways to do everything. I resolved this problem by doing it like this:
declare @i int = null
declare @t table (i int)

insert into @t values (1)
insert into @t values (2)
--insert into @t values (3)
--insert into @t values (4)
insert into @t values (5)
--insert into @t values (6)

--get the first missing number
select @i = min(RowNumber)
from (
    select ROW_NUMBER() OVER(ORDER BY i) AS RowNumber, i
    from (
        --select distinct in case a number is in there multiple times
        select distinct i
        from @t
        --start after 0 in case there are negative or 0 numbers
        where i > 0
    ) as a
) as b
where RowNumber <> i

--if there are no missing numbers or no records, get the max record
if @i is null
begin
    select @i = isnull(max(i),0) + 1 from @t
end

select @i
In my situation I have a system that generates message numbers, or file/case/reservation numbers, sequentially from 1 every year. But in some situations a number does not get used (the user was testing/practicing, or whatever reason) and the number was deleted.
You can use a where clause to filter by year if all entries are in the same table, and make it dynamic (my example is hardcoded); if you archive your yearly data it is not needed. The sub-query parts for mID and mID2 must be identical.
The "union 0 as seq " for mID is there in case your table is empty; this is the base seed number. It can be anything ex: 3000000 or {prefix}0000. The field is an integer. If you omit " Union 0 as seq " it will not work on an empty table or when you have a table missing ID 1 it will given you the next ID ( if the first number is 4 the value returned will be 5).
This query is very quick - hint: the field must be indexed; it was tested on a table of 100,000+ rows. I found that using a domain aggregate gets slower as the table increases in size.
If you remove the "top 1" you will get a list of 'next numbers', but not all the missing numbers in a sequence; i.e. if you have 1 2 4 7, the result will be 3 5 8.
select top 1 @newID = mID.seq + 1
from
    (select a.[msg_number] as seq from [tblMSG] a --where a.[msg_date] between '2023-01-01' and '2023-12-31'
     union select 0 as seq) as mID
    left outer join
    (select b.[msg_number] as seq from [tblMSG] b --where b.[msg_date] between '2023-01-01' and '2023-12-31'
    ) as mID2 on mID.seq + 1 = mID2.seq
where mID2.seq is null
order by mID.seq
-- Next: a statement to insert a row with @newID immediately in tblMSG (in a transaction block).
-- Then the row can be updated by your app.
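A hypothetical sketch of that insert step (the column list here is assumed; add locking hints such as UPDLOCK/HOLDLOCK to the gap query if concurrent callers are possible):

declare @newID int;
begin transaction;
    -- run the gap query above here to assign @newID
    insert into [tblMSG] ([msg_number], [msg_date])
    values (@newID, getdate());
commit transaction;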