T-SQL performance issue with bulk insert of millions of rows - sql-server

I have created a query which is doing a bulk insert of millions of rows of data.
While running this query, I'm getting a tempdb memory error.
This is the query:
INSERT INTO ods.contact_method (cmeth_cust_id, cmeth_chan_type_id, cmeth_address_id,
cmeth_identifier, cmeth_active, cmeth_review_date,
cmeth_last_validated, cmeth_updatesrc_id, cmeth_updated_date)
SELECT
custpers_cust_id, 5, ad.adet_id,
COALESCE(street3, '') + ' ' + COALESCE(street2, '') + ' '
+ COALESCE(housenumber, '') + ' ' + COALESCE(street, ''),
CASE custpers_status
WHEN 'InActive' THEN 'N'
ELSE 'Y'
END,
Dateadd(year, 2, last_update_date),
last_update_date, 1, Getdate()
FROM
ods.address_detail (nolock) ad
JOIN
ods.customer_persona (nolock) cp ON cp.custpers_cust_id = ad.adet_updated_by
JOIN
ods.tempcust_address_insert (nolock) tp ON tp.bvoc = cp.custpers_bvoc_id
WHERE
NOT EXISTS (SELECT 1
FROM ods.contact_method (nolock) cm
WHERE cm.cmeth_cust_id = cp.custpers_cust_id
AND cm.cmeth_address_id IS NOT NULL
AND ad.adet_id = cm.cmeth_address_id)
I need help optimizing this query: for a bulk insert over millions of rows, should I use a LEFT JOIN or a NOT EXISTS condition?

You are getting a memory error in tempdb, which can be due to one of the two issues below:
1) Your query has a performance issue and is selecting unnecessary data. I cannot comment on this without knowing the table structures, indexes, fragmentation, and size of the data; however, changing the NOT EXISTS condition to a LEFT JOIN will surely help improve performance:
FROM ods.address_detail (nolock) ad
JOIN ods.customer_persona (nolock) cp
ON cp.custpers_cust_id = ad.adet_updated_by
JOIN ods.tempcust_address_insert (nolock) tp
ON tp.bvoc = cp.custpers_bvoc_id
left join contact_method cm (nolock)
on cm.cmeth_cust_id = cp.custpers_cust_id
AND ad.adet_id = cm.cmeth_address_id
AND cm.cmeth_address_id IS NOT NULL -- not sure if this condition is required
Where cm.cmeth_cust_id is null -- add all primary key columns of contact_method here
2) A tempdb memory error will also occur if you are selecting a huge amount of data compared to the tempdb size.
To solve this, you can use TOP while inserting the data and run the same query multiple times; the LEFT JOIN condition in your INSERT query will make sure that no duplicate data is inserted.
SELECT TOP 1000000 -- this will make sure you are selecting limited data
custpers_cust_id,
5,
ad.adet_id,
COALESCE(street3, '') + ' '
........
If this is not a one-time activity, you can write a WHILE loop that uses the @@ROWCOUNT value to insert the data in batches:
DECLARE @count int = 1
WHILE @count > 0
BEGIN
<your insert statement with SELECT TOP>
SET @count = @@ROWCOUNT
END
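Putting the two pieces together, a minimal sketch of the batched insert might look like this (the batch size of 1,000,000 is an assumption to tune; the NOLOCK hints from the question are omitted for brevity):
DECLARE @count int = 1
WHILE @count > 0
BEGIN
    -- Each pass inserts at most one batch; the LEFT JOIN / IS NULL filter
    -- skips rows already present in ods.contact_method, so re-running is safe.
    INSERT INTO ods.contact_method (cmeth_cust_id, cmeth_chan_type_id, cmeth_address_id,
        cmeth_identifier, cmeth_active, cmeth_review_date,
        cmeth_last_validated, cmeth_updatesrc_id, cmeth_updated_date)
    SELECT TOP (1000000)
        cp.custpers_cust_id, 5, ad.adet_id,
        COALESCE(street3, '') + ' ' + COALESCE(street2, '') + ' '
        + COALESCE(housenumber, '') + ' ' + COALESCE(street, ''),
        CASE cp.custpers_status WHEN 'InActive' THEN 'N' ELSE 'Y' END,
        DATEADD(year, 2, last_update_date),
        last_update_date, 1, GETDATE()
    FROM ods.address_detail ad
    JOIN ods.customer_persona cp ON cp.custpers_cust_id = ad.adet_updated_by
    JOIN ods.tempcust_address_insert tp ON tp.bvoc = cp.custpers_bvoc_id
    LEFT JOIN ods.contact_method cm ON cm.cmeth_cust_id = cp.custpers_cust_id
                                   AND cm.cmeth_address_id = ad.adet_id
    WHERE cm.cmeth_cust_id IS NULL

    SET @count = @@ROWCOUNT
END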

Related

Why is a bad SQL Server execution plan cached for a non-stored procedure

Many of the Q&As I've seen on this subject involve stored procedures. I'm using C# and ADO.NET to execute a SQL Server select query (not a stored procedure).
The query was using a lot of IO's and was slow, so I created a new index. Now, if I run the same query with OPTION(RECOMPILE), I get a quick response with a small number of IO's.
However, if I don't include the OPTION(RECOMPILE) on the select statement, I continue to get the slow, high-IO results. I've tried to clear the execution plan cache using DBCC FREEPROCCACHE to no avail. I've also run UPDATE STATISTICS.
It seems like a bad plan is being recreated every time the query is run. Should I just change the code to include the OPTION(RECOMPILE), or is there something else going on?
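For reference, the two cache-related steps described above boil down to the following (UPDATE STATISTICS is shown against the mr_entry table from the query below; WITH FULLSCAN is an optional assumption, not something the question specifies):
-- Clear every cached plan on the instance (affects all databases; use with care)
DBCC FREEPROCCACHE;

-- Rebuild optimizer statistics for one of the tables in the query
UPDATE STATISTICS [mr_entry] WITH FULLSCAN;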
Here is the query. I'm running it from SSMS.
SET STATISTICS IO ON;
DECLARE @request_date datetime
SELECT @request_date = '1/24/2017'
SELECT TOP 300
pr.[mr_entry_id],
mr.[source_code] AS accession_no,
mr.[patient_id],
pr.[rx_location],
CASE WHEN pr.[product_id] IS NULL THEN 'Compound' ELSE 'Product' END AS product_type,
mr.[description],
SUBSTRING(r.[fname],1,1) + ' ' + r.[lname] AS requested_by,
rs.[short_name] AS status, mr.[category],
CASE WHEN mr.[date_recorded] >= '12/16/2009' THEN pr.[date_requested] ELSE NULL END AS date_recorded,
CASE WHEN mr.[date_recorded] >= '12/16/2009' THEN v.[lname] + ', ' + v.[fname] ELSE '' END AS requester,
mr.[request_finalized_date],
SUBSTRING(u.[fname],1,1) + ' ' + u.[lname] AS clinician,
pr.[is_onfile],
pr.[is_outside_prescription]
FROM [rx_request] pr
INNER JOIN [mr_entry] mr ON pr.[mr_entry_id] = mr.[id]
INNER JOIN [request_status] rs ON (rs.rc_id = 25000 and pr.[status_id] = rs.[status_id])
LEFT OUTER JOIN [vmis_user] v ON pr.[requester] = v.[id]
LEFT OUTER JOIN [client] rc ON mr.[request_rc_id] = rc.[id]
LEFT OUTER JOIN [vmis_user] u ON mr.[request_clinician_id] = u.[id]
LEFT OUTER JOIN [vmis_user] r ON mr.[request_finalized_id] = r.[id]
WHERE mr.[class] = 'Medication'
AND mr.[is_request] = 1
AND mr.[finalized_date] BETWEEN @request_date AND DATEADD(d,1,@request_date)
AND rs.[rc_id] = 25000
AND mr.[is_finalized] = 1
OPTION(RECOMPILE)

Issue with database locking

We are having a huge problem with our database. We are using SQL Server 2014 with 2008 compatibility level.
Every morning we get database locking issues and the database uses 100% CPU.
What we do to fix this every morning is modify the stored procedure below, switching between adding and removing NOLOCK on its statements, and run the procedure again, and the database is happy. We are not sure why we sometimes need to add NOLOCK and sometimes need to remove it.
ALTER Procedure [dbo].[ScanBox]
(
@KollieID varchar(50) = '',
@SupplierID int = 0,
@BuyerID int = 0
)
As
Set NoCount On
IF @kollieid <> '' AND @supplierid > 0
BEGIN
SELECT TRPO_KollieID.ID, OrderID, ISNULL(BoxNo, - 1) AS BoxNo, ISNULL(KollieID, '') AS KollieID, KollieNumber,
ISNULL(LastStatus, - 1) AS LastStatus, LastStatusTime, ISNULL(LastStatusPDA, - 1) AS LastStatusPDA,
ISNULL(LastStatusTDLogin, '') AS LastStatusTDLogin, ISNULL(OrginalKollieID,'') AS OrginalKollieID,
ISNULL(OS.StatusName, '') AS LastStatusText , ISNULL(OrderStatusExternalText,'') AS OrderStatusExternalText ,
ISNULL(Terminal, '') AS Terminal, ISNULL(TerminalZone,'') AS TerminalZone
FROM TRPO_KollieID
WITH (NOLOCK)
LEFT OUTER JOIN OrderStatus os WITH (NOLOCK) on OS.StatusID = TRPO_KollieID.LastStatus
WHERE TRPO_KollieID.KollieID = @KollieID AND (OrderID IN (SELECT TRPO_Header.ID
FROM TRPO_Header WITH (NOLOCK)
INNER JOIN TRPO_KollieID WITH (NOLOCK) ON TRPO_KollieID.OrderID = TRPO_Header.ID
WHERE TRPO_Header.SupplierID = @supplierid AND TRPO_Header.Status <> 'A' AND TRPO_KollieID.KollieID = @kollieid))
END
ELSE IF @kollieid <> '' AND @BuyerID > 0
BEGIN
SELECT TRPO_KollieID.ID, OrderID, ISNULL(BoxNo, - 1) AS BoxNo, ISNULL(KollieID, '') AS KollieID, KollieNumber,
ISNULL(LastStatus, - 1) AS LastStatus, LastStatusTime, ISNULL(LastStatusPDA, - 1) AS LastStatusPDA,
ISNULL(LastStatusTDLogin, '') AS LastStatusTDLogin, ISNULL(OrginalKollieID,'') AS OrginalKollieID,
ISNULL(OS.StatusName, '') AS LastStatusText , ISNULL(OrderStatusExternalText,'') AS OrderStatusExternalText ,
ISNULL(Terminal, '') AS Terminal, ISNULL(TerminalZone,'') AS TerminalZone
FROM TRPO_KollieID
WITH (NOLOCK)
LEFT OUTER JOIN OrderStatus os WITH (NOLOCK) on OS.StatusID = TRPO_KollieID.LastStatus
WHERE TRPO_KollieID.KollieID = @KollieID AND OrderID IN (SELECT TRPO_Header.ID
FROM TRPO_Header WITH (NOLOCK)
INNER JOIN TRPO_KollieID WITH (NOLOCK) ON TRPO_KollieID.OrderID = TRPO_Header.ID
WHERE TRPO_Header.BuyerID = @BuyerID AND TRPO_Header.Status <> 'A' AND TRPO_KollieID.KollieID = @kollieid)
END
What we have tried:
We tried indexing the database.
We tried setting the isolation level to READ COMMITTED SNAPSHOT.
Nothing is helping us. Does anyone have a good idea how to fix this?
Recompilation helps you here, not NOLOCK. Try adding OPTION (RECOMPILE) or OPTION (OPTIMIZE FOR ...). Or capture the query plan.
If you are using the READ COMMITTED SNAPSHOT isolation level, readers do not block writers and writers do not block readers. In that case the NOLOCK hints are useless and I suggest you remove them.
I think the base problem is something other than locking; maybe this stored procedure executes slowly and uses a lot of server resources.
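As a concrete sketch of both suggestions (the database name is a placeholder; check first whether READ_COMMITTED_SNAPSHOT is actually on before removing the hints):
-- Is READ COMMITTED SNAPSHOT enabled for the current database?
SELECT name, is_read_committed_snapshot_on
FROM sys.databases
WHERE name = DB_NAME();

-- Enable it if needed (switching requires exclusive access to the database)
-- ALTER DATABASE [YourDatabase] SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;

-- Alternatively, append OPTION (RECOMPILE) to each SELECT inside dbo.ScanBox
-- so a plan is compiled for the actual parameter values on every run.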

It takes a long time to insert into a temp table

I have the following query:
if object_id('tempdb..#tAJ88') is not null
drop table #tAJ88
create table #tAJ88 (
conv_raw_AJ88_ECO_key int,
case_id numeric(14,0),
account_key int,
account_period_key int,
aj_number varchar(25),
county_code varchar(25)
)
insert into #tAJ88(conv_raw_AJ88_ECO_key,account_key,account_period_key,aj_number,county_code)
select ac.conv_raw_AJ88_ECO_key,a.account_key, ap.account_period_key, ac.aj_number, ac.county_code
from [Conv].[dbo].[conv_raw_AJ88_ECO] ac
inner join [IT].[dbo].[entity_identifier] ei on ei.identifier_value = ac.account_number
and ei.identifier_type_key = @MITS
inner join [IT].[dbo].[account_x_entity_id] axe on axe.entity_identifier_key = ei.entity_identifier_key
inner join [IT].[dbo].[account] a on a.account_key = axe.account_key
and a.account_type_key = (select account_type_key from [IT].[dbo].[r_account_type] where code = ac.tax_type)
inner join [IT].[dbo].[account_period] ap on ap.account_key = a.account_key
and cnsd.NEXT_STEP_NAME not in ('A','B')
where (convert(datetime, substring(ac.periods,4,4) + '-' + substring(ac.periods,1,2) + '-01' ) >= ap.period_begin_dt and convert(datetime, substring(ac.periods,4,4) + '-' + substring(ac.periods,1,2) + '-01' ) <= ap.period_end_dt)
and len(rtrim(substring(ac.periods,4,4))) = 4
The query inserts the data from a SELECT statement. The SELECT statement itself takes only 1 second to run and returns only 1,500 records. However, when I try to insert into the temp table, it takes more than 10 minutes. I have never seen this issue before. Is this an infrastructure issue where we don't have enough disk space, or does it have to do with indexing, which I would expect not to matter here?
Is it possible you are having contention in tempdb? You can read about it here from Paul Randal: https://www.sqlskills.com/blogs/paul/the-accidental-dba-day-27-of-30-troubleshooting-tempdb-contention/
Have you tried doing this same insert into a real (permanent) table instead? That would give you a clue as to whether it is tempdb or not.
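A minimal way to run that test, assuming you can create a scratch table in a user database (the table name here is made up for illustration):
-- Same shape as #tAJ88, but in a user database instead of tempdb
CREATE TABLE dbo.tAJ88_scratch (
    conv_raw_AJ88_ECO_key int,
    case_id numeric(14,0),
    account_key int,
    account_period_key int,
    aj_number varchar(25),
    county_code varchar(25)
);
-- Re-run the same INSERT ... SELECT against dbo.tAJ88_scratch; if it is fast,
-- the bottleneck is tempdb (contention or autogrowth), not the query itself.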

SQL Server query taking too long

I am executing the query below. It takes 80 seconds to return just 17 records.
Can anybody tell me the reason, if you know it? I have already tried adding indexes.
SELECT DISTINCT t.i_UserID,
u.vch_LoginName,
t.vch_PreviousEmailAddress AS 'vch_EmailAddress',
u.vch_DisplayName,
t.d_TransactionDate AS 'd_DateAdded',
'Old' AS 'vch_RecordStatus'
FROM tblEmailTransaction t
INNER JOIN tblUser u
ON t.i_UserID = u.i_UserID
WHERE t.vch_PreviousEmailAddress LIKE '%kala%'
Change the collation of the vch_PreviousEmailAddress column to Latin1_General_100_BIN2.
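A sketch of that change (the varchar(255) length and NULLability are assumptions; match your actual column definition, and note that any index containing this column must be dropped and recreated around the change):
ALTER TABLE dbo.tblEmailTransaction
ALTER COLUMN vch_PreviousEmailAddress varchar(255) COLLATE Latin1_General_100_BIN2 NULL;
The point of a binary collation is that LIKE comparisons become simple byte comparisons, which makes the unavoidable scan for a leading-wildcard pattern considerably cheaper per row.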
Create covered index:
CREATE NONCLUSTERED INDEX ix
ON dbo.tblEmailTransaction (vch_PreviousEmailAddress)
INCLUDE (i_UserID, d_TransactionDate)
GO
And have fun with this query:
SELECT t.i_UserID,
u.vch_LoginName,
t.vch_PreviousEmailAddress AS vch_EmailAddress,
u.vch_DisplayName,
t.d_TransactionDate AS d_DateAdded,
'Old' AS vch_RecordStatus
FROM (
SELECT DISTINCT i_UserID,
vch_PreviousEmailAddress,
d_TransactionDate
FROM dbo.tblEmailTransaction
WHERE vch_PreviousEmailAddress LIKE '%kala%' COLLATE Latin1_General_100_BIN2
) t
JOIN dbo.tblUser u ON t.i_UserID = u.i_UserID
One other thing, which I find useful in solving problems like this:
Try running the following script. It will tell you which indexes you could add to your SQL Server database that would make the biggest (positive) improvement.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
SELECT TOP 100
ROUND(s.avg_total_user_cost * s.avg_user_impact * (s.user_seeks + s.user_scans),0) AS 'Total Cost',
s.avg_user_impact,
d.statement AS 'Table name',
d.equality_columns,
d.inequality_columns,
d.included_columns,
'CREATE INDEX [IndexName] ON ' + d.statement + ' ( '
+ case when (d.equality_columns IS NULL OR d.inequality_columns IS NULL)
then ISNULL(d.equality_columns, '') + ISNULL(d.inequality_columns, '')
else ISNULL(d.equality_columns, '') + ', ' + ISNULL(d.inequality_columns, '')
end + ' ) '
+ CASE WHEN d.included_columns IS NULL THEN '' ELSE 'INCLUDE ( ' + d.included_columns + ' )' end AS 'CREATE INDEX command'
FROM sys.dm_db_missing_index_groups g,
sys.dm_db_missing_index_group_stats s,
sys.dm_db_missing_index_details d
WHERE d.database_id = DB_ID()
AND s.group_handle = g.index_group_handle
AND d.index_handle = g.index_handle
ORDER BY [Total Cost] DESC
The right-hand column displays the CREATE INDEX command you'd need to run to create that index.
This is one of those lifesaver scripts that I run on our in-house databases every so often.
But yes, in your example, this is just likely to tell you that you need an index on the vch_PreviousEmailAddress field in your tblEmailTransaction table.
The probable bottlenecks are:
Missing index on tblEmailTransaction.i_UserID: check whether the table has the index.
Missing index on tblUser.i_UserID: check whether the table has the index.
LIKE statement: a LIKE with a leading wildcard is known to perform badly. As Devart suggested, try specifying the collation this way:
WHERE vch_PreviousEmailAddress LIKE '%kala%' COLLATE Latin1_General_100_BIN2
To get a better view of what your query is doing, run this command together with your query:
SET STATISTICS IO ON
It will report all the IO access the query performs, and then we can see what is happening.
Just a final question: how many rows do the two tables contain?
Ciao

Forcing SQL Server to pre-cache entire database into memory

We have a client site with a 50 GB SQL Server 2012 database on a server with 100+ GB of RAM.
As the application is used, SQL Server does a great job of caching the DB into memory, but the performance increase from the caching occurs the SECOND time a query is run, not the first.
To try to maximize cache hits the first time queries are run, we wrote a proc that iterates through every index of every table within the entire DB, running this:
SELECT * INTO #Cache
FROM ' + @tablename + ' WITH (INDEX (' + @indexname + '))'
In an attempt to force a big, ugly, contrived read for as much data as possible.
We have it scheduled to run every 15 minutes, and it does a great job in general.
Without debating other bottlenecks, hardware specs, query plans, or query optimization, does anybody have any better ideas about how to accomplish this same task?
UPDATE
Thanks for the suggestions. Removed the "INTO #Cache". Tested it, and it didn't make a difference in filling the buffer.
Added: Instead of Select *, I'm selecting ONLY the keys from the Index. This (obviously) is more to-the-point and is much faster.
Added: Read & Cache Constraint Indexes also.
Here's the current code: (hope it's useful for somebody else)
CREATE VIEW _IndexView
as
-- Easy way to access sysobject and sysindex data
SELECT
so.name as tablename,
si.name as indexname,
CASE si.indid WHEN 1 THEN 1 ELSE 0 END as isClustered,
CASE WHEN (si.status & 2)<>0 then 1 else 0 end as isUnique,
dbo._GetIndexKeys(so.name, si.indid) as Keys,
CONVERT(bit,CASE WHEN EXISTS (SELECT * FROM sysconstraints sc WHERE object_name(sc.constid) = si.name) THEN 1 ELSE 0 END) as IsConstraintIndex
FROM sysobjects so
INNER JOIN sysindexes si ON so.id = si.id
WHERE (so.xtype = 'U')--User Table
AND ((si.status & 64) = 0) --Not statistics index
AND ( (si.indid = 0) AND (so.name <> si.name) --not a default clustered index
OR
(si.indid > 0)
)
AND si.indid <> 255 --is not a system index placeholder
UNION
SELECT
so.name as tablename,
si.name as indexname,
CASE si.indid WHEN 1 THEN 1 ELSE 0 END as isClustered,
CASE WHEN (si.status & 2)<>0 then 1 else 0 end as isUnique,
dbo._GetIndexKeys(so.name, si.indid) as Keys,
CONVERT(bit,0) as IsConstraintIndex
FROM sysobjects so
INNER JOIN sysindexes si ON so.id = si.id
WHERE (so.xtype = 'V')--View
AND ((si.status & 64) = 0) --Not statistics index
GO
CREATE PROCEDURE _CacheTableToSQLMemory
@tablename varchar(100)
AS
BEGIN
DECLARE @indexname varchar(100)
DECLARE @xtype varchar(10)
DECLARE @SQL varchar(MAX)
DECLARE @keys varchar(1000)
DECLARE @cur CURSOR
SET @cur = CURSOR FOR
SELECT v.IndexName, so.xtype, v.keys
FROM _IndexView v
INNER JOIN sysobjects so ON so.name = v.tablename
WHERE tablename = @tablename
PRINT 'Caching Table ' + @tablename
OPEN @cur
FETCH NEXT FROM @cur INTO @indexname, @xtype, @keys
WHILE (@@FETCH_STATUS = 0)
BEGIN
PRINT ' Index ' + @indexname
--BEGIN TRAN
IF @xtype = 'V'
SET @SQL = 'SELECT ' + @keys + ' FROM ' + @tablename + ' WITH (noexpand, INDEX (' + @indexname + '))'
ELSE
SET @SQL = 'SELECT ' + @keys + ' FROM ' + @tablename + ' WITH (INDEX (' + @indexname + '))'
EXEC(@SQL)
--ROLLBACK TRAN
FETCH NEXT FROM @cur INTO @indexname, @xtype, @keys
END
CLOSE @cur
DEALLOCATE @cur
END
GO
First of all, there is a setting called "Minimum Server Memory" that looks tempting. Ignore it. From MSDN:
The amount of memory acquired by the Database Engine is entirely dependent on the workload placed on the instance. A SQL Server instance that is not processing many requests may never reach min server memory.
This tells us that setting a larger minimum memory won't force or encourage any pre-caching. You may have other reasons to set this, but pre-filling the buffer pool isn't one of them.
So what can you do to pre-load data? It's easy. Just set up an agent job to do a SELECT * from every table. You can schedule it to "Start automatically when SQL Agent starts". In other words, what you're already doing is pretty close to the standard way to handle this.
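A minimal sketch of such a job step, assuming the goal is simply to read every user table once (the result sets themselves don't matter; the reads are what pull pages into the buffer pool):
DECLARE @sql nvarchar(MAX) = N'';

-- Build one SELECT per user table in the current database
SELECT @sql += N'SELECT * FROM ' + QUOTENAME(s.name) + N'.' + QUOTENAME(t.name) + N';'
FROM sys.tables t
JOIN sys.schemas s ON s.schema_id = t.schema_id;

EXEC sp_executesql @sql;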
However, I do need to suggest three changes:
Don't try to use a temporary table. Just select from the table. You don't need to do anything with the results to get SQL Server to load your buffer pool: all you need to do is the select. A temporary table could force SQL Server to copy the data from the buffer pool after loading... you'd end up (briefly) storing things twice.
Don't run this every 15 minutes. Just run it once at startup, and then leave it alone. Once allocated, it takes a lot to get SQL Server to release memory. It's just not needed to re-run this over and over.
Don't try to hint an index. Hints are just that: hints. SQL Server is free to ignore them, and it will do so for queries that have no clear use for the index. The best way to make sure the index is pre-loaded is to construct a query that obviously uses that index. One specific suggestion here is to order the results in the same order as the index. This will often help SQL Server use that index, because then it can "walk the index" to produce the results.
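For example, to pre-load a hypothetical nonclustered index IX_Orders_CustomerID on dbo.Orders (both names made up for illustration), a warm-up query that obviously uses it might be:
-- Selecting only the index key, in index order, encourages SQL Server to
-- walk IX_Orders_CustomerID and thereby pull its pages into the buffer pool.
SELECT CustomerID
FROM dbo.Orders
ORDER BY CustomerID;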
This is not an answer, but to supplement Joel Coehoorn's answer, you can look at the table data in the cache using this statement. Use this to determine whether all the pages are staying in the cache as you'd expect:
USE DBMaint
GO
SELECT COUNT(1) AS cached_pages_count, SUM(s.used_page_count)/COUNT(1) AS total_page_count,
name AS BaseTableName, IndexName,
IndexTypeDesc
FROM sys.dm_os_buffer_descriptors AS bd
INNER JOIN
(
SELECT s_obj.name, s_obj.index_id,
s_obj.allocation_unit_id, s_obj.OBJECT_ID,
i.name IndexName, i.type_desc IndexTypeDesc
FROM
(
SELECT OBJECT_NAME(OBJECT_ID) AS name,
index_id ,allocation_unit_id, OBJECT_ID
FROM sys.allocation_units AS au
INNER JOIN sys.partitions AS p
ON au.container_id = p.hobt_id
AND (au.type = 1 OR au.type = 3)
UNION ALL
SELECT OBJECT_NAME(OBJECT_ID) AS name,
index_id, allocation_unit_id, OBJECT_ID
FROM sys.allocation_units AS au
INNER JOIN sys.partitions AS p
ON au.container_id = p.partition_id
AND au.type = 2
) AS s_obj
LEFT JOIN sys.indexes i ON i.index_id = s_obj.index_id
AND i.OBJECT_ID = s_obj.OBJECT_ID ) AS obj
ON bd.allocation_unit_id = obj.allocation_unit_id
INNER JOIN sys.dm_db_partition_stats s ON s.index_id = obj.index_id AND s.object_id = obj.object_ID
WHERE database_id = DB_ID()
GROUP BY name, obj.index_id, IndexName, IndexTypeDesc
ORDER BY obj.name;
GO
Use this to replace the function dbo._GetIndexKeys (note that STRING_AGG requires SQL Server 2017 or later):
(SELECT STRING_AGG(COL_NAME(ic.object_id,ic.column_id), ',') FROM sys.index_columns ic WHERE ic.object_id = so.id AND ic.index_id = si.indid) AS Keys,
--dbo._GetIndexKeys(so.name, si.indid) as Keys,
