It takes a long time to insert into a temp table - sql-server

I have the following query:
if object_id('tempdb..#tAJ88') is not null
drop table #tAJ88
create table #tAJ88 (
conv_raw_AJ88_ECO_key int,
case_id numeric(14,0),
account_key int,
account_period_key int,
aj_number varchar(25),
county_code varchar(25)
)
insert into #tAJ88(conv_raw_AJ88_ECO_key,account_key,account_period_key,aj_number,county_code)
select ac.conv_raw_AJ88_ECO_key,a.account_key, ap.account_period_key, ac.aj_number, ac.county_code
from [Conv].[dbo].[conv_raw_AJ88_ECO] ac
inner join [IT].[dbo].[entity_identifier] ei on ei.identifier_value = ac.account_number
and ei.identifier_type_key = @MITS
inner join [IT].[dbo].[account_x_entity_id] axe on axe.entity_identifier_key = ei.entity_identifier_key
inner join [IT].[dbo].[account] a on a.account_key = axe.account_key
and a.account_type_key = (select account_type_key from [IT].[dbo].[r_account_type] where code = ac.tax_type)
inner join [IT].[dbo].[account_period] ap on ap.account_key = a.account_key
and cnsd.NEXT_STEP_NAME not in ('A','B')
where (convert(datetime, substring(ac.periods,4,4) + '-' + substring(ac.periods,1,2) + '-01' ) >= ap.period_begin_dt and convert(datetime, substring(ac.periods,4,4) + '-' + substring(ac.periods,1,2) + '-01' ) <= ap.period_end_dt)
and len(rtrim(substring(ac.periods,4,4))) = 4
The query inserts the data returned by a SELECT statement. The SELECT statement on its own takes only 1 second to run and returns only 1,500 records. However, when I insert those rows into the temp table, it takes more than 10 minutes. I have never seen this issue before. Is this a technical issue, such as not having enough disk space, or does it have to do with indexing (which I would not expect to matter)?

Is it possible you are having contention in tempdb? You can read about it here from Paul Randal: https://www.sqlskills.com/blogs/paul/the-accidental-dba-day-27-of-30-troubleshooting-tempdb-contention/
Have you tried running the same insert into a real (permanent) table instead? That would give you a clue as to whether tempdb is the problem.
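A minimal sketch of how to look for tempdb allocation contention while the slow INSERT is running. It assumes you can open a second session with VIEW SERVER STATE permission; it lists PAGELATCH waits on tempdb pages (database_id 2) from sys.dm_os_waiting_tasks. The table name #tempdb_latch_waits is only for illustration.

```sql
-- Run this in a second session while the slow INSERT is executing.
IF OBJECT_ID('tempdb..#tempdb_latch_waits') IS NOT NULL DROP TABLE #tempdb_latch_waits;

SELECT wt.session_id,
       wt.wait_type,
       wt.wait_duration_ms,
       wt.resource_description  -- dbid:fileid:pageid; database_id 2 is tempdb
INTO #tempdb_latch_waits
FROM sys.dm_os_waiting_tasks AS wt
WHERE wt.wait_type LIKE 'PAGELATCH[_]%'   -- in-memory page latches, not PAGEIOLATCH
  AND wt.resource_description LIKE '2:%';

SELECT * FROM #tempdb_latch_waits ORDER BY wait_duration_ms DESC;
```

Waits piling up on pages 1, 2 or 3 of a tempdb data file (PFS, GAM and SGAM, e.g. 2:1:1) point to allocation contention; an empty result while the insert is still slow suggests looking elsewhere.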

Related

Update row with values from select on condition, else insert new row

I need to run a monthly calculation every day. If the row for the month period already exists, I need to update it; otherwise I need to create a new row for the new month.
Currently, I've written:
declare @period varchar(4) = '0218'
DECLARE @Timestamp date = GetDate()
IF EXISTS(select * from #output where period=@period)
/* UPDATE #output SET --- same calculation as below ---*/
ELSE
SELECT
@period AS period,
SUM(timecard.tworkdol) AS dol_local,
SUM(timecard.tworkdol/currates.cdrate) AS dol_USD,
SUM(timecard.tworkhrs) AS hrs,
@Timestamp AS timestamp
FROM dbo.timecard AS timecard
INNER JOIN dbo.timekeep ON timecard.ttk = timekeep.tkinit
INNER JOIN dbo.matter with (nolock) on timecard.tmatter = matter.mmatter
LEFT JOIN dbo.currates with (nolock) on matter.mcurrency = currates.curcode
AND currates.trtype = 'A'
AND timecard.tworkdt BETWEEN currates.cddate1
AND currates.cddate2
WHERE timekeep.tkloc IN('06','07') AND
timecard.twoper = @period
SELECT * FROM #output;
How can I simply update my row with the new data from my SELECT?
I'm not sure which RDBMS you are using, but in SQL Server something like this would update the #output table with the results of the SELECT that you placed in the ELSE part:
UPDATE o
SET o.dol_local = agg.dol_local,
    o.dol_USD = agg.dol_USD,
    o.hrs = agg.hrs,
    o.[timestamp] = @Timestamp
FROM #output o
-- Aggregates can't appear directly in an UPDATE's SET list,
-- so compute them per period in a derived table and join to it
INNER JOIN (
    SELECT timecard.twoper AS period,
        SUM(timecard.tworkdol) AS dol_local,
        SUM(timecard.tworkdol/currates.cdrate) AS dol_USD,
        SUM(timecard.tworkhrs) AS hrs
    FROM dbo.timecard AS timecard
    INNER JOIN dbo.timekeep ON timecard.ttk = timekeep.tkinit
    INNER JOIN dbo.matter with (nolock) on timecard.tmatter = matter.mmatter
    LEFT JOIN dbo.currates with (nolock) on matter.mcurrency = currates.curcode
        AND currates.trtype = 'A'
        AND timecard.tworkdt BETWEEN currates.cddate1 AND currates.cddate2
    WHERE timekeep.tkloc IN('06','07') AND
        timecard.twoper = @period
    GROUP BY timecard.twoper
) AS agg ON o.period = agg.period
Also, I think you want to do an INSERT in the ELSE part, but you are only doing a SELECT, so you should fix that too.
The answer to this will vary by SQL dialect, but the two main approaches are:
1. Upsert (if your DBMS supports it), for example using a MERGE statement in SQL Server.
2. Base your SQL on an IF:
IF NOT EXISTS (criteria for dupes)
INSERT INTO (logic to insert)
ELSE
UPDATE (logic to update)
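For approach 1, a minimal MERGE sketch in SQL Server. The #output columns and the literal source rows are hypothetical stand-ins; in the question, the USING source would be the SUM(...) query grouped by period.

```sql
-- Hypothetical target table with one existing period
IF OBJECT_ID('tempdb..#output') IS NOT NULL DROP TABLE #output;
CREATE TABLE #output (period varchar(4) PRIMARY KEY, dol_local money, hrs decimal(10,2), [timestamp] date);
INSERT INTO #output (period, dol_local, hrs, [timestamp]) VALUES ('0118', 100, 10, '2018-01-31');

DECLARE @Timestamp date = GETDATE();

-- Update the period if it exists, insert it otherwise, in one statement
MERGE #output AS tgt
USING (VALUES ('0118', 150.00, 12.5),
              ('0218', 80.00, 6.0)) AS src (period, dol_local, hrs)
    ON tgt.period = src.period
WHEN MATCHED THEN
    UPDATE SET tgt.dol_local = src.dol_local,
               tgt.hrs = src.hrs,
               tgt.[timestamp] = @Timestamp
WHEN NOT MATCHED BY TARGET THEN
    INSERT (period, dol_local, hrs, [timestamp])
    VALUES (src.period, src.dol_local, src.hrs, @Timestamp);  -- MERGE must end with a semicolon

SELECT * FROM #output;
```

Here '0118' is updated and '0218' is inserted. MERGE needs the terminating semicolon, and under concurrent writers it is usually paired with a HOLDLOCK hint on the target.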

TSQL/SQL Server - table function to parse/split delimited string to multiple/separate columns

So, my first post is less a question and more a statement! Sorry.
I needed to convert delimited strings stored in VarChar table columns into multiple/separate columns for the same record. (It's COTS software, so please don't bother telling me the table is designed wrong.) After searching the internet ad nauseam for how to create a generic single-line call to do that, and finding lots of examples of how not to do it, I created my own. (The name is not very creative.)
Returns: A table with sequentially numbered/named columns starting with [Col1]. If an input value is not provided, an empty string is returned. If fewer than 32 values are provided, all columns past the last value are returned as NULL. If more than 32 values are provided, the extras are ignored.
Prerequisites: A Number/Tally Table (luckily, our database already contained 'dbo.numbers').
Assumptions: Not more than 32 delimited values. (If you need more, change "WHERE tNumbers.Number BETWEEN 1 AND XXX", and add more prenamed columns ",[Col33]...,[ColXXX]".)
Issues: The very first column always gets populated, even if @InputString is NULL.
--======================================================================
--SMOZISEK 2017/09 CREATED
--======================================================================
CREATE FUNCTION dbo.fStringToPivotTable
(@InputString VARCHAR(8000)
,@Delimiter VARCHAR(30) = ','
)
RETURNS TABLE AS RETURN
WITH cteElements AS (
SELECT ElementNumber = ROW_NUMBER() OVER(PARTITION BY @InputString ORDER BY (SELECT 0))
,ElementValue = NodeList.NodeElement.value('.','VARCHAR(1022)')
FROM (SELECT TRY_CONVERT(XML,CONCAT('<X>',REPLACE(@InputString,@Delimiter,'</X><X>'),'</X>')) AS InputXML) AS InputTable
CROSS APPLY InputTable.InputXML.nodes('/X') AS NodeList(NodeElement)
)
SELECT PivotTable.*
FROM (
SELECT ColumnName = CONCAT('Col',tNumbers.Number)
,ColumnValue = tElements.ElementValue
FROM DBO.NUMBERS AS tNumbers --DEPENDENT ON ANY EXISTING NUMBER/TALLY TABLE!!!
LEFT JOIN cteElements AS tElements
ON tNumbers.Number = tElements.ElementNumber
WHERE tNumbers.Number BETWEEN 1 AND 32
) AS XmlSource
PIVOT (
MAX(ColumnValue)
FOR ColumnName
IN ([Col1] ,[Col2] ,[Col3] ,[Col4] ,[Col5] ,[Col6] ,[Col7] ,[Col8]
,[Col9] ,[Col10],[Col11],[Col12],[Col13],[Col14],[Col15],[Col16]
,[Col17],[Col18],[Col19],[Col20],[Col21],[Col22],[Col23],[Col24]
,[Col25],[Col26],[Col27],[Col28],[Col29],[Col30],[Col31],[Col32]
)
) AS PivotTable
;
GO
Test:
SELECT *
FROM dbo.fStringToPivotTable ('|Height|Weight||Length|Width||Color|Shade||Up|Down||Top|Bottom||Red|Blue|','|') ;
Usage:
SELECT 1 AS ID,'Title^FirstName^MiddleName^LastName^Suffix' AS Name
INTO #TempTable
UNION SELECT 2,'Mr.^Scott^A.^Mozisek^Sr.'
UNION SELECT 3,'Ms.^Jane^Q.^Doe^'
UNION SELECT 5,NULL
UNION SELECT 7,'^Betsy^^Ross^'
;
SELECT SourceTable.*
,ChildTable.Col1 AS ColTitle
,ChildTable.Col2 AS ColFirst
,ChildTable.Col3 AS ColMiddle
,ChildTable.Col4 AS ColLast
,ChildTable.Col5 AS ColSuffix
FROM #TempTable AS SourceTable
OUTER APPLY dbo.fStringToPivotTable(SourceTable.Name,'^') AS ChildTable
;
No, I have not tested any plan (I just needed it to work).
Oh, yeah: SQL Server 2012 (12.0 SP2)
Comments? Corrections? Enhancements?
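The function depends on an existing tally table. For a database that lacks one, here is a minimal sketch that creates a table matching the name and column the function references (dbo.numbers, Number); the size of 1,000 rows is an arbitrary assumption, and only 1 to 32 is needed here.

```sql
-- Hypothetical tally table matching the function's dependency (dbo.numbers.Number)
IF OBJECT_ID('dbo.numbers', 'U') IS NULL
BEGIN
    CREATE TABLE dbo.numbers (Number int NOT NULL PRIMARY KEY);

    -- 10 x 10 x 10 digit combinations give exactly the integers 1 through 1000
    WITH digits AS (
        SELECT n FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS v(n)
    )
    INSERT INTO dbo.numbers (Number)
    SELECT a.n * 100 + b.n * 10 + c.n + 1
    FROM digits AS a CROSS JOIN digits AS b CROSS JOIN digits AS c;
END;
```

Building the numbers from a small VALUES list keeps the result deterministic, unlike TOP without ORDER BY over a large system view.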
Here is my TVF. It is easy to expand to 32 columns (the pattern is pretty clear).
This is straight XML parsing without the cost of the PIVOT.
Example (notice the OUTER APPLY; use CROSS APPLY to exclude NULLs):
Select A.ID
,B.*
From #TempTable A
Outer Apply [dbo].[tvf-Str-Parse-Row](A.Name,'^') B
The UDF, if interested:
CREATE FUNCTION [dbo].[tvf-Str-Parse-Row] (@String varchar(max),@Delimiter varchar(10))
Returns Table
As
Return (
Select Pos1 = ltrim(rtrim(xDim.value('/x[1]','varchar(max)')))
,Pos2 = ltrim(rtrim(xDim.value('/x[2]','varchar(max)')))
,Pos3 = ltrim(rtrim(xDim.value('/x[3]','varchar(max)')))
,Pos4 = ltrim(rtrim(xDim.value('/x[4]','varchar(max)')))
,Pos5 = ltrim(rtrim(xDim.value('/x[5]','varchar(max)')))
,Pos6 = ltrim(rtrim(xDim.value('/x[6]','varchar(max)')))
,Pos7 = ltrim(rtrim(xDim.value('/x[7]','varchar(max)')))
,Pos8 = ltrim(rtrim(xDim.value('/x[8]','varchar(max)')))
,Pos9 = ltrim(rtrim(xDim.value('/x[9]','varchar(max)')))
From (Select Cast('<x>' + replace((Select replace(@String,@Delimiter,'§§Split§§') as [*] For XML Path('')),'§§Split§§','</x><x>')+'</x>' as xml) as xDim) as A
Where @String is not null
)
--Thanks Shnugo for making this XML safe
--Select * from [dbo].[tvf-Str-Parse-Row]('Dog,Cat,House,Car',',')
--Select * from [dbo].[tvf-Str-Parse-Row]('John <test> Cappelletti',' ')
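The "XML safe" part of the UDF above comes from letting FOR XML PATH('') entity-encode characters such as & and < before the placeholder is replaced with element tags. A standalone sketch with a hypothetical sample string shows the effect. By contrast, the plain REPLACE-then-convert approach in the question's function would build malformed XML for this input, which TRY_CONVERT should turn into NULL (so every column would come back NULL).

```sql
DECLARE @String nvarchar(200) = N'Tom & Jerry,<b>bold</b>,Car';
DECLARE @Delimiter nvarchar(10) = N',';

-- 1) Swap the delimiter for a placeholder, 2) let FOR XML PATH('') escape &, <, >,
-- 3) turn the placeholder into element boundaries, 4) cast the result to XML
DECLARE @xml xml = CAST(
    N'<x>'
    + REPLACE((SELECT REPLACE(@String, @Delimiter, N'§§Split§§') AS [*] FOR XML PATH('')),
              N'§§Split§§', N'</x><x>')
    + N'</x>' AS xml);

IF OBJECT_ID('tempdb..#parsed') IS NOT NULL DROP TABLE #parsed;

-- value() returns the unescaped text of each element
SELECT Pos1 = @xml.value('/x[1]', 'nvarchar(100)'),
       Pos2 = @xml.value('/x[2]', 'nvarchar(100)'),
       Pos3 = @xml.value('/x[3]', 'nvarchar(100)')
INTO #parsed;

SELECT * FROM #parsed;
```

The placeholder only has to be a sequence that never appears in the data; '§§Split§§' is the one the UDF uses.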

T-SQL performance issue with bulk insert in millions of data

I have created a query which is doing a bulk insert of millions of rows of data.
While running this query, I'm getting a tempdb memory error.
This is the query:
INSERT INTO ods.contact_method (cmeth_cust_id, cmeth_chan_type_id, cmeth_address_id,
cmeth_identifier, cmeth_active, cmeth_review_date,
cmeth_last_validated, cmeth_updatesrc_id, cmeth_updated_date)
SELECT
custpers_cust_id, 5, ad.adet_id,
COALESCE(street3, '') + ' ' + COALESCE(street2, '') + ' '
+ COALESCE(housenumber, '') + ' ' + COALESCE(street, ''),
CASE custpers_status
WHEN 'InActive' THEN 'N'
ELSE 'Y'
END,
Dateadd(year, 2, last_update_date),
last_update_date, 1, Getdate()
FROM
ods.address_detail (nolock) ad
JOIN
ods.customer_persona (nolock) cp ON cp.custpers_cust_id = ad.adet_updated_by
JOIN
ods.tempcust_address_insert (nolock)tp ON tp.bvoc = cp.custpers_bvoc_id
WHERE
NOT EXISTS (SELECT 1
FROM ods.contact_method (nolock) cm
WHERE cm.cmeth_cust_id = cp.custpers_cust_id
AND cm.cmeth_address_id IS NOT NULL
AND ad.adet_id = cm.cmeth_address_id)
I need help optimizing this query. Should I use a LEFT JOIN or a NOT EXISTS condition for a bulk insert of millions of rows?
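One way to approach the LEFT JOIN vs NOT EXISTS question is to confirm that both anti-join forms return the same rows and then compare their actual execution plans on the real tables. A minimal sketch with hypothetical stand-ins (#addr for ods.address_detail, #cm for ods.contact_method, and a made-up key cmeth_id):

```sql
IF OBJECT_ID('tempdb..#addr') IS NOT NULL DROP TABLE #addr;
IF OBJECT_ID('tempdb..#cm') IS NOT NULL DROP TABLE #cm;
IF OBJECT_ID('tempdb..#not_exists') IS NOT NULL DROP TABLE #not_exists;
IF OBJECT_ID('tempdb..#left_join') IS NOT NULL DROP TABLE #left_join;

CREATE TABLE #addr (adet_id int PRIMARY KEY, cust_id int NOT NULL);
CREATE TABLE #cm (cmeth_id int IDENTITY PRIMARY KEY, cmeth_cust_id int NOT NULL, cmeth_address_id int NULL);
INSERT INTO #addr (adet_id, cust_id) VALUES (1, 10), (2, 10), (3, 20), (4, 30);
INSERT INTO #cm (cmeth_cust_id, cmeth_address_id) VALUES (10, 1), (20, NULL), (30, 4);

-- Form 1: NOT EXISTS
SELECT a.adet_id INTO #not_exists
FROM #addr AS a
WHERE NOT EXISTS (SELECT 1 FROM #cm AS cm
                  WHERE cm.cmeth_cust_id = a.cust_id
                    AND cm.cmeth_address_id = a.adet_id);

-- Form 2: LEFT JOIN ... IS NULL, testing a NOT NULL column of the right-hand table
SELECT a.adet_id INTO #left_join
FROM #addr AS a
LEFT JOIN #cm AS cm
    ON cm.cmeth_cust_id = a.cust_id
   AND cm.cmeth_address_id = a.adet_id
WHERE cm.cmeth_id IS NULL;

SELECT 'not exists' AS form, adet_id FROM #not_exists
UNION ALL
SELECT 'left join', adet_id FROM #left_join;
```

Both forms return addresses 2 and 3. The equality on cmeth_address_id can never be true for a NULL, so an extra "cmeth_address_id IS NOT NULL" predicate is redundant in either form.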
The tempdb error you are getting can be due to two issues:
1) Your query has a performance issue and is selecting unnecessary data. I can't comment on this without knowing the table structure, indexes, fragmentation, and data size. However, changing the NOT EXISTS condition to a LEFT JOIN may help improve performance:
FROM ods.address_detail (nolock) ad
JOIN ods.customer_persona (nolock) cp
ON cp.custpers_cust_id = ad.adet_updated_by
JOIN ods.tempcust_address_insert (nolock)tp
ON tp.bvoc = cp.custpers_bvoc_id
left join ods.contact_method cm (nolock)
on cm.cmeth_cust_id = cp.custpers_cust_id
AND ad.adet_id = cm.cmeth_address_id
AND cm.cmeth_address_id IS NOT NULL -- not sure if this condition is required
Where cm.cmeth_cust_id is null -- add all primary key columns of contact_method here
2) The tempdb error will also occur if you are selecting a huge amount of data compared to the size of tempdb.
To solve this, you can use TOP while inserting the data and run the same query multiple times; the LEFT JOIN condition in your insert query will make sure that no duplicate data is inserted.
SELECT top 1000000 -- this will make sure you are selecting limited data
custpers_cust_id,
5,
ad.adet_id,
COALESCE(street3, '') + ' '
........
If this is not a one-time activity, you can write a WHILE loop that uses the @@ROWCOUNT value to keep inserting batches:
declare @count int = 1
while @count > 0
begin
<your insert statement with select top>
set @count = @@ROWCOUNT -- read it immediately after the INSERT
end
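A runnable sketch of that batching loop, assuming hypothetical #src/#dst tables in place of the real source and ods.contact_method, and using NOT EXISTS as the anti-join that prevents duplicates. Each INSERT runs as its own statement (and, in autocommit mode, its own transaction), so the work per statement is bounded by @batchSize.

```sql
IF OBJECT_ID('tempdb..#src') IS NOT NULL DROP TABLE #src;
IF OBJECT_ID('tempdb..#dst') IS NOT NULL DROP TABLE #dst;
CREATE TABLE #src (id int NOT NULL PRIMARY KEY, val varchar(20) NOT NULL);
CREATE TABLE #dst (id int NOT NULL PRIMARY KEY, val varchar(20) NOT NULL);

-- 10,000 sample rows (ids 1..10000) built from four cross-joined digit sets
WITH digits AS (
    SELECT n FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS v(n)
)
INSERT INTO #src (id, val)
SELECT a.n * 1000 + b.n * 100 + c.n * 10 + e.n + 1, 'sample'
FROM digits AS a CROSS JOIN digits AS b CROSS JOIN digits AS c CROSS JOIN digits AS e;

DECLARE @batchSize int = 3000;
DECLARE @rows int = 1;

WHILE @rows > 0
BEGIN
    -- NOT EXISTS skips rows copied by earlier batches, so the loop never duplicates
    INSERT INTO #dst (id, val)
    SELECT TOP (@batchSize) s.id, s.val
    FROM #src AS s
    WHERE NOT EXISTS (SELECT 1 FROM #dst AS d WHERE d.id = s.id);

    SET @rows = @@ROWCOUNT;  -- capture immediately; any later statement resets it
END;
```

With these numbers the loop inserts 3000, 3000, 3000 and 1000 rows, then stops when a batch inserts nothing.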

Performance issue with larger resultsets MSSQL

I currently have a stored procedure in MSSQL where I execute a SELECT statement multiple times based on the variables I pass to the stored procedure. The stored procedure counts how many results will be returned for every filter a user can enable.
The stored procedure itself isn't the issue. I turned the SELECT statement from the stored procedure into a regular SELECT statement, which looks like this:
DECLARE @contentRootId int = 900589
DECLARE @RealtorIdList varchar(2000) = ';880;884;1000;881;885;'
DECLARE @publishSoldOrRentedSinceDate int = 8
DECLARE @isForSale BIT= 1
DECLARE @isForRent BIT= 0
DECLARE @isResidential BIT= 1
--...(another 55 variables)...
--Table to be returned
DECLARE @resultTable TABLE
(
variableName varchar(100),
[value] varchar(200)
)
-- Create a table based on the input variable. Example: turns ';18;118;' into a table containing two ints, 18 and 118
DECLARE @RealtorIdTable table(RealtorId int)
INSERT INTO @RealtorIdTable SELECT * FROM dbo.Split(@RealtorIdList,';') option (maxrecursion 150)
INSERT INTO @resultTable ([value], variableName)
SELECT [Value], VariableName FROM(
Select count(*) as TotalCount,
ISNULL(SUM(CASE WHEN reps.ForRecreation = 1 THEN 1 else 0 end), 0) as ForRecreation,
ISNULL(SUM(CASE WHEN reps.IsQualifiedForSeniors = 1 THEN 1 else 0 end), 0) as IsQualifiedForSeniors,
--...(A whole bunch more SUM(CASE)...
FROM TABLE1 reps
LEFT JOIN temp t on
t.ContentRootID = @contentRootId
AND t.RealEstatePropertyID = reps.ID
WHERE
(EXISTS(select 1 from @RealtorIdTable where RealtorId = reps.RealtorID))
AND (@SelectedGroupIds IS NULL OR EXISTS(select 1 from @SelectedGroupIdtable where GroupId = t.RealEstatePropertyGroupID))
AND (ISNULL(reps.IsForSale,0) = ISNULL(@isForSale,0))
AND (ISNULL(reps.IsForRent, 0) = ISNULL(@isForRent,0))
AND (ISNULL(reps.IsResidential, 0) = ISNULL(@isResidential,0))
AND (ISNULL(reps.IsCommercial, 0) = ISNULL(@isCommercial,0))
AND (ISNULL(reps.IsInvestment, 0) = ISNULL(@isInvestment,0))
AND (ISNULL(reps.IsAgricultural, 0) = ISNULL(@isAgricultural,0))
--...(Around 50 more of these WHERE-statements)...
) as tbl
UNPIVOT (
[Value]
FOR [VariableName] IN(
[TotalCount],
[ForRecreation],
[IsQualifiedForSeniors],
--...(All the other things i selected in above query)...
)
) as d
select * from @resultTable
The combination of a realtor ID and a content ID gives me a default set of X records. When I choose a combination that gives me ~4600 records, the execution time is around 250 ms. When I execute the statement with a combination that gives me ~600 records, the execution time is about 20 ms.
I would like to know why this is happening. I tried removing all the SUM(CASE ...) expressions from the SELECT, removing almost everything from the WHERE clause, and removing the JOIN, but I keep seeing the huge difference between the 4600-row and 600-row result sets.
Table variables can perform worse when the number of records is large. Consider using a temporary table instead. See When should I use a table variable vs temporary table in sql server?
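A minimal sketch of the realtor ID list held in a temporary table instead of a table variable. The question's dbo.Split function isn't shown, so this substitutes XML-based splitting (an assumption, not the asker's code); the primary key gives the optimizer an index and statistics to work with, whereas table variables in the SQL Server versions of that time were typically estimated at one row.

```sql
DECLARE @RealtorIdList varchar(2000) = ';880;884;1000;881;885;';

IF OBJECT_ID('tempdb..#RealtorIdTable') IS NOT NULL DROP TABLE #RealtorIdTable;
CREATE TABLE #RealtorIdTable (RealtorId int NOT NULL PRIMARY KEY);

-- ';880;884;' becomes <i></i><i>880</i><i>884</i><i></i>;
-- the [text()] predicate skips the empty elements produced by the leading/trailing ';'
INSERT INTO #RealtorIdTable (RealtorId)
SELECT DISTINCT x.n.value('.', 'int')
FROM (SELECT CAST('<i>' + REPLACE(@RealtorIdList, ';', '</i><i>') + '</i>' AS xml) AS doc) AS d
CROSS APPLY d.doc.nodes('/i[text()]') AS x(n);

SELECT RealtorId FROM #RealtorIdTable;
```

The WHERE clause would then read EXISTS(select 1 from #RealtorIdTable where RealtorId = reps.RealtorID), unchanged apart from the table name.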
Also, consider replacing the UNPIVOT with alternative SQL code. Writing your own T-SQL gives you more control and may even improve performance. See, for example, PIVOT, UNPIVOT and performance.
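One common hand-written alternative to UNPIVOT is CROSS APPLY over a VALUES list. A minimal sketch, with a hypothetical one-row #counts table standing in for the large SUM(CASE ...) query:

```sql
IF OBJECT_ID('tempdb..#counts') IS NOT NULL DROP TABLE #counts;
IF OBJECT_ID('tempdb..#unpivoted') IS NOT NULL DROP TABLE #unpivoted;

-- Hypothetical single row of counts
SELECT TotalCount = 10, ForRecreation = 3, IsQualifiedForSeniors = 2
INTO #counts;

-- Each VALUES row turns one column into one (name, value) pair
SELECT v.VariableName, v.[Value]
INTO #unpivoted
FROM #counts AS c
CROSS APPLY (VALUES
    ('TotalCount',            c.TotalCount),
    ('ForRecreation',         c.ForRecreation),
    ('IsQualifiedForSeniors', c.IsQualifiedForSeniors)
) AS v (VariableName, [Value]);

SELECT * FROM #unpivoted;
```

Unlike UNPIVOT, this form keeps NULL values; with the ISNULL(..., 0) wrappers in the question, both forms return the same rows.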

Dynamic Pivot with varying columns

I have a POA Code dynamic pivot that pulls data from a DX temp table and inserts the data into a temp POA table.
The issue I'm having is that there is a possibility of up to 35 different columns that can be returned. Depending on the month there could be 15 columns (POA1...POA15) or there could be all 35 columns (POA1...POA35). I join this dynamic pivot temp table on another patient table. My problem is, I need to show all 35 columns even if some of the columns do not exist in the temp POA table.
--Pivot DX POA Codes
DECLARE @POAName VARCHAR(40)
SELECT @POAName = '##tmpPOA'
DECLARE @colsPOA NVARCHAR(2000)
SELECT @colsPOA = STUFF((SELECT DISTINCT TOP 100 PERCENT
'],[' + 'POA' + CAST(Dx.RowNum AS NVARCHAR)
FROM #tmpDX DX
ORDER BY '],[' + 'POA' + CAST(Dx.RowNum AS NVARCHAR)
FOR XML PATH ('')
),1,2,'') + ']'
DECLARE @queryPOA NVARCHAR(4000)
SET @queryPOA = N'
SELECT
EncObjID,
'+
@colsPOA
+' INTO ' + @POAName + '
FROM
(SELECT
Dx.EncObjID
,''POA'' + CAST(Dx.RowNum AS NVARCHAR(10)) AS RowNum
,Dx.POAMne
FROM #tmpDx Dx
) p
PIVOT
(
MIN([POAMne])
FOR RowNum IN
( ' + @colsPOA + ' )
) AS pvt'
EXECUTE(@queryPOA)
I'm receiving an Invalid Column Name in my patient query because some of the columns don't exist in ##tmpPOA. I thought about creating a temp table called #tmpDxPOA and doing an insert (Insert Into #tmpDxPOA select * from ##tmpPOA), but that doesn't work (I receive a Column Name or number of supplied values does not match error).
Any thoughts on how to create all 35 columns even if there isn't any data? I don't care if they're NULL; I just need those placeholders in the main patient query, and it doesn't help that the number of columns returned varies every month.
With the help of @mxix I was able to come up with the following:
DECLARE @POASQL NVARCHAR(MAX)
SET @POASQL = N'INSERT INTO #tmpPOAFinal (EncObjID,'+@colsPOA+') SELECT * FROM ##tmpPOA'
EXECUTE(@POASQL)
I put this after the EXECUTE(#queryPOA) in my main query.
For this to work with dynamic SQL, each row/column needs to exist at least once, whether for one patient or more. I would fan out all of the POA possibilities right off the bat and then LEFT OUTER JOIN to get the actual values back.
IF OBJECT_ID('tempdb..#tmpPOA') IS NOT NULL DROP TABLE #tmpPOA
CREATE TABLE #tmpPOA (POA varchar(10))
IF OBJECT_ID('tempdb..#tmpPatient') IS NOT NULL DROP TABLE #tmpPatient
CREATE TABLE #tmpPatient (Patient varchar(15))
INSERT INTO #tmpPatient VALUES ('ABC123'),('ABC456'),('ABC789')
DECLARE @POAFlag as INT = 1 -- POA1 through POA35
WHILE @POAFlag <= 35
BEGIN
INSERT INTO #tmpPOA
VALUES('POA' +CONVERT(varchar,@POAFlag))
SET @POAFlag = @POAFlag + 1
END
SELECT * FROM #tmpPOA
CROSS JOIN #tmpPatient
This should fan out all of the possibilities for the 35 DX codes so you can get their POA flags.
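A different route to the same goal is to build the PIVOT's IN list from the fixed range POA1 to POA35 instead of from the data, so the pivot always produces all 35 columns (NULL where a code has no data). A minimal sketch; the table names follow the question, while the sample rows are hypothetical.

```sql
-- Hypothetical sample data shaped like the question's #tmpDx
IF OBJECT_ID('tempdb..#tmpDx') IS NOT NULL DROP TABLE #tmpDx;
CREATE TABLE #tmpDx (EncObjID int NOT NULL, RowNum int NOT NULL, POAMne varchar(5) NULL);
INSERT INTO #tmpDx (EncObjID, RowNum, POAMne)
VALUES (1, 1, 'Y'), (1, 2, 'N'), (2, 1, 'U');

-- Build [POA1],[POA2],...,[POA35] regardless of which codes appear this month
DECLARE @colsPOA nvarchar(max) = N'';
DECLARE @i int = 1;
WHILE @i <= 35
BEGIN
    SET @colsPOA += CASE WHEN @i > 1 THEN N',' ELSE N'' END + QUOTENAME(CONCAT(N'POA', @i));
    SET @i += 1;
END;

IF OBJECT_ID('tempdb..##tmpPOA') IS NOT NULL DROP TABLE ##tmpPOA;

DECLARE @queryPOA nvarchar(max) = N'
SELECT EncObjID, ' + @colsPOA + N'
INTO ##tmpPOA
FROM (SELECT Dx.EncObjID,
             N''POA'' + CAST(Dx.RowNum AS nvarchar(10)) AS RowNum,
             Dx.POAMne
      FROM #tmpDx AS Dx) AS p
PIVOT (MIN(POAMne) FOR RowNum IN (' + @colsPOA + N')) AS pvt;';

-- The dynamic batch can see #tmpDx because it runs in a child scope of this session
EXEC sys.sp_executesql @queryPOA;

SELECT * FROM ##tmpPOA;
```

Because the column list no longer depends on the data, the patient query can reference POA1 through POA35 every month without an Invalid Column Name error.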
