I'm using an OLE DB Source in SSIS to pull data rows from a SQL Server 2012 database:
SELECT item_prod.wo_id, item_prod.oper_id, item_prod.reas_cd, item_prod.lot_no, item_prod.item_id, item_prod.user_id, item_prod.seq_no, item_prod.spare1, item_prod.shift_id, item_prod.ent_id, item_prod.good_prod, item_cons.lot_no as raw_lot_no, item_cons.item_id as rm_item_id, item_cons.qty_cons
FROM item_prod
LEFT OUTER JOIN item_cons on item_cons.wo_id=item_prod.wo_id AND item_cons.oper_id=item_prod.oper_id AND item_cons.seq_no=item_prod.seq_no AND item_prod.lot_no=item_cons.fg_lot_no
This works great and currently pulls around 1 million rows per minute. A left outer join is used instead of a Lookup component because it performs much better than an uncached lookup, and both tables may contain upwards of 40 million rows.
We need the query to pull only rows that haven't been pulled in a previous run. The row_id from the last run is stored in a variable and used to parameterize a WHERE clause appended to the above query:
WHERE item_prod.row_id > ?
On the first run, the parameter is -1 (to pull everything). Adding the WHERE clause drops performance 5-10x, to 1 million rows per 5-10 minutes. What is causing such a significant performance drop, and is there a way to optimize it?
It turns out SSIS prepares the query as a parameterized statement on the server when a parameter is used. This was discovered by looking at the execution in SQL Server Profiler.
The parameterization came with a performance hit, which I believe is related to parameter sniffing.
I changed the source to use a SQL query from a variable and built the query with an expression instead, and this fixed the performance.
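For illustration, the variable's expression looked something like this (a sketch with the column list shortened; the variable name @[User::LastRowId] is my own placeholder, not necessarily what the package used):
"SELECT item_prod.wo_id, item_prod.good_prod FROM item_prod WHERE item_prod.row_id > " + (DT_WSTR, 20) @[User::LastRowId]
Because the value is concatenated into the query text, SQL Server sees a literal instead of a prepared parameter, so the optimizer builds a plan for the actual row_id value.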
Edit: The following are the commands seen in SQL Server Profiler when executing the question's query with the parameterized WHERE clause:
exec [sys].sp_describe_undeclared_parameters N'SELECT item_prod.wo_id, item_prod.oper_id, item_prod.reas_cd, item_prod.lot_no, item_prod.item_id, item_prod.user_id, item_prod.seq_no, item_prod.spare1, item_prod.shift_id, item_prod.ent_id, item_prod.good_prod, item_cons.lot_no as raw_lot_no, item_cons.item_id as rm_item_id, item_cons.qty_cons
FROM item_prod
LEFT OUTER JOIN item_cons on item_cons.wo_id=item_prod.wo_id AND item_cons.oper_id=item_prod.oper_id AND item_cons.seq_no=item_prod.seq_no AND item_prod.lot_no=item_cons.fg_lot_no
WHERE item_prod.row_id > @P1'
declare @p1 int
set @p1=1
exec sp_prepare @p1 output,N'@P1 int',N'SELECT item_prod.wo_id, item_prod.oper_id, item_prod.reas_cd, item_prod.lot_no, item_prod.item_id, item_prod.user_id, item_prod.seq_no, item_prod.spare1, item_prod.shift_id, item_prod.ent_id, item_prod.good_prod, item_cons.lot_no as raw_lot_no, item_cons.item_id as rm_item_id, item_cons.qty_cons
FROM item_prod
LEFT OUTER JOIN item_cons on item_cons.wo_id=item_prod.wo_id AND item_cons.oper_id=item_prod.oper_id AND item_cons.seq_no=item_prod.seq_no AND item_prod.lot_no=item_cons.fg_lot_no
WHERE item_prod.row_id > @P1',1
select @p1
exec [sys].sp_describe_first_result_set N'SELECT item_prod.wo_id, item_prod.oper_id, item_prod.reas_cd, item_prod.lot_no, item_prod.item_id, item_prod.user_id, item_prod.seq_no, item_prod.spare1, item_prod.shift_id, item_prod.ent_id, item_prod.good_prod, item_cons.lot_no as raw_lot_no, item_cons.item_id as rm_item_id, item_cons.qty_cons
FROM item_prod
LEFT OUTER JOIN item_cons on item_cons.wo_id=item_prod.wo_id AND item_cons.oper_id=item_prod.oper_id AND item_cons.seq_no=item_prod.seq_no AND item_prod.lot_no=item_cons.fg_lot_no
WHERE item_prod.row_id > @P1',N'@P1 int',1
Since I'm not entirely sure what the above generated code does, there may be other related commands that I missed. Originally, I assumed SSIS variables would be inserted directly into the query text, but the introduction of the @P1 parameter led me to look at the prepared-statement behavior instead.
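For anyone who wants to reproduce the slow path outside SSIS, the captured sp_prepare/sp_execute pair can be replayed in SSMS. A sketch with the column list shortened (-1 mirrors the first-run parameter value from the question):
declare @handle int
exec sp_prepare @handle output, N'@P1 int',
    N'SELECT item_prod.wo_id, item_prod.good_prod FROM item_prod WHERE item_prod.row_id > @P1'
exec sp_execute @handle, -1  -- runs with the prepared (sniffed) plan
exec sp_unprepare @handle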
Related
This is for SQL Server 2012. I have been asked to optimize a stored procedure that calculates billing amounts for various invoices.
It is a huge stored procedure with many insert, update, and delete queries joining multiple tables.
The stored procedure gets stuck on one particular update query for 8-9 hours. The query updates around 400,000 records.
At the time of execution, there is no other connection to the database except the one running the stored procedure.
The query is
UPDATE D
SET Amount=CASE WHEN D.Container='PCS' AND DP.AmountPerStop=0 THEN DP1.Amount*ISNULL(ISNULL(M2.MPDFactor, M.MPDFactor), 1) ELSE DP1.Amount+DP.Amount*(DeliveryQty-1) END,
InvoiceID=I.InvoiceID,
MPDFactor=CASE WHEN D.Container='PCS' AND DP.AmountPerStop=0 THEN ISNULL(M2.MPDFactor, M.MPDFactor) ELSE NULL END,
PickupAmountCap = DP.PickupAmountCap
FROM dbo.tDeliveries D
JOIN tDeliveryPrices DP ON DP.ProductGroupCode=D.ProductGroupCode AND DP.Container=D.Container AND DP.Zone=D.Zone
JOIN tVendorAgreements A ON A.DIP=D.DIP AND DP.VendorAgreementID=A.VendorAgreementID
JOIN tDeliveryPrices DP1 ON DP1.ProductGroupCode=D.ProductGroupCode AND DP1.Container=D.Container AND DP1.Zone=D.Zone AND DP1.VendorAgreementID=A.VendorAgreementID
JOIN tDIP DIP ON D.Dip=DIP.Dip
JOIN tInvoices I ON A.VendorAgreementID=I.VendorAgreementID
JOIN #tDailyInvoicePeriodForDIP IP ON IP.DIP = DIP.DIP
LEFT JOIN tMPDFactors M ON M.DIPArea=DIP.DipArea AND M.ProductGroupCode=D.ProductGroupCode AND D.DeliveryQty BETWEEN M.StartQty AND M.EndQty
LEFT JOIN tMPDFactors M2 ON M2.DIP=DIP.Dip AND M2.ProductGroupCode=D.ProductGroupCode AND D.DeliveryQty BETWEEN M2.StartQty AND M2.EndQty
WHERE D.InvoiceID IS NULL AND
I.InvoicePeriod=@Period AND
I.InvoiceLockedDate IS NULL AND
DP1.StartQty=1 AND
(DeliveryQty BETWEEN DP.StartQty AND DP.EndQty OR D.Container='PCS') AND
D.Event_Type='I' AND
@Period BETWEEN A.ValidFromDate AND A.ValidUntilDate
As far as the end result is concerned, the query works fine; it just takes far too long. Any help would be very much appreciated.
Have you tried using the recompile option for the stored procedure? If the procedure's query plan was generated against a very small dataset, that plan may force it to walk very large tables, because the original plan determined that walking the table was the most efficient way to gather or update the data. This may not completely resolve your issue, but it is another option. Examining the actual execution plan is also a great place to start for determining inefficient steps.
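A minimal sketch of both options (the procedure name dbo.CalculateInvoiceBilling is a placeholder, since the question doesn't name it):
-- Force a fresh plan for a single execution of the procedure
EXEC dbo.CalculateInvoiceBilling @Period = '2015-06-01' WITH RECOMPILE
-- Or recompile only the slow statement by appending a query hint
-- as the final line of the UPDATE shown above:
-- OPTION (RECOMPILE)
The statement-level hint is usually the cheaper choice here, since only the one UPDATE suffers from the stale plan.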
I'm trying to diagnose a performance issue on a SQL Server 2014 database using the following query, which I would credit to its original author if I could remember the source:
SELECT TOP 50
st.text,
qp.query_plan,
qs.*
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.plan_handle) st -- returns the text of the whole batch or module
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
ORDER BY total_worker_time DESC
GO
I'm very concerned because I see a query in the results with text = CREATE PROCEDURE [dbo].[MyProcedure]... and an execution count in the many-thousands range. Some of our code has a database upgrade script that contains the statement, but to the best of my ability to test, it should not have run more than once (and even if a bug were causing the script to run repeatedly, there should be many other similar statements with the same execution count, which isn't happening).
Is there a logical reason why this query might be getting run repeatedly? Would EXEC MyProcedure show up as CREATE PROCEDURE in this query because the query plan is being reused? Is there a possibility that a failure when creating a procedure would cause SQL Server to retry it continually? Can databases become haunted by malevolent ghosts?
Any troubleshooting advice is appreciated!
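One thing worth knowing when reading those results: sys.dm_exec_sql_text returns the text of the entire batch or module, while each sys.dm_exec_query_stats row counts a single statement inside it, which may be why EXEC MyProcedure shows up under the full CREATE PROCEDURE text. A sketch that extracts just the counted statement using the offsets:
SELECT TOP 50
    SUBSTRING(st.text, qs.statement_start_offset / 2 + 1,
        (CASE qs.statement_end_offset
             WHEN -1 THEN DATALENGTH(st.text)  -- -1 means the statement runs to the end of the batch
             ELSE qs.statement_end_offset
         END - qs.statement_start_offset) / 2 + 1) AS statement_text,
    qs.execution_count
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
ORDER BY qs.total_worker_time DESC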
I need to join a Teradata table with about 0.5 billion records and a local table with about 10,000 records. I have it working in MS Access and it takes about 15 minutes to run. I would prefer to do it in SQL Server but can't even get a join with 1 record in the local SQL table to work.
Why is MS Access able to do this, albeit slowly, whereas SQL Server chokes? What is MS Access doing differently from SQL Server?
The SQL Server query with a join that fails:
SELECT a.trk, a.wgt
FROM openquery(TERADATA, 'SELECT trk, wgt
FROM SHIPMENT_DB.pkg') a
INNER JOIN (Local_Tbl) b ON a.trk = b.Tracking_Number
A simple SQL Server query without a join that works:
SELECT *
FROM openquery(TERADATA,'SELECT trk, wgt
FROM SHIPMENT_DB.pkg
WHERE trk = ''773423067500''')
Not the answer, but I had a similar issue using OPENDATASOURCE. Performance was terrible; the query took hours to run.
The solution was to ensure all columns involved in the WHERE clause had matching datatypes. In my case the remote column was INT, but the value in the query was being passed as a varchar: ...'WHERE remote_table.ID = ''4'''...
Once I changed all values to the appropriate datatypes, the query took seconds to run.
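As a sketch of the before/after (remote_table.ID is this answer's own example name):
-- Before: INT column compared against a varchar literal, forcing a conversion
WHERE remote_table.ID = '4'
-- After: the literal matches the column's INT datatype
WHERE remote_table.ID = 4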
Look at the execution plan in SQL Server. Since SQL Server knows very little about the dataset that is going to come back from Teradata, it is forced to make some assumptions.
Swapping the order of the tables in the join will help. Using an explicit INNER HASH JOIN may also help (once you've switched the order).
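Roughly like this, reusing the question's query (a sketch; Local_Tbl and Tracking_Number come from the question):
SELECT a.trk, a.wgt
FROM Local_Tbl b
INNER HASH JOIN openquery(TERADATA, 'SELECT trk, wgt
FROM SHIPMENT_DB.pkg') a ON a.trk = b.Tracking_Number
With the small local table first and a hash join hint, SQL Server builds the hash table from the local side and probes it with the remote rows, instead of issuing row-by-row lookups against the linked server.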
So I have a stored procedure in SQL Server. I've simplified its code (for this question) to just this:
CREATE PROCEDURE dbo.DimensionLookup as
BEGIN
select DimensionID, DimensionField from DimensionTable
inner join Reference on Reference.ID = DimensionTable.ReferenceID
END
In SSIS on SQL Server 2012, I have a Lookup component with the following source command:
EXECUTE dbo.DimensionLookup WITH RESULT SETS (
(DimensionID int, DimensionField nvarchar(700) )
)
When I run this procedure in Preview mode in BIDS, it returns the two columns correctly. When I run the package in BIDS, it runs correctly.
But when I deploy it out to the SSIS catalog (the same server the database is on), point it to the same data sources, etc. - it fails with the message:
EXECUTE statement failed because its WITH RESULT SETS clause specified 2 column(s) for result set number 1, but the statement sent
3 column(s) at run time.
Steps Tried So Far:
Adding a third column to the result set - I get a different error, VS_NEEDSNEWMETADATA, which makes sense and is more or less proof that there's no third column.
SQL Profiler - I see this:
exec sp_prepare @p1 output,NULL,N'EXECUTE dbo.DimensionLookup WITH RESULT SETS ((
DimensionID int, DimensionField nvarchar(700)))',1
SET FMTONLY ON exec sp_execute 1 SET FMTONLY OFF
So it's trying to use FMTONLY to get the result set metadata. Needless to say, running SET FMTONLY ON and then running the command in SSMS myself yields just the two columns.
SET NOCOUNT ON - Nothing changed.
So, two other interesting things:
I deployed it out to my local SQL Server 2012 install and it worked fine - same connections, etc. So it may be a server or database configuration issue; I'm not sure what, if anything, it is. I didn't install the dev server, and my own install was pretty much click-through vanilla.
Perhaps the most interesting thing. If I remove the join from the procedure's statement so it just becomes
select DimensionID, DimensionField from DimensionTable
It goes back to sending just 2 columns in the result set! So adding a join, without adding any additional output columns, ups the result set to 3 columns. Even if I add 6 more joins, still just 3 columns. So one guess is it's some sort of metadata column that only gets activated when there's a join.
Anyway, as you can imagine, it's driving me kind of mad. I have a workaround - load the data into a temp table and just return that (sketched below) - but why won't this work? What extra column is being sent back? Why only when I add a join?
Gah!
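The temp-table workaround, for reference, is roughly this (a sketch based on the simplified procedure above):
CREATE PROCEDURE dbo.DimensionLookup as
BEGIN
    -- Materialize the join first; the final SELECT has no join,
    -- so metadata discovery sees exactly the two columns
    select DimensionID, DimensionField
    into #Results
    from DimensionTable
    inner join Reference on Reference.ID = DimensionTable.ReferenceID

    select DimensionID, DimensionField from #Results
END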
So all credit to billinkc: the cause turned out to be a patch level.
In Version 11.0.2100.60, SSIS Lookup SQL command metadata is gathered using the old SET FMTONLY method. Unfortunately, this doesn't work in 2012, as the Books Online entry on SET FMTONLY helpfully notes:
Do not use this feature. This feature has been replaced by sp_describe_first_result_set.
Too bad they didn't follow their own advice!
This has been patched as of version 11.0.2218.0. Metadata is correctly gathered using the sp_describe_first_result_set system stored procedure.
This can happen if the WITH RESULT SETS clause specified in SSIS declares more columns than the stored procedure actually returns. Check your stored procedure and ensure its output columns match the WITH RESULT SETS definition.
I have a LINQ to SQL query that generates the following SQL:
exec sp_executesql N'SELECT COUNT(*) AS [value]
FROM [dbo].[SessionVisit] AS [t0]
WHERE ([t0].[VisitedStore] = @p0) AND (NOT ([t0].[Bot] = 1)) AND
([t0].[SessionDate] > @p1)',N'@p0 int,@p1 datetime',
@p0=1,@p1='2010-02-15 01:24:00'
(This is the actual SQL taken from SQL Profiler on SQL Server 2008.)
The query plan generated when I run this SQL from within Query Analyser is perfect.
It uses an index containing VisitedStore, Bot, SessionDate.
The query returns instantly.
However when I run this from C# (with LINQ) a different query plan is used that is so inefficient it doesn't even return in 60 seconds. This query plan is trying to do a key lookup on the clustered primary key which contains a couple million rows. It has no chance of returning.
What I just can't understand though is that the EXACT same SQL is being run - either from within LINQ or from within Query Analyser yet the query plan is different.
I've run the two queries many, many times, and they're now running in isolation from any other queries. The date is DateTime.Now.AddDays(-7), but I've even hardcoded it to eliminate caching problems.
Is there anything I can change in LINQ to SQL to affect the query plan, or some way to debug this further? I'm very, very confused!
This is a relatively common problem that surprised me too when I first saw it. The first thing to do is ensure your statistics are up to date. You can check the age of statistics with:
SELECT
    object_name = OBJECT_NAME(ind.object_id),
    IndexName = ind.name,
    StatisticsDate = STATS_DATE(ind.object_id, ind.index_id)
FROM sys.indexes ind
ORDER BY STATS_DATE(ind.object_id, ind.index_id) DESC
Statistics should be updated in a weekly maintenance plan. For a quick fix, issue the following command to update all statistics in your database:
exec sp_updatestats
Apart from the statistics, another thing you can check is the SET options. They can be different between Query Analyzer and your Linq2Sql application.
Another possibility is that SQL Server is using an old cached plan for your Linq2Sql query. Plans can be cached on a per-user basis, so if you run Query Analyser as a different user, that can explain different plans. Normally you could add Option (RECOMPILE) to the application query, but I guess that's hard with Linq2Sql. You can clear the entire cache with DBCC FREEPROCCACHE and see if that speeds up the Linq2Sql query.
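To check the cached-plan angle, the SET options baked into each cached plan can be compared; two rows with the same text but different set_options values mean two different plans. A sketch (the LIKE filter just narrows the output to the question's table):
SELECT st.text,
    pa.value AS set_options,  -- differing values here mean different cached plans
    cp.usecounts
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
CROSS APPLY sys.dm_exec_plan_attributes(cp.plan_handle) pa
WHERE pa.attribute = 'set_options'
    AND st.text LIKE '%SessionVisit%'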
I switched to a stored procedure and the same SQL works fine. I'd really like to know what's going on, but I can't spend any more time on this now. Fortunately, in this instance the query was not too dynamic.
Hopefully this at least helps anyone in the same boat as me.