SSIS Package runs 500x longer on one server

I have an SSIS package--two data flow tasks, 8 components each, reading from two flat files, nothing spectacular. If I run it in BIDS, it takes reliably about 60 seconds. I have a sandbox DB server with the package running in a job which also takes reliably 30-60 seconds. On my production server, the same job with the same package takes anywhere from 30 seconds to 12 hours.
With logging enabled on the package, it looks like it bogs down--initially at least--in the pre-execute phase of one or the other (or both) data flow tasks. But I can also see the data coming in--slowly, in chunks, so I think it does move on from there later. The IO subsystem gets pounded, and SSIS generates many large temp files (about 150MB worth--my input data files are only about 24MB put together) and is reading and writing vigorously from those files (thrashing?).
Of note, if I point my BIDS instance of the package at the production server, it still only takes about 60 seconds to run! So it must be something about running dtexec there, not the DB itself.
I've already tried to optimize my package by reducing the input row byte size, and I made the two data flow tasks run in series rather than in parallel--to no avail.
Both DB servers are running MSSQL 2008 R2 64-bit, same patch level. Both servers are VMs on the same host, with the same resource allocation. Load on the production server should not be that much higher than on the sandbox server right now. The only difference I can see is that the production server is running Windows Server 2008, while the sandbox is on Windows Server 2008 R2.
Help! Any ideas to try are welcome. What could be causing this huge discrepancy?
Appendix A
Here's what my package looks like…
The control flow is extremely simple: just the two data flow tasks, joined by a completion constraint (screenshot omitted).
The data flow looks like this (screenshot omitted):
The second data flow task is exactly the same, just with a different source file and destination table.
Notes
The completion constraint in the Control Flow is only there to make the tasks run serially to try and cut down on resources needed concurrently (not that it helped solve the problem)…there is no actual dependency between the two tasks.
I'm aware of potential issues with blocking and partially-blocking transforms (can't say I understand them completely, but somewhat at least) and I know the aggregate and merge join are blocking and could cause problems. However, again, this all runs fine and quickly in every other environment except the production server…so what gives?
The reason for the Merge Join is to make the task wait for both branches of the Multicast to complete. The right branch finds the minimum datetime in the input and deletes all records in the table after that date, while the left branch carries the new input records for insertion--so if the right branch proceeds before the aggregate and deletion, the new records will get deleted (this happened). I'm unaware of a better way to manage this.
The error output from "Delete records" is always empty--this is deliberate, as I don't actually want any rows from that branch in the merge (the merge is only there to synchronize completion as explained above).

If you have logging turned on, preferably to SQL Server, add the OnPipelineRowsSent event. You can then determine where the package is spending all of its time; see this post. Your IO subsystem getting slammed and generating all these temp files is happening because you are no longer able to keep all the information in memory (due to your async transformations).
The relevant query from the linked article is the following. It looks at events in the sysdtslog90 table (SQL Server 2008+ users should substitute sysssislog) and performs some timing analysis on them.
;WITH PACKAGE_START AS
(
    SELECT DISTINCT
        Source
    ,   ExecutionID
    ,   Row_Number() Over (Order By StartTime) As RunNumber
    FROM
        dbo.sysdtslog90 AS L
    WHERE
        L.event = 'PackageStart'
)
, EVENTS AS
(
    SELECT
        SourceID
    ,   ExecutionID
    ,   StartTime
    ,   EndTime
    ,   Left(SubString(message, CharIndex(':', message, CharIndex(':', message, CharIndex(':', message, CharIndex(':', message, 56) + 1) + 1) + 1) + 2, Len(message)), CharIndex(':', SubString(message, CharIndex(':', message, CharIndex(':', message, CharIndex(':', message, CharIndex(':', message, 56) + 1) + 1) + 1) + 2, Len(message))) - 2) As DataFlowSource
    ,   Cast(Right(message, CharIndex(':', Reverse(message)) - 2) As int) As RecordCount
    FROM
        dbo.sysdtslog90 AS L
    WHERE
        L.event = 'OnPipelineRowsSent'
)
, FANCY_EVENTS AS
(
    SELECT
        SourceID
    ,   ExecutionID
    ,   DataFlowSource
    ,   Sum(RecordCount) RecordCount
    ,   Min(StartTime) StartTime
    ,   (
            Cast(Sum(RecordCount) as real) /
            Case
                When DateDiff(ms, Min(StartTime), Max(EndTime)) = 0
                    Then 1
                Else DateDiff(ms, Min(StartTime), Max(EndTime))
            End
        ) * 1000 As RecordsPerSec
    FROM
        EVENTS DF_Events
    GROUP BY
        SourceID
    ,   ExecutionID
    ,   DataFlowSource
)
SELECT
    'Run ' + Cast(RunNumber As varchar) As RunName
,   S.Source
,   DF.DataFlowSource
,   DF.RecordCount
,   DF.RecordsPerSec
,   Min(S.StartTime) StartTime
,   Max(S.EndTime) EndTime
,   DateDiff(ms, Min(S.StartTime), Max(S.EndTime)) Duration
FROM
    dbo.sysdtslog90 AS S
    INNER JOIN
        PACKAGE_START P
        ON S.ExecutionID = P.ExecutionID
    LEFT OUTER JOIN
        FANCY_EVENTS DF
        ON S.SourceID = DF.SourceID
        AND S.ExecutionID = DF.ExecutionID
WHERE
    S.message <> 'Validating'
GROUP BY
    RunNumber
,   S.Source
,   DataFlowSource
,   RecordCount
,   DF.StartTime
,   RecordsPerSec
,   Case When S.Source = P.Source Then 1 Else 0 End
ORDER BY
    RunNumber
,   Case When S.Source = P.Source Then 1 Else 0 End Desc
,   DF.StartTime
,   Min(S.StartTime);
You were able to use this query to discern that the Merge Join component was the lagging component. Why it performs differently between the two servers, I can't say at this point.
If you have the ability to create a table in your destination system, you could modify your process to have two data flows (and eliminate the costly async components).
The first data flow would take the Flat file and Derived columns and land that into a staging table.
You then have an Execute SQL Task fire off to handle the Get Min Date + Delete logic.
Then you have your second data flow querying from your staging table and snapping it right into your destination.
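A minimal sketch of what that Execute SQL Task could run, assuming hypothetical StagingTable/DestinationTable names and a LoadDate column (substitute your real schema):
-- Hypothetical names; implements the "find min date, delete from that date on" logic.
DECLARE @MinDate datetime;

SELECT @MinDate = MIN(LoadDate)
FROM dbo.StagingTable;

DELETE FROM dbo.DestinationTable
WHERE LoadDate >= @MinDate;   -- adjust >= vs > to match your replacement rule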

The steps below will help improve your SSIS performance.
Ensure connection managers all have DelayValidation set to True.
Ensure that ValidateExternalMetadata is set to False.
Set DefaultBufferMaxRows and DefaultBufferSize to correspond to the table's row sizes.
Use query hints where possible: http://technet.microsoft.com/en-us/library/ms181714.aspx
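On the last point, a hint is just an addition to the source query. For example, OPTION (RECOMPILE) is one common query hint (the table and columns here are hypothetical):
-- Hypothetical source query showing where a query hint goes.
SELECT OrderID, OrderDate, Amount
FROM dbo.Orders
WHERE OrderDate >= '20130101'
OPTION (RECOMPILE);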

Related

Performance problems after updating statistics SQL Server 2014

I've received a database that was previously on SQL Server 2008 R2 but was just put on a SQL Server 2014 instance. No maintenance tasks of any kind had been run on the database since 2014 (e.g. rebuilding indexes, updating statistics, etc.).
Once we ran UPDATE STATISTICS as part of our regularly scheduled maintenance, the performance of some queries took a massive hit, to the point where some SELECT statements seem never to finish.
The queries have some CASE...WHEN statements in them, but I wouldn't expect there to be such a performance hit. Does anybody have any thoughts on what might cause such issues?
I've tried updating the compatibility level to 120, since it was on 100 when the database first came in, but that didn't make any difference to performance.
If you have only just moved the database, give the system some time to build up its execution plans and cache. Also, do your index maintenance and then update the statistics. Don't use sp_updatestats, though, as it just uses a sample of the data, not a full scan.
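A per-table full-scan update looks like this (table name hypothetical):
-- Run for each table whose statistics are stale; FULLSCAN reads every row instead of sampling.
UPDATE STATISTICS dbo.YourTable WITH FULLSCAN;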
What results do you get for this:
SELECT
[sch].[name] + '.' + [so].[name] AS [TableName] ,
[ss].[name] AS [Statistic],
[sp].[last_updated] AS [StatsLastUpdated] ,
[sp].[rows] AS [RowsInTable] ,
[sp].[rows_sampled] AS [RowsSampled] ,
[sp].[modification_counter] AS [RowModifications],
Convert (decimal(18,2),(convert(numeric,[sp].[modification_counter]) / convert(numeric,[sp].[rows]) * 100)) as [Percent_changed]
FROM [sys].[stats] [ss]
JOIN [sys].[objects] [so] ON [ss].[object_id] = [so].[object_id]
JOIN [sys].[schemas] [sch] ON [so].[schema_id] = [sch].[schema_id]
OUTER APPLY [sys].[dm_db_stats_properties]([so].[object_id],
[ss].[stats_id]) sp
WHERE [so].[type] = 'U'
AND [sp].[modification_counter] > 0
And [sp].[last_updated] < getdate()-1
ORDER BY [Percent_changed] DESC

How to monitor an SQL Server Agent Job that runs after being triggered

I'm running SQL Server 2008 and I have 3 agent jobs set up to run one after the other because the success of the second depends on the first, etc.
Every other job I have is independent, and I monitor them using a script that incorporates the MSDB..sysjobhistory run_status field, among others.
I want to know if there is a way to specifically find all jobs that never started for a specific day. I know that I can think of the problem the other way and say job 2 couldn't possibly run if job 1 failed; however, I'm looking for a more general purpose solution so in case I need to create other jobs that are linked similarly I won't have to hard code more logic into my nightly report.
Any suggestions are welcome!
The MSDB..sysjobservers table holds a single record for every agent job on your server, and includes the field last_run_date, which makes it easy to tell which jobs haven't run in the last 24 hours. My query looks something like this:
SELECT A.name AS [Name]
    , @@SERVERNAME AS [Server]
    , 'Yes' AS [Critical]
FROM MSDB.dbo.sysjobs A
INNER JOIN MSDB.dbo.sysjobservers B ON A.job_id = B.job_id
WHERE A.name LIKE '2%SPRING%'
AND DATEDIFF(DAY, CONVERT(DATETIME, CONVERT(CHAR(8), B.last_run_date)), GETDATE()) > 1
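One caveat, based on my understanding of sysjobservers: last_run_date is stored as an int and is 0 for jobs that have never run, which would break the CHAR(8)-to-DATETIME conversion above. A CASE guard avoids that (sketch):
-- Treat never-run jobs (last_run_date = 0) as older than any cutoff.
SELECT A.name AS [Name]
FROM MSDB.dbo.sysjobs A
INNER JOIN MSDB.dbo.sysjobservers B ON A.job_id = B.job_id
WHERE DATEDIFF(DAY,
        CASE WHEN B.last_run_date > 0
             THEN CONVERT(DATETIME, CONVERT(CHAR(8), B.last_run_date))
             ELSE CONVERT(DATETIME, '19000101')
        END, GETDATE()) > 1;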

Best way of benchmarking INSERTs - all inclusive?

If I would like to benchmark how different table definitions affect row insertion speed in SQL Server, I guess it's not sufficient just to time the transaction from BEGIN to COMMIT: this only measures the time spent appending the INSERTs to the (sequential) log. Right?
But the real I/O hit comes when the INSERTs are actually applied to the real table (a clustered index which might be slightly reorganized after the INSERTs). How can I measure the total time used, all inclusive? That is, the time for all the INSERTs (written to log) + the time used for updating the "real" data structures? Is it sufficient to perform a "CHECKPOINT" before stopping the timer?
Due to lack of response I will answer this myself.
As far as I can tell from various documentation, I can capture all the disk activity induced by a query by issuing a CHECKPOINT, which will force-write all dirty pages to disk.
If nothing but the query to be measured is executed, the only dirty pages will be the ones touched by the query. The experiments performed seem to support this "theory".
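A minimal timing sketch of that approach (the table and workload are hypothetical):
-- Hypothetical test table; the INSERT below is the workload being measured.
CREATE TABLE dbo.TestTable (Id int IDENTITY PRIMARY KEY, Payload varchar(100));

CHECKPOINT;   -- flush pre-existing dirty pages so they don't skew the timing
DECLARE @t0 datetime2 = SYSDATETIME();

INSERT INTO dbo.TestTable (Payload)
SELECT TOP (100000) REPLICATE('x', 100)
FROM sys.all_objects a CROSS JOIN sys.all_objects b;

CHECKPOINT;   -- force the pages dirtied by the INSERT to disk before stopping the clock
SELECT DATEDIFF(ms, @t0, SYSDATETIME()) AS ElapsedMs;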
SET STATISTICS TIME ON will give you elapsed and CPU times in milliseconds for each statement you run after setting it.
Edit:
Using the query below, you can find out exactly how many pages in the buffer pool are dirty at the time of execution, their size in MB, the server's configured min/max memory, and totals.
SELECT
ISNULL((CASE WHEN ([database_id] = 32767) THEN 'Resource Database' ELSE DB_NAME (database_id) END),'Total Pages') AS [Database Name],
SUM(CASE WHEN ([is_modified] = 1) THEN 1 ELSE 0 END) AS [Dirty Page Count],
SUM(CASE WHEN ([is_modified] = 1) THEN 0 ELSE 1 END) AS [Clean Page Count],
COUNT(*) * 8.0 / 1024.0 [Size in MB], a.value_in_use [Min Server Memory],
b.value_in_use [Max Server Memory]
FROM sys.dm_os_buffer_descriptors
INNER JOIN sys.configurations a on a.configuration_id = 1543
INNER JOIN sys.configurations b on b.configuration_id = 1544
GROUP BY [database_id],a.value_in_use,b.value_in_use WITH CUBE
HAVING A.value_in_use IS NOT NULL AND B.value_in_use IS NOT NULL
ORDER BY 1;

Linked Server Query Runs But Doesn't Finish?

June 29, 2010 - I had an uncommitted action from a previous delete statement. I committed the action and then got another error about conflicting primary IDs. I can fix that. So, moral of the story: commit your actions.
Original Question -
I'm trying to run this query:
with spd_data as (
select *
from openquery(IRPROD,'select * from budget_user.spd_data where fiscal_year = 2010')
)
insert into [IRPROD]..[BUDGET_USER].[SPD_DATA_BUD]
(REC_ID, FISCAL_YEAR, ENTITY_CODE, DIVISION_CODE, DEPTID, POSITION_NBR, EMPLID,
spd_data.NAME, JOB_CODE, PAY_GROUP_CODE, FUND_CODE, FUND_SOURCE, CLASS_CODE,
PROGRAM_CODE, FUNCTION_CODE, PROJECT_ID, ACCOUNT_CODE, SPD_ENC_AMT, SPD_EXP_AMT,
SPD_FB_ENC_AMT, SPD_FB_EXP_AMT, SPD_TUIT_ENC_AMT, SPD_TUIT_EXP_AMT,
spd_data.RUNDATE, HOME_DEPTID, BUD_ORIG_AMT, BUD_APPR_AMT)
SELECT REC_ID, FISCAL_YEAR, ENTITY_CODE, DIVISION_CODE, DEPTID, POSITION_NBR, EMPLID,
spd_data.NAME, JOB_CODE, PAY_GROUP_CODE, FUND_CODE, FUND_SOURCE, CLASS_CODE,
PROGRAM_CODE, FUNCTION_CODE, PROJECT_ID, ACCOUNT_CODE, SPD_ENC_AMT, SPD_EXP_AMT,
SPD_FB_ENC_AMT, SPD_FB_EXP_AMT, SPD_TUIT_ENC_AMT, SPD_TUIT_EXP_AMT,
spd_data.RUNDATE, HOME_DEPTID, lngOrig_amt, lngAppr_amt
from spd_data
left join Budgets.dbo.tblAllPosDep on project_id = projid
and job_code = jcc and position_nbr = psno
and emplid = empid
where OrgProjTest = 'EQUAL';
Basically I'm selecting a table from IRPROD (an Oracle DB), joining it with a local table, and inserting the results back into IRPROD.
The problem I'm having is that while the query runs, it never stops. I've let it run for an hour and it keeps going until I cancel it. I can see on a bandwidth monitor on the SQL Server data going in and out. Also, if I just run the select part of the query it returns the results in 4 seconds.
Any ideas why it's not finishing? I've got other queries set up in a similar manner and don't have any problems (granted, those insert from local tables and not a remote table).
You didn't include any volume metrics, but I would recommend using a temporary table to gather the results.
Then you should try to insert the first couple of rows. If this succeeds, you'll have a strong indicator that everything is fine.
Try to break down each insert task by project_id or emplid to avoid large transaction logs.
You should also think about crafting a bulk batch process, as in the sketch below.
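A rough sketch of that approach, with a local temp table and per-project batches (the abbreviated column list and @CurrentProjectId are placeholders):
-- Land the remote rows locally once, so the expensive remote read happens only one time.
SELECT *
INTO #spd_data
FROM OPENQUERY(IRPROD, 'select * from budget_user.spd_data where fiscal_year = 2010');

-- Push rows back one project at a time so each remote transaction stays small.
DECLARE @CurrentProjectId varchar(20) = 'P0001';   -- hypothetical; loop over distinct project_ids
INSERT INTO [IRPROD]..[BUDGET_USER].[SPD_DATA_BUD] (REC_ID, FISCAL_YEAR, PROJECT_ID)   -- column list abbreviated
SELECT REC_ID, FISCAL_YEAR, PROJECT_ID
FROM #spd_data
WHERE project_id = @CurrentProjectId;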
If you run just the select without the insert, how many records are returned? Does the data look right or are there multiple records due to the join?
Are there triggers on the table you are inserting into? If you are returning many records and there are triggers on the table designed to run row-by-row, this could be slowing things down. You are also sending to another server, so the network pipeline could be what is slowing you down. Maybe it would be better to send the budget data to the Oracle server and do the insert from there rather than from the SQL Server.

Inactive databases

On our company's SQL server, there are a bunch of databases that don't appear to be used. Is there a way to determine the last time someone used a particular db, connected to it, or ran a query against it?
I don't think there's a per-database statistic for last used date.
What you can do is attach SQL Server Profiler to the database, with a filter on the database name. You can leave this running for a few weeks and see if there's any activity.
Another option is to check Database Properties -> Reports -> Standard Reports -> Index Usage Statistics. If the last use of any index is very old, that's a good indication the database is not being used.
Alternatively, have a look at SQL Server Auditing. I haven't used it myself, but it looks like it might suit your needs.
Here is a query I actually use. It tells you when each database was last used since the server's last restart, giving the date of last use and the days since then. This should be what you need. It is dynamic and will work on any server because it uses system views. Run it from the master DB.
DECLARE @RESTART DATETIME

SELECT @RESTART = login_time
FROM sysprocesses
WHERE spid = 1

SELECT @@SERVERNAME AS SERVER_NAME
    , DB_NAME(db.database_id) AS 'DATABASE'
    , db.state_desc AS 'State'
    , @RESTART AS 'SYSTEM_RESTART'
    , MAX(COALESCE(us.last_user_seek, us.last_user_scan, us.last_user_lookup, @RESTART)) AS 'LAST_USE'
    , CASE
        WHEN DATEDIFF(DAY, @RESTART, MAX(COALESCE(us.last_user_seek, us.last_user_scan, us.last_user_lookup, @RESTART))) = 0
            THEN 'Prior to restart ' + CONVERT(VARCHAR(5), DATEDIFF(DAY, MAX(COALESCE(us.last_user_seek, us.last_user_scan, us.last_user_lookup, @RESTART)), CURRENT_TIMESTAMP)) + ' days ago'
        ELSE CONVERT(VARCHAR(5), DATEDIFF(DAY, MAX(COALESCE(us.last_user_seek, us.last_user_scan, us.last_user_lookup, @RESTART)), CURRENT_TIMESTAMP))
      END AS 'DAYS_SINCE_LAST_USE'
FROM sys.databases db
LEFT OUTER JOIN sys.dm_db_index_usage_stats us ON db.database_id = us.database_id
/* If you want to exclude offline DBs from the output,
   uncomment '--AND STATE != 6' below. */
WHERE db.database_id > 4 --AND STATE != 6
GROUP BY DB_NAME(db.database_id), db.state_desc
ORDER BY DB_NAME(db.database_id)
