SSIS Package Hangs Randomly on Execution - sql-server

I'm working with an SSIS package that itself calls multiple SSIS packages and hangs periodically during execution.
This is a once-a-day package that runs every evening and collects new and changed records from our census databases and migrates them into the staging tables of our data warehouse. Each dimension has its own package that we call through this package.
So, the package looks like:
1) Get current change version
2) Load last change version
3) Identify changed values
4) a-z: Move changed records to staging tables (separate packages)
5) Save change version for future use
All of those are Execute SQL Tasks except for the record-moving steps, which are twenty-some Execute Package Tasks (data move tasks) run somewhat in parallel (max four at a time).
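For reference, the version steps follow the usual SQL Server Change Tracking pattern. A minimal sketch, assuming that mechanism (the bookkeeping table and column names are illustrative):

    DECLARE @last_version BIGINT, @current_version BIGINT;

    -- "Get current change version"
    SET @current_version = CHANGE_TRACKING_CURRENT_VERSION();

    -- "Load last change version" saved by the previous run
    SELECT @last_version = LastVersion
    FROM etl.ChangeVersionLog            -- illustrative bookkeeping table
    WHERE TableName = N'Person';

    -- "Identify changed values" since the last run
    SELECT ct.PersonID, ct.SYS_CHANGE_OPERATION
    FROM CHANGETABLE(CHANGES dbo.Person, @last_version) AS ct;

    -- "Save change version for future use"
    UPDATE etl.ChangeVersionLog
    SET LastVersion = @current_version
    WHERE TableName = N'Person';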
The strange part is that it almost always fails when executed by the SQL Server Agent (using a proxy user) or dtexec, but never fails when I run the package through Visual Studio. I've added logging so that I can see where it stops, but the stopping point is inconsistent.
We didn't see any of this while working in our development / training environments, but the volume of data is considerably smaller. I wonder if we're just doing too much at once.
I may, as a test, execute the tasks serially through the SQL Server Agent to see if it's a problem with a package calling a package, but I'd rather not because we have a relatively short window in the evening to do this for seven database servers.
I'm fairly new to SSIS, so any advice would be appreciated.
Justin

Related

SQL Server job hangs when calling an SSIS package until agent is restarted

I have googled and read many questions/answers, but only one question ever sounded exactly the same, and it had no answer.
The situation:
My group has several SQL Servers that are running SQL Server 2017. They are configured virtually identically.
These servers are build boxes, meaning they pull data from a data warehouse or an extract file, run some ETL processing, and then push to a prod box. SSIS packages are deployed on the box where the DB resides.
Just over a month ago (with no updates having occurred), one of these servers started having an issue where all the jobs that ran an SSIS package would "hang" on the step that ran the package. Any other step runs fine. But a job step that runs a package (all jobs do this) will not even start the package. The package shows no indication in the executions that anything has even tried to start it.
If the user executes the deployed package manually, it runs successfully.
The only thing that will "fix" the issue is restarting the agent service.
I created a simple job to run a simple package every 5 minutes. It had been running for about a week; the last successful run was 4/11/2021 at 2:40 am, and the 2:45 run hung. I could find nothing in the event logs from that time. The server was rebooted as a normal scheduled process at 3:15 and was online by 3:25, because that is the next time the job tried to run, and it again just hung. So even a server reboot did not fix the issue.
I am at my wits' end. Since there is no error (the job hangs and the package does not even start) and no logging that I can find showing any issues, I am at a loss as to what might cause this.
Thanks in advance.
Take a look at the SSISDB catalog database on each of the servers involved. Has it grown substantially, to the point where the operation history needs clearing down or the retention settings need changing? How big are the transaction logs for those databases?
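You can check both with a few standard queries; a minimal sketch (catalog.catalog_properties and internal.operations are standard SSISDB objects, and RETENTION_WINDOW and OPERATION_CLEANUP_ENABLED are real catalog properties):

    -- Inspect the catalog's history retention settings
    SELECT property_name, property_value
    FROM SSISDB.catalog.catalog_properties
    WHERE property_name IN (N'RETENTION_WINDOW', N'OPERATION_CLEANUP_ENABLED');

    -- Rough measure of accumulated operation history
    SELECT COUNT(*) AS operation_rows
    FROM SSISDB.internal.operations;

    -- Transaction log usage for every database, including SSISDB
    DBCC SQLPERF(LOGSPACE);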

Run multiple SSIS projects on the same server in parallel

I have about 7 projects deployed on a SQL Server. Each one contains a MasterPackage which runs all the child packages of that project. The issue is that I want all 7 projects to run in parallel, starting at the same time, but as it is right now, they get queued up and start one after another. Can I make all the projects start at the same time?
You can always schedule package executions by means of SQL Server Agent jobs. You will probably have to create a separate job for each project, but after that, whatever schedule you pick for them will be followed, so jobs sharing a start time will kick off together.
Just keep in mind that if the packages push a lot of data through, the server might not cope with the total workload, so parallel execution might end up slower than a serialised one.
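If you'd rather launch everything from a single job, the SSISDB stored procedures can also start executions asynchronously. A minimal sketch for one project (the folder and project names are assumptions; repeat the block per project):

    DECLARE @exec_id BIGINT;
    EXEC SSISDB.catalog.create_execution
         @folder_name  = N'MyFolder',            -- assumption: your catalog folder
         @project_name = N'Project1',            -- assumption: one of the 7 projects
         @package_name = N'MasterPackage.dtsx',
         @execution_id = @exec_id OUTPUT;
    -- SYNCHRONIZED = 0 (the default) makes start_execution return immediately,
    -- so executions created this way run in parallel.
    EXEC SSISDB.catalog.set_execution_parameter_value @exec_id,
         @object_type = 50, @parameter_name = N'SYNCHRONIZED', @parameter_value = 0;
    EXEC SSISDB.catalog.start_execution @exec_id;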

In this circumstance, is it better to use an SSIS package, or just script out the job?

Forewarning: I wasn't entirely sure if this question should be here (SO) or in The Workplace, because it isn't so much about programming as it is about convincing my co-worker that their method is bad. But it's still programming related, so mods, please feel free to relocate this question to The Workplace. Anyway...
At work we have large SSAS cubes that have been split into multiple partitions. The individual who set up these partitions scheduled every partition to be processed every day. But in hindsight, because the data in these partitions is historic, there is no need to process each partition every day. Only the current partition should be processed, after the latest data has been added to the cube's data source.
I gave my coworker a task to automate this process. I figured all they needed to do was get the current date and then process the partition corresponding to that date range. Easily scriptable.
My coworker creates an SSIS package for doing this...
Cons:
the ssis package is hard to source control
the ssis package will be hard to test
the ssis package is a pain in the ass to debug
the ssis package requires Visual Studio with SQL Server Data Tools (SSDT) to even open
lastly, I feel SSIS packages lead to heavy technical debt
Pros:
it's easier for my coworker to do (maybe)
Correct me if I'm wrong on any of those, but the first reason alone is enough for me to scrap all of their work.
Needless to say, I'm extremely biased against anything done in SSIS. But processing a cube can be scripted out in XMLA (ref: link), and then, using a SQL Server Agent job, you can schedule that script to run at specific times. The only tricky part would be swapping out the partition name that is processed within the script. Furthermore, the script/job can be kept in source control and deployed to the MSSQL server whenever a change is made.
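As a sketch of what that job step could look like (the job, server, and cube object names are all assumptions; ANALYSISCOMMAND is the Agent subsystem for SSAS commands):

    EXEC msdb.dbo.sp_add_jobstep
        @job_name  = N'Process Current Partition',  -- assumption: job created beforehand
        @step_name = N'Run XMLA process command',
        @subsystem = N'ANALYSISCOMMAND',            -- Agent subsystem for SSAS
        @server    = N'MySSASServer',               -- assumption: your SSAS instance
        @command   = N'<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
          <Type>ProcessFull</Type>
          <Object>
            <DatabaseID>SalesCube</DatabaseID>
            <CubeID>Sales</CubeID>
            <MeasureGroupID>FactSales</MeasureGroupID>
            <PartitionID>FactSales_Current</PartitionID>
          </Object>
        </Process>';

The "tricky part" of swapping the partition name then reduces to building the @command string dynamically and applying it with sp_update_jobstep.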
Am I being too critical here? I'm just trying to keep the next developers from ripping their hair out.
What you can do is have two SQL Agent jobs:
1) Full processing + repartitioning
2) Incremental processing (processing of only the last (current) partition)
You don't need SSIS for either (1) or (2).
For (2), the script is fixed: you just make a call to process one partition, plus incremental processing of dimensions (if required). The current partition must have a condition of WHERE >= .... (not BETWEEN), so it covers future dates if a new partition has not been created yet.
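In other words, the partition query bindings might look like this (table and column names are illustrative):

    -- Current partition: open-ended, so rows for future dates still land here
    -- even if the repartitioning job has not created the next partition yet.
    SELECT * FROM dbo.FactSales WHERE SaleDate >= '20210401';

    -- Historic partition: a closed range is fine once the period is over.
    SELECT * FROM dbo.FactSales WHERE SaleDate >= '20210301' AND SaleDate < '20210401';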
For (1), you can write TSQL code that creates a new partition for the new period and reprocesses the cube. It can be scheduled to run over the weekend when the server is idle, or once per month.
(1) does the following:
1) Back up the existing cube (Prod) via an SSAS Command step in SQL Agent
2) Restore the backup as TempCube via an SSAS Command step with AllowOverwrite (in case the temp cube was not deleted earlier)
3) Delete all partitions in TempCube via TSQL + a linked server to SSAS (see the sketch after this list)
4) Re-create the partitions and process the cube (full) via TSQL + a linked server to SSAS
5) Back up TempCube
6) Delete TempCube
7) Restore the backup of TempCube as the production cube
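A rough sketch of the linked-server mechanism in steps 3 and 4 (the linked server name SSAS_LINK and the object IDs are assumptions; the linked server needs the "rpc out" option enabled):

    -- Drop one TempCube partition by passing XMLA through to SSAS
    EXEC (N'<Delete xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
          <Object>
            <DatabaseID>TempCube</DatabaseID>
            <CubeID>Sales</CubeID>
            <MeasureGroupID>FactSales</MeasureGroupID>
            <PartitionID>FactSales_201503</PartitionID>
          </Object>
        </Delete>') AT SSAS_LINK;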
As you can see, the process is crash-safe and you don't need SSIS. Even if the job that creates a new partition wasn't run for some reason, the cube still has the new data; the data will be split when a new partition structure is created by (1).
I think you are looking at this the wrong way. To be honest, your list of cons is pretty weak and is just a reflection of your opinion of SSIS. There is always more than one tool in the toolbox for any job, and the proper tool to use will vary from shop to shop.
If the skill set of the party responsible for developing and maintaining this automated process is SSIS, then you should really have a better reason than personal preference to rewrite the process with a different tool. A couple of reasons I can think of are company standards and the skill set of the team.
If a company standard dictates the tool, then follow it; you should have staff capable of using the tools the company mandates. Otherwise, assess the skill set of your team. If you have a team of SSIS developers, don't force them to use something else because of your personal preference.
Your task of dynamic SSAS partition processing can be automated with or without SSIS. SSIS is just an environment for executing tasks and doing data manipulation. On the pro side, it has built-in components that execute an XMLA script from a variable and capture error messages. In pure .NET you have to do that yourself, but it is not too complex.
Several sample approaches to your task:
Create the XMLA script and execute it with SSIS.
Generate the XMLA from the AMO library and execute it in .NET. You need to look at chapter 4d) Process All Dims; the provided sample does more than that, and its steps are put into an SSIS package as well.
I personally used SSIS in a similar situation, probably because the other 99% of the ETL logic and data manipulation is in SSIS. As said before, SSIS offers no significant advantage here; the second example shows how to do it in pure .NET.

SSIS package shows success but performs no action in any task

I have a job where, after a backup of the data mart is taken, the second step calls an SSIS package (performing a delta load). The weird situation is that this package has shown success every day for the last two months, but without loading, truncating, or updating any data.
It just touches all the tasks without doing anything and completes the step successfully within 1-3 minutes (normally this step of the job takes around 2-3 hours).
I have checked all the permissions and access for sa, and all are fine.
The local configuration tables are also fine.
Could you please suggest what the reason for this might be and where I should investigate? (It's a production server.)

Using temporary tables in SSIS flow fails

I have an ETL process which extracts ~40 tables from a source database (Oracle 10g) to a SQL Server (2014 Developer Edition) Staging environment. My process for extraction:
1) Determine the newest row in Staging
2) Select all newer rows from source
3) Insert results into #TEMPTABLE
4) Merge results from #TEMPTABLE to Staging (sketched below)
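For reference, the merge step is roughly this shape (a minimal sketch; table and column names are simplified):

    MERGE dbo.StagingOrders AS tgt                 -- illustrative staging table
    USING #TEMPTABLE AS src
        ON tgt.OrderID = src.OrderID
    WHEN MATCHED THEN
        UPDATE SET tgt.Amount       = src.Amount,
                   tgt.ModifiedDate = src.ModifiedDate
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (OrderID, Amount, ModifiedDate)
        VALUES (src.OrderID, src.Amount, src.ModifiedDate);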
This works on a package-by-package basis, both from Visual Studio locally and when executing from SSISDB on the SQL Server.
However, I am grouping my extract jobs into one master package for ease of execution and flow into the transform stage. Only about 5 of my packages use temporary tables (the others are all truncate-and-load), but I wanted to move more of them to this method. When I run the master package, anything using a temporary table fails. Because of pretty large log files it's hard to pinpoint the actual error, but so far all it tells me is that #TEMPTABLE can't be found and/or the status is VS_ISBROKEN.
Things I have tried:
Set all relevant components to delay validation = false
Master package has ExecuteOutOfProcess = true
Increased my tempdb capacity far exceeding my needs
A thought I had was the RetainSameConnection = true setting on my Staging database connection: could this be the cause? I would try creating separate connections for each package, but assumed ExecuteOutOfProcess would take care of this for me.
EDIT
I created the following scenario:
Package A (Master package containing Execute Package Task references only)
Package B (Uses temp tables)
Package C (No temp tables)
Executing Package B on its own completes successfully. All temp table usage is contained within this package; there is no requirement for Package C to see the temp table created by Package B.
Executing Package C completes successfully.
Executing Package A, Package C completes successfully but Package B fails.
UPDATE
The workaround was to create a package-level connection for each package that uses temporary tables, thus ensuring that each package held its own connection. I have raised a Connect issue with Microsoft, as I believe that when the parent package opens the connection, it should be inherited and retained throughout any child packages.
Several suggestions for your case:
Set RetainSameConnection = true. This allows you to work safely with temp tables in SSIS packages.
I would not use ExecuteOutOfProcess: it increases your RAM footprint, since every child package starts in its own process, and hurts performance by adding process-start lag. It was used in 32-bit environments to overcome the 2 GB limit, but on x64 it is no longer necessary.
Child package executions do not inherit connection object instances from the parent, so the same connection will not span all of your child packages.
SSIS packages with temp table operations are more difficult to debug (failures are less obvious), so pay attention to testing.
