SSIS, avoid failure if source file isn't available - sql-server

I have an SSIS job that is scheduled to run every 5 minutes via SQL Agent. The job imports the contents of an Excel file into a SQL table. That all works great, but the files get placed there sporadically, and often when the job runs there is no file there at all. This causes the job to fail and send a notification email, but I only want to be notified if the job failed while processing a file, not because there was no file there in the first place. From what I have gathered, I could fix this with a Script Task that checks whether the file is there before the job continues, but I haven't been able to get that to work. Can someone break down how the Script Task works and what sort of script I need to check whether a file exists? Or if there is some better way to accomplish what I am trying to do, I am open to that as well!
The errors I get when I try the Foreach Loop approach are shown in the screenshot attached to the question.

This can be done easily with a Foreach Loop Container in SSIS.
Put simply, the container checks the directory you point it at and performs the tasks within the container once for each file found. If no files are found, the contents of the container are never executed, your job will not fail, and it will complete reporting success.
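As a sketch, the relevant Collection and Variable Mappings settings look something like this (the folder path and variable name here are illustrative, not taken from the question):

    Foreach Loop Editor -> Collection:
        Enumerator:            Foreach File Enumerator
        Folder:                \\server\dropfolder
        Files:                 *.xlsx
        Retrieve file name:    Fully qualified
    Foreach Loop Editor -> Variable Mappings:
        Index 0  ->  User::FileName (String)

Inside the loop, an expression on the Excel Connection Manager's ExcelFilePath property can point it at @[User::FileName], so each iteration processes the file the enumerator just found.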
Check out this great intro blog post for more info.
In the image attached to the question, the specific errors relate to the Excel Source failing validation. When SSIS opens a package for editing or running, the first thing it does is validate that all of the artifacts needed for a successful run are available and conform to the expected shape/API. Since the expected file may not be present, right-click the Excel Connection Manager and, in the Properties window, find the DelayValidation setting and change it to True. This ensures the connection manager only validates that the resource is available if the package is actually going to use it, i.e. when execution passes into the Foreach Loop Container. You will also need to set DelayValidation to True on your Data Flow Task.

You did not mention what scripting approach you're applying to search for your file. While C# and VB.NET are the typical languages used in a Script Task of this nature, you can also use T-SQL that simply returns a boolean value saved to a user variable (some environments restrict the use of C# and VB.NET). You then use that user variable in the control flow to decide whether to import (boolean = 1) or not (boolean = 0).
Take a look at the following link, which shows in detail how to set up the T-SQL script that checks whether or not a file exists (typically via master.dbo.xp_fileexist).
Check for file exists or not in sql server?
Take a look at the following link, which shows how to apply a conditional check based on a boolean user variable. This example also shows how to use VB.NET in a Script Task to determine whether the file exists (as an alternative to the aforementioned T-SQL approach).
http://sql-articles.com/articles/bi/file-exists-check-in-ssis/
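If you do go the Script Task route, a minimal C# sketch of the Main method that SSIS generates in ScriptMain might look like the following (the path and the User::FileExists variable name are assumptions; the variable must be a Boolean listed under the task's ReadWriteVariables):

    using System.IO;

    public void Main()
    {
        // Illustrative path; in practice read it from a user variable
        // or package configuration rather than hard-coding it.
        string path = @"\\server\dropfolder\import.xlsx";

        // Record whether the file is present in a Boolean user variable.
        // User::FileExists must appear in the task's ReadWriteVariables.
        Dts.Variables["User::FileExists"].Value = File.Exists(path);

        // The task itself always succeeds; the variable carries the answer.
        Dts.TaskResult = (int)ScriptResults.Success;
    }

A precedence constraint whose evaluation operation is set to Expression, with the expression @[User::FileExists], can then gate whether the import runs.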
Hope this helps.

Related

SSIS: Why does sequence container require TransactionOption Required in order to report task failure?

I have a job that creates files on a network folder from various non-database sources. Within the job, I isolate the various file creation tasks (contained in a sequence container) from the move-file task (a foreach enumerator) in order to prevent a spider's web of precedence constraints from the various file creation tasks:
A Data Flow Task that contains a Script Component using C# and LDAP to pull data from Active Directory and output it to multiple files
A Script Component that downloads files from SFTP (implements WinSCPNET.dll)
Upon successful completion, the sequence container then goes to a foreach file enumerator that moves the extracted files to a folder indicating the files are ready for loading - there is no problem here.
However, an intermittent problem arose in production where the AD connection was terminating before the file extract process completed, resulting in partial files (this was not observed in testing, but should have been contemplated - my bad). So, I added a foreach enumerator outside of the sequence container, with a failure precedence constraint, to delete these partial extract files.
During testing of this fix, I set one of the tasks within the sequence container to report failure. Initially, the sequence container reported success, thus bypassing the delete foreach enumerator. I tried changing the MaximumErrorCount from 0 to 1, but that did not produce the desired behavior change. I then changed the sequence container's TransactionOption from Supported to Required, and this appears to have fixed the problem. Now the job moves files that are completely extracted while deleting files that report an error on extraction.
My question is this: is there a potential problem going this route? I am unsure why this solution works. The documentation online discusses TransactionOption in the context of a connection to a database, but in this case there is no database connection. I just don't want to release a patch that may have a potential bug I am not aware of.
Regarding transactions and files:
Presume you write your files to a disk formatted with NTFS or another file system that supports transactions. Then all file-create and file-save actions are enclosed in one transaction. If the transaction fails due to a task failure, all files created inside the transaction are rolled back, i.e. deleted.
So you will have an "all or nothing" behavior on files, receiving files only if all extractions worked out.
If you store the files on a non-transactional file system, like old FAT, this "all or nothing" no longer works and you will receive a partial set of files; a transaction set on the sequence container will have no such effect.

VS2017 SSIS Parallel Processing error (?)

I am trying to run parallel processes that read Excel files into an OLE DB Destination. However, at runtime SSIS doesn't show errors; it simply stops and states:
"Package Execution completed. Click here to switch to design mode, or select Stop Debugging from the Debug Menu".
No rows have been inserted by the parallel processes, and I can't find the root cause of this 'completion' in the messages list. I've provided a screenshot as an example:
The MaxConcurrentExecutables is set to 5, the Run64Bit property is set to True (False didn't change anything), and the EngineThreads property is set to 1.
Could anyone help on this problem?
SSIS cannot read the same file simultaneously. Yes, you are running into a locking issue.
The solution is to use one data connection and one data flow. In the data flow, read from the file, then add a Multicast transformation, which lets you duplicate the data stream as many times as you want. From there, merge the work currently done in the two data flows into the one.
The net effect is that you will have one data flow: one data source, one Multicast, two data pipelines where you can do some transformations, and two data destinations.
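To see the locking behavior outside of SSIS, here is a small stand-alone C# sketch (the file path is illustrative): the second open fails with an IOException while the first reader holds the file exclusively, analogous to two Data Flow Tasks contending for the same Excel file.

    using System;
    using System.IO;

    class LockDemo
    {
        static void Main()
        {
            // Stands in for the Excel source file both data flows target.
            const string path = @"C:\Data\Source.xlsx";

            // First reader opens the file with no sharing, much as an
            // Excel provider does while a data flow is running.
            using (var first = new FileStream(path, FileMode.Open,
                                              FileAccess.Read, FileShare.None))
            {
                try
                {
                    // Second reader, analogous to the second Data Flow Task.
                    using (var second = new FileStream(path, FileMode.Open,
                                                       FileAccess.Read, FileShare.None))
                    {
                    }
                }
                catch (IOException ex)
                {
                    // "The process cannot access the file because it is
                    // being used by another process."
                    Console.WriteLine(ex.Message);
                }
            }
        }
    }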
I'm not 100% sure if this is true, but I think I know the reason why it fails.
The reason it suddenly 'stops' executing could be that once SSIS reads from an Excel file to import data, it locks the file. The second Data Flow Task then cannot open or access the file, since it's already opened by the first Data Flow Task. See the image below.
If someone could confirm this, it would be greatly appreciated!

Log to text file - items in Progress Tab of SSIS Package

In SSIS, I need to write to a log whatever items get displayed on the Progress tab. Is there a built-in feature to do this? I tried the logging option, but it logs too many details and the log is pretty huge for a single package run.
I need only whatever gets logged on the Progress tab to be written to a text file when the package undergoes a scheduled run through SQL Server Agent.
Here is a possible solution.
DTEXEC has an option for this at the command line: a switch called /Reporting, or /Rep for short.
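For example (paths are illustrative), running the package with only errors, warnings, and progress reported, and redirecting the console output to a text file:

    dtexec /F "C:\Packages\MyPackage.dtsx" /Rep EWP > "C:\Logs\MyPackage.log"

The reporting levels can be combined; E is errors, W is warnings, and P is progress, which is roughly what the Progress tab shows.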
I found it here
This link says the same.
This is your question on MSDN. It's pretty useless, since they suggest you turn on logging, which you've already done, but I thought I should add it anyway for completeness.

SSIS Batch Processing

Can anyone please tell me how to perform a simple batch process in SSIS?
I know how to do such a thing using T-SQL and/or .NET code, but what I wish to do is to use the SSIS GUI to do this entirely. I am not sure whether it is possible, and all I can find on Google are overcomplicated solutions.
To explain a bit more: I am reading from a flat file and I want to insert its entire contents into a SQL Server table. Pretty simple, huh... But I want to do it 50,000 records at a time. I expect this to be as simple as setting a property somewhere, or at most using some kind of loop tool from the toolbox.
Thanks
You are right. It is a property of the Data Flow task.
Right-click the task, choose the "Properties" command, look for the DefaultBufferMaxRows property, and set the desired value.
Reference: MSDN.

SSIS Package - track calling job

I'm looking for ideas on how to automatically track the job that calls the package.
We have some generic packages that are called from different jobs; each job passes in a different file path as a parameter, and the package therefore processes very different file sizes depending on the path.
In the package I have some custom auditing set up, which tracks the package start time and end time, and therefore the duration of execution. I want to also track the job that called the package, so that if the package is running long I can determine which job called it.
Also note that I would prefer this to be automatic, using some sort of system variable or similar, so that human error is not an issue. I also want these auditing tasks built into all of our packages as a template, so I would prefer not to use a user variable either - different packages may use different variables.
Just looking for some ideas - appreciate any input
We use parent and child packages instead of different jobs calling the same package. You could send the information about which parent called it to the child package, and then have the child package record that data to a table along with the start date and end date.
Our solution has a whole meta database that records all the details of each step through logging. The parent tells the child which configuration to use and logs details against that configuration. The jobs call the parent package, never the child package (which doesn't have a configuration in the config table, as it is always configured through variables sent in by the parent package). No human intervention is necessary, except initial development, or research when a failure occurs.
Edit for existing jobs.
Consider that jobs can have multiple steps. Make the first step a SQL script that inserts the auditing information into a table, including the start time of the package, the name of the job that called it, and the name of the SSIS package being called. Then have the second step call the SSIS package, and make the last step a SQL script that inserts the same data, only with the end datetime.
A simple way to do this is to set up a variable on your SSIS package as a varchar. Set the value of the variable to @[System::ParentContainerGUID] using an expression when the package starts. SQL Agent won't set the value, so when the package is run as an individual job it will be an empty string, but if it is called by another package it will contain the GUID of the calling package. You can test for that value and use a precedence constraint to control the program logic.
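A sketch of that wiring (the variable and task names are assumptions):

    Variable:  User::ParentGuid (String)
        EvaluateAsExpression = True
        Expression:          @[System::ParentContainerGUID]

    Precedence constraint -> "Send failure email" task:    @[User::ParentGuid] == ""
    Precedence constraint -> "Log to master audit" task:   @[User::ParentGuid] != ""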
We have packages that run as part of a big program, but sometimes we need to run them individually. Each package has an email-on-failure task, but we only want that to execute when the package is run individually. When it is part of the big run, we collect the names of all packages that error and send them as one email from the master package. We don't want individual emails and a summary email going out on the same run.
