VS2017 SSIS Parallel Processing error (?) - sql-server

I am trying to run parallel processes that read Excel files into an OLE DB Destination. However, at runtime SSIS doesn't show any errors; it simply stops and states:
"Package Execution completed. Click here to switch to design mode, or select Stop Debugging from the Debug Menu".
No rows have been inserted by the parallel processes, and I can't find the root cause of this 'completion' in the messages list. I've provided a screenshot as an example.
The MaxConcurrentExecutables property is set to 5, Run64BitRuntime is set to True (setting it to False didn't change anything), and EngineThreads is set to 1.
Could anyone help with this problem?

You are running into a locking issue: SSIS cannot read the same Excel file from two Data Flow Tasks at the same time.
The solution is to use one connection manager and one Data Flow Task. In the data flow, read from the file once, then add a Multicast transformation, which duplicates the data stream into as many outputs as you need. Then move the transformations from both of your original data flows onto those Multicast outputs.
The net effect is that you will have one data flow; one data source; one multicast; two data pipelines where you can do some transformations; and two data destinations.
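As a rough illustration of the fan-out idea (not SSIS code), here is a minimal Python sketch, assuming rows are plain dicts and each "pipeline" is a hypothetical pair of a transformation function and a destination list. The source is read exactly once and every row is handed to each branch, which is what the Multicast does inside the single data flow.

```python
from typing import Callable, Iterable

# Hypothetical row type: one record read from the Excel sheet.
Row = dict
# One branch after the Multicast: a transformation plus a destination (a list here).
Branch = tuple[Callable[[Row], Row], list]


def multicast(source: Iterable[Row], branches: list[Branch]) -> None:
    """Read the source once and hand a copy of every row to each branch."""
    for row in source:
        for transform, destination in branches:
            # Copy the row so one branch's changes don't leak into another branch.
            destination.append(transform(dict(row)))


if __name__ == "__main__":
    rows = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}]
    table_a, table_b = [], []
    multicast(
        rows,
        [
            (lambda r: {**r, "amount": r["amount"] * 2}, table_a),  # branch 1
            (lambda r: {"id": r["id"]}, table_b),                   # branch 2
        ],
    )
    print(table_a)
    print(table_b)
```

The important design point is that there is only one reader of the file, so there is nothing for a second reader to collide with.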

I'm not 100% sure if this is true, but I think I know the reason why it fails.
It may suddenly 'stop' executing because, once SSIS reads from an Excel file to import data, it 'locks' that file. The second Data Flow Task cannot open or access the file, since it is already opened by the first Data Flow Task.
If someone could confirm this, it would be greatly appreciated!

Related

SSIS, avoid failure if source file isn't available

I have an SSIS job that is scheduled to run every 5 minutes via SQL Agent. The job imports the contents of an Excel file into a SQL table. That all works great, but the files get placed there sporadically, and often when the job runs there is no file there at all. The problem is that this causes the job to fail and send a notification email, but I only want to be notified if the job fails while processing a file, not because there was no file there in the first place. From what I have gathered, I could fix this with a script task that checks whether the file is there before the job continues, but I haven't been able to get that to work. Can someone break down how the script task works and what sort of script I need to check whether a file exists? Or if there is a better way to accomplish this, I am open to that as well!
When I tried the Foreach Loop approach, I got the errors shown in the attached image.
This can be done easily with a Foreach Loop Container in SSIS.
Put simply, the container checks the directory you point it at and performs the tasks within the container for each file found. If no files are found, the contents of the container are never executed, so your job will not fail; it will complete and report success.
Check out this great intro blog post for more info.
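The Foreach Loop behavior described above can be sketched in Python. This is a minimal model, not SSIS itself: the folder, the `*.xlsx` pattern and the `body` callback (standing in for the tasks inside the container) are assumptions for illustration. When nothing matches, the body simply never runs and the function returns normally.

```python
from pathlib import Path
from typing import Callable


def foreach_file(folder: str, pattern: str, body: Callable[[Path], None]) -> int:
    """Run `body` once per file matching `pattern`; zero matches is not an error."""
    count = 0
    for path in sorted(Path(folder).glob(pattern)):
        body(path)  # the tasks inside the container, e.g. the Data Flow Task
        count += 1
    return count  # 0 means "no files this time", which is still a success
```

For example, `foreach_file(r"\\server\drop", "*.xlsx", import_excel)` (both arguments hypothetical) would import each workbook found and quietly do nothing when the folder is empty.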
In the image attached to the question, the specific errors are related to the Excel Source failing validation. When SSIS opens a package for editing or running, the first thing it does is validate that all of the artifacts needed for a successful run are available and conform to the expected shape/API. Since the expected file may not be present, right-click the Excel Connection Manager, find the DelayValidation setting in the Properties window, and change it to True. This ensures the connection manager only validates that the resource is available if the package is actually going to use it, i.e. when execution passes into the Foreach Loop Container. You will also need to set DelayValidation to True on your Data Flow Task.
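To make the timing difference concrete, here is a toy Python model of eager versus delayed validation. `ExcelConnection` and `run_package` are hypothetical stand-ins, not SSIS APIs, and "validation" is reduced to "the file exists". With eager validation a missing file fails the package at startup; with delayed validation the check only happens inside the loop, which never runs when there are no files.

```python
import os
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ExcelConnection:
    """Toy stand-in for an Excel Connection Manager (hypothetical class)."""
    path: str
    delay_validation: bool = False

    def validate(self) -> None:
        # In this model, validation just means "the file the connection points at exists".
        if not os.path.exists(self.path):
            raise FileNotFoundError(f"validation failed: {self.path} not found")


def run_package(conn: ExcelConnection, files_found: Iterable[str]) -> int:
    """Return how many files were imported."""
    # Up-front validation pass when the package starts, unless it is delayed.
    if not conn.delay_validation:
        conn.validate()
    imported = 0
    for path in files_found:  # body of the Foreach Loop Container
        conn.path = path
        conn.validate()       # delayed validation happens only when the connection is used
        imported += 1         # the Data Flow Task would load the file here
    return imported
```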
You did not mention what scripting approach you're using to search for your file. While C# and VB.NET are the typical languages for a Script Task of this nature, you can also use T-SQL that simply returns a boolean value saved to a user variable (some systems limit the use of C# and VB.NET). You then use that user variable in the control flow to decide whether to import (boolean = 1) or not (boolean = 0).
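The check-then-branch pattern can be sketched in Python as follows. This only models the logic: `check_file_exists` plays the role of the Script Task or T-SQL check, its result stands in for a hypothetical user variable such as `User::FileExists`, and the `if` plays the role of a precedence constraint with an expression like `@[User::FileExists] == TRUE`. `import_file` is a hypothetical callback for the import step.

```python
import os
from typing import Callable


def check_file_exists(path: str) -> bool:
    """Stand-in for the Script Task / T-SQL check that sets the boolean variable."""
    return os.path.isfile(path)


def run_job(path: str, import_file: Callable[[str], None]) -> str:
    # Would be stored in a user variable, e.g. User::FileExists (hypothetical name).
    file_exists = check_file_exists(path)
    if file_exists:  # precedence constraint: @[User::FileExists] == TRUE
        import_file(path)
        return "imported"
    return "skipped"  # no file: nothing to do, and the job still ends successfully
```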
Take a look at the following link, which shows in detail how to set up a T-SQL script that checks whether a file exists.
Check for file exists or not in sql server?
Take a look at the following link, which shows how to apply a conditional check based on a boolean user variable. This example also shows how to use VB.NET in a Script Task to determine whether the file exists (as an alternative to the T-SQL approach mentioned above).
http://sql-articles.com/articles/bi/file-exists-check-in-ssis/
Hope this helps.

SSIS: Why does sequence container require TransactionOption Required in order to report task failure?

I have a job that creates files in a network folder from various non-database sources. Within the job, I isolate the various file creation tasks (contained in a sequence container) from the move file task (a Foreach File Enumerator) to avoid a spider's web of precedence constraints from the various file creation tasks:
Data flow task that contains script component using C# and LDAP to pull data from Active Directory and output it to multiple files
Script Component that downloads files from SFTP (implements WinSCPNET.dll)
Upon successful completion, the sequence container then goes to a foreach file enumerator to move the extracted files to a folder that indicates files are ready for loading - there is no problem here.
However, an intermittent problem arose in production where the AD connection was terminating before the file extract process completed, resulting in partial files (this was not observed in testing, but should have been contemplated - my bad). So, I added a foreach enumerator outside of the sequence container with a failure precedence constraint to delete these partial extract files.
During testing of this fix, I set one of the tasks within the sequence container to report failure. Initially, the sequence container reported success, thus bypassing the delete Foreach enumerator. I tried changing MaximumErrorCount from 0 to 1, but that did not change the behavior. I then changed the sequence container's TransactionOption from Supported to Required, and this appears to have fixed the problem. Now the job moves files that are completely extracted and deletes files that report an error during extraction.
My question is this: Is there a potential problem going this route? I am unsure as to why this solution works. The documentation online discusses the TransactionOption in the context of a connection to the database. But, in this case there is no connection to the database. I just don't want to release a patch that may have a potential bug that I am not aware of.
Regarding Transactions and Files.
Suppose you write your files to disk on NTFS or another file system that supports transactions. Then all file create and file save actions are enclosed in one transaction. If the transaction fails because a task fails, all the files created inside the transaction are rolled back, i.e. deleted.
So, you will have an "all or nothing" approach on files, receiving files only if all extractions worked out.
If you store the files on a non-transactional file system, such as the old FAT, this "all or nothing" behavior no longer works and you will receive a partial set of files; setting a transaction on the Sequence Container will then have no such effect.
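If you would rather not depend on file system transactions at all, a common alternative (a different technique from the SSIS transaction described above) is to extract into a staging folder and only move the files to the "ready" folder when every extractor succeeds. A minimal Python sketch, assuming each extractor is a hypothetical callback that writes its files into the staging directory it is given:

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def extract_all_or_nothing(target: str, extractors: list[Callable[[Path], None]]) -> bool:
    """Publish all extracted files to `target`, or none of them."""
    staging = Path(tempfile.mkdtemp())
    try:
        for extract in extractors:
            extract(staging)  # e.g. AD export, SFTP download (hypothetical callbacks)
    except Exception:
        shutil.rmtree(staging)  # any failure: throw away every partial file
        return False
    dest = Path(target)
    dest.mkdir(parents=True, exist_ok=True)
    for f in staging.iterdir():
        shutil.move(str(f), str(dest / f.name))
    staging.rmdir()  # staging is empty now
    return True
```

This gives the same "receive files only if all extractions worked out" result on any file system, at the cost of a final move step.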

SSIS - Continue processing multiple files if any one of the files errors out

I have to load data from multiple files into a table through a Foreach Loop Container in SSIS. If any one of the files errors out, the package stops execution.
Now I need to move the errored file to a different path and continue processing the remaining files.
Any Suggestion?
Look at properties of containers and tasks. There are settings to determine how you want to handle errors. They can be ignored or stop execution of the package.
You can also look at constraints to use different paths depending on success or failure.
Lastly you can look at error handling via events.
Between those three topics you should be able to do whatever you want. There are plenty of blogs, FAQs and examples available online.
When you connect tasks with arrows (precedence constraints), you can right-click an arrow to change when it fires. Setting it to Completion instead of Success lets execution continue even when the previous task fails.
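The per-file behavior the question asks for (move the bad file aside, keep going) looks like this as a Python sketch. It only models the control flow: `load` is a hypothetical callback standing in for the Data Flow Task, the `*.csv` pattern and error folder are assumptions, and the `except` branch plays the role of the failure path that moves the file.

```python
import shutil
from pathlib import Path
from typing import Callable


def load_folder(source: str, error_dir: str, load: Callable[[Path], None],
                pattern: str = "*.csv") -> tuple[list[str], list[str]]:
    """Load every file; move failures to `error_dir` and continue with the rest."""
    loaded, failed = [], []
    err = Path(error_dir)
    err.mkdir(parents=True, exist_ok=True)
    # sorted() materializes the list first, so moving files doesn't disturb the loop.
    for path in sorted(Path(source).glob(pattern)):
        try:
            load(path)  # the Data Flow Task for this one file
        except Exception:
            shutil.move(str(path), str(err / path.name))  # failure path: park the file
            failed.append(path.name)
            continue  # don't stop the loop, move on to the next file
        loaded.append(path.name)
    return loaded, failed
```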

SSIS Batch Processing

Can anyone please tell me how to perform a simple batch process in SSIS?
I know how to do this using T-SQL and/or .NET code, but I want to do it entirely through the SSIS GUI. I am not sure whether that is possible, and all I can find on Google are overcomplicated solutions.
To explain a bit more: I am reading from a flat file and want to insert its entire contents into a SQL Server table. Pretty simple, huh... But I want to do it 50,000 records at a time. I expect this to be as simple as setting a property somewhere or, at most, using some kind of loop tool from the toolbox.
Thanks
You are right. It is a property of the Data Flow task.
Right-click the task, choose "Properties...", find the DefaultBufferMaxRows property (the maximum number of rows in each data flow buffer), and set the desired value.
Reference: MSDN.
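Conceptually, "50,000 records at a time" just means splitting the row stream into fixed-size chunks. The Python sketch below shows that idea only; in SSIS the engine does the buffering for you once the property is set, so this is not how SSIS implements it, and the `batches` helper is hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator


def batches(rows: Iterable, size: int = 50_000) -> Iterator[list]:
    """Yield consecutive lists of at most `size` rows until the input runs out."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch
```

For a file with 120,001 rows this produces batches of 50,000, 50,000 and 20,001 rows; the last batch is simply whatever is left over.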

How can I have my polling service call SSIS sequentially?

I have a polling service that checks a directory for new files; if there is a new file, I call SSIS.
There are instances where I can't have SSIS run if another instance of SSIS is already processing another file.
How can I make SSIS run sequentially during these situations?
Note: running SSIS in parallel is fine in some circumstances but not in others; how can I achieve both?
Note: I don't want to go into WHEN/WHY it can't run in parallel at times; just assume that sometimes it can and sometimes it can't. The main question is: how can I prevent an SSIS call IF it has to run in sequence?
If you want to control the flow sequentially, consider a design where you enqueue requests (for invoking SSIS) in a queue data structure. Only the request at the front of the queue is processed at a time; as soon as that request completes, the next request can be dequeued.
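A minimal Python sketch of that queue design, assuming each request is a hypothetical zero-argument callable that launches one SSIS run (for example, a subprocess call to dtexec for one file). A single worker thread takes requests in FIFO order, so at most one sequential run is active at a time.

```python
import queue
import threading
from typing import Callable


class SequentialRunner:
    """Runs submitted jobs one at a time, in the order they were enqueued."""

    def __init__(self) -> None:
        self._requests: queue.Queue = queue.Queue()
        # A single worker thread guarantees that only one job runs at any moment.
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, job: Callable[[], None]) -> None:
        self._requests.put(job)

    def wait(self) -> None:
        """Block until every submitted job has finished."""
        self._requests.join()

    def _run(self) -> None:
        while True:
            job = self._requests.get()
            try:
                job()
            except Exception as exc:  # keep the queue alive if one run fails
                print(f"request failed: {exc}")
            finally:
                self._requests.task_done()
```

For the "sometimes parallel" case, the polling service could send only the requests that must run in sequence through `submit`, and start the others directly.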
