Background:
I have a folder that is continuously being filled with files. My SSIS package needs to process the files and then delete them. The package is scheduled to run once every minute. I pick up the files in ascending order of file creation time, build an array of files, and then process and delete them one at a time.
Problem:
If an instance of my package takes longer than one minute to run, the next instance of the SSIS package will pick up some of the files the previous instance has in its buffer. By the time the second instance of the package gets around to processing a file, it may already have been deleted by the first instance, creating an exception condition.
I was wondering whether there was a way to avoid the exception condition.
Thanks.
How are you scheduling the job? If you are using the SQL Server Job Scheduler, I'm under the impression it should not re-run a job that is already running; see this SO question: Will a SQL Server Job skip a scheduled run if it is already running?
Alternatively, rather than trying to move the files around, you could build a step into your job that tests whether the job is already running. I've not done this myself, but it appears to be possible; have a read of this article: Detecting The State of a SQL Server Agent Job.
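For what it's worth, here is a minimal sketch of that detection idea, assuming a Script Task (or any .NET step) querying the standard msdb catalog views; the connection string and job name are placeholders, not values from your setup:

// Hedged sketch: ask msdb whether a SQL Agent job still has an "open"
// activity row (started but not yet stopped). Placeholders throughout.
using System;
using System.Data.SqlClient;

class JobStateCheck
{
    static bool IsJobRunning(string connectionString, string jobName)
    {
        const string sql = @"
            SELECT COUNT(*)
            FROM msdb.dbo.sysjobactivity AS a
            JOIN msdb.dbo.sysjobs AS j ON j.job_id = a.job_id
            WHERE j.name = @jobName
              AND a.start_execution_date IS NOT NULL
              AND a.stop_execution_date IS NULL;";

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            command.Parameters.AddWithValue("@jobName", jobName);
            connection.Open();
            // Stale sessions can leave open rows behind, so treat this as a
            // heuristic rather than a guarantee.
            return (int)command.ExecuteScalar() > 0;
        }
    }

    static void Main()
    {
        bool running = IsJobRunning(
            "Data Source=.;Initial Catalog=msdb;Integrated Security=SSPI;",
            "ProcessIncomingFiles"); // placeholder job name
        Console.WriteLine(running ? "Already running - exiting." : "Safe to start.");
    }
}

The gist is simply to exit early when the previous run hasn't finished; the linked article covers more robust ways to read the job state.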
Can you check for the existence of the file before you delete it?
File.Exists(filepathandname)
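As a sketch of how that check might look in a Script Task (the method and parameter names are just illustrative):

// Skip files that have already disappeared before processing or deleting
// them; another package instance may remove a file at any point.
using System;
using System.IO;

class GuardedProcessing
{
    static void ProcessIfStillThere(string filePathAndName)
    {
        if (!File.Exists(filePathAndName))
        {
            Console.WriteLine("Skipping " + filePathAndName + ": already handled by another run.");
            return;
        }

        try
        {
            // ... process the file here (import, archive, etc.) ...
            File.Delete(filePathAndName);
        }
        catch (FileNotFoundException)
        {
            // The race can still happen between the Exists check and the work
            // above; ignoring it keeps the package from failing.
        }
        catch (IOException)
        {
            // The file may be locked by the other instance; skip it and let
            // the next scheduled run pick it up.
        }
    }
}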
To make sure your packages are not messing with the same files, you could just create an empty file named like the data file but with another extension (like mydata.csv.being_processed) and make sure your Data Flow Task runs only on files that don't have such a companion file.
This acts as a lock.
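A rough sketch of that marker-file idea (the file names are just examples, following the extension suggested above):

// Claim a data file by creating "<name>.being_processed" next to it; a file
// that already has a marker is being handled by another package instance.
using System;
using System.IO;

class MarkerFileClaim
{
    static bool TryClaim(string dataFile)
    {
        string marker = dataFile + ".being_processed";
        try
        {
            // CreateNew fails if the marker already exists, so only one
            // instance can claim a given data file.
            using (File.Open(marker, FileMode.CreateNew)) { }
            return true;
        }
        catch (IOException)
        {
            return false; // someone else got there first
        }
    }

    static void Release(string dataFile)
    {
        // Remove the marker once the data file has been processed and deleted.
        File.Delete(dataFile + ".being_processed");
    }
}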
Of course you could change the way you're scheduling your jobs, but often - when we encounter such an issue - it's because we have no leverage over those things :)
You can create a "lock file" to prevent parallel execution of packages. To protect yourself against the case of a crashed package, consider using the file's creation date to emulate a lock timeout.
I.e., at the beginning of the package, check for the existence of the lock file. If it doesn't exist OR it was created more than X hours ago, then continue with the import. Otherwise, exit.
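A minimal sketch of that lock-file-with-timeout check, assuming a Script Task at the start of the package (the lock path and timeout value are placeholders):

// Skip the run if a lock file exists and is newer than the timeout;
// otherwise take (or re-take) the lock and continue with the import.
using System;
using System.IO;

class PackageLock
{
    static bool TryAcquire(string lockPath, TimeSpan timeout)
    {
        if (File.Exists(lockPath))
        {
            // A fresh lock means another instance is probably still running.
            if (DateTime.UtcNow - File.GetCreationTimeUtc(lockPath) < timeout)
            {
                return false;
            }

            // The lock is older than the timeout, so assume the previous run
            // crashed and take the lock over.
            File.Delete(lockPath);
        }

        File.WriteAllText(lockPath, DateTime.UtcNow.ToString("o"));
        return true;
    }

    static void Release(string lockPath)
    {
        if (File.Exists(lockPath)) File.Delete(lockPath);
    }
}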
I have a similar situation. What you do is have your SSIS package read all files within the folder and create a work file like 'process.txt'. This creates a list of the files that were valid at that point in time. If you have multiple packages, then create the file with a name like 'process_.txt' so each package has its own list. Each package will only process the files named in its own process file. This will prevent overlap.
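As a rough sketch of that approach (folder and file names are illustrative), the snapshot step and the later lookup might look like this:

// Snapshot the inbound folder into a per-package work file, then have later
// steps read that fixed list instead of re-scanning the folder.
using System;
using System.IO;
using System.Linq;

class WorkFileSnapshot
{
    static string[] BuildWorkFile(string inboundFolder, string workFilePath)
    {
        // Take the snapshot once, ordered by creation time as in the question.
        string[] files = Directory.GetFiles(inboundFolder)
            .OrderBy(f => File.GetCreationTimeUtc(f))
            .ToArray();

        File.WriteAllLines(workFilePath, files);
        return files;
    }

    static string[] ReadWorkFile(string workFilePath)
    {
        // Downstream tasks only touch the files named in their own list.
        return File.ReadAllLines(workFilePath);
    }
}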
I created an SSIS package that extracts two files from a .zip file, imports data from them and then attempts to delete the files that were extracted.
The package works, data is imported, and all tasks report success. However, the File System Tasks that attempt to delete the files don't delete them, even though they report success.
I use the CozyRoc Zip Task to extract the files. When I remove the Zip Task, the File System Tasks actually do delete the files. I'm not certain that CozyRoc is causing the problem, but the existence of that task may be causing other issues with the package.
Can anyone help me figure out how to reliably delete the files?
Do I need to put in some sort of pause after the Data Flow Tasks to allow them to release whatever locks they might have on the files?
Is there a way to view the DOS commands that the File System tasks use at run time, to verify that they are actually attempting to delete the correct files?
Thank You,
Robbie
Control Flow:
Details:
Visual Studio 2019 v16.11.3
File Names are from Flat File Connection Managers (See image below).
Flat File Connection Managers use Expressions to set their connection strings.
The same connection managers are used to import the data, so I presume that they accurately refer to the correct files and their correct locations.
File System Task Editor for one of the delete tasks:
I have a strange situation. I've caused strange situations before, but now it's happening to me. I have a .txt file (log.txt) being created on a server drive and I don't know where it's coming from.
The contents of the .txt file show the actual date and the actual time an application process ran, but this is the format of what's in the .txt file:
(date) (time) AM Starting Job: (date) (time) AM
I've checked a number of things to try to see what's causing this. I have identified a SQL Server Agent job that runs at that specific time. It runs an SSIS package. Part of that package runs a PowerShell script that starts 16 processes of an application.
The .txt file is definitely showing data from when the PowerShell script executes the 16 or so Start-Process calls in that script.
The agent job doesn't have any steps that create such a file.
The SSIS package doesn't have logging turned on. (Right-click the screen in Visual Studio > logging.)
There are no tasks in the SSIS project that create the .txt file.
An application runs a process as part of this, and I think that's what's creating the file, but the developer doesn't think it's the app creating it.
Is there anything else I should check to see what's generating this?
I found the answer by using SysInternals > ProcMon. I scheduled the job at a time where I could start ProcMon and monitor the creation of the file. In this case, it was the tool I thought it was. The developer fixed the issue.
In case anyone else would like to learn more about SysInternals, here are a few links:
Info on SysInternals:
https://learn.microsoft.com/en-us/sysinternals/
ProcMon:
https://learn.microsoft.com/en-us/sysinternals/downloads/procmon
Video on how to find issues from the creator of SysInternals:
https://channel9.msdn.com/Events/Ignite/2016/BRK4028
I need to import a flat file daily. The file changes its name every day. After the file is processed, it needs to be moved to another folder.
I noticed that I can schedule jobs in SQL Server Agent, tell them to run every hour or so, and add CMD commands to them.
The solution I found was to run a script to check whether the file exists, since the folder should either be empty or have at least one file in it.
If the file exists, the script renames it to the fixed name used by the SSIS package and then runs the package.
After the whole thing is done, it should rename the file again based on today's date and move it to another folder.
If the file does not exist, then it should do nothing and wait another hour or so to run again.
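A rough sketch of what that script might do (folder and file names are placeholders, and the same logic could live in a Script Task instead of an external script):

// Find whatever file arrived today, rename it to the fixed name the SSIS
// package expects, and archive it by date after the load finishes.
using System;
using System.IO;
using System.Linq;

class DailyFileStaging
{
    static bool TryStageFile(string inboundFolder, string fixedName)
    {
        string file = Directory.GetFiles(inboundFolder).FirstOrDefault();
        if (file == null)
        {
            return false; // nothing has arrived yet; let the next run retry
        }

        string destination = Path.Combine(inboundFolder, fixedName);
        if (!string.Equals(file, destination, StringComparison.OrdinalIgnoreCase))
        {
            File.Move(file, destination);
        }
        return true;
    }

    static void ArchiveFile(string inboundFolder, string fixedName, string archiveFolder)
    {
        string staged = Path.Combine(inboundFolder, fixedName);
        string archived = Path.Combine(
            archiveFolder,
            DateTime.Today.ToString("yyyy-MM-dd") + "_" + fixedName);
        File.Move(staged, archived);
    }
}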
What's the best solution for this scenario? Is the script a good idea? Is it possible to add the if/else (for whether the file exists) into the SSIS package instead? Or even run the script from the SSIS package itself instead of adding it to SQL Server Agent?
EDIT:
It seems I was a little naïve; it's possible to run VB scripts from the server. Would that be the recommended solution? It does solve my problem, but I'm just wondering if it's a good idea.
This solves all my questions:
http://www.sqlservercentral.com/articles/Integration+Services+%28SSIS%29/90571/
I want to create a SQL Server SSIS package that watches a folder and, once all of the required files (20 of them) are there, executes a SQL statement. The files may arrive at different times, and sometimes they come as .csv and sometimes in a .zip. I know SSIS has a WMI Event Watcher Task, but I'm not sure how to tell it to look for all 20 files. I guess I want the WMI Event Watcher to look in that folder every 30 minutes and, once it sees all the files, move to the next step (an Execute SQL Task). Can someone tell me how to specify the file name in the WMI Event Watcher Task? Thanks.
This article seems relevant to your plan. You need to create the proper WQL code.
http://blogs.technet.com/b/heyscriptingguy/archive/2007/05/22/how-can-i-monitor-the-number-of-files-in-a-folder.aspx
("ASSOCIATORS OF {Win32_Directory.Name='C:\Logs'} Where " _
& "ResultClass = CIM_DataFile")
I'm not sure how that will behave in the WMI Event watcher though. Have you looked at the docs for the SSIS task?
Here is a more step-by-step approach:
http://microsoft-ssis.blogspot.com/2010/12/continuously-watching-files-with-wmi.html
Some good points there, even if it doesn't address the pesky 20 file requirement.
You could also have a PowerShell script on the server monitor the files and then chuck them into a subfolder once they are all there, with SSIS monitoring that subfolder.
Here is a doc page showing how to specify one file:
http://msdn.microsoft.com/en-us/library/windows/desktop/aa394594(v=vs.85).aspx
With that, I'm sure you could set up a chain of WMI checks in your SSIS package.
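If the WQL side gets awkward, another option (just a sketch, with placeholder file names) is to do the "are all 20 files here yet?" check in a Script Task with plain .NET and let the result gate the Execute SQL Task via a precedence constraint:

// Check whether every expected file has arrived, accepting either the .csv
// or a .zip with the same base name, since the files sometimes come zipped.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class ArrivalCheck
{
    static readonly string[] ExpectedFiles =
    {
        // ... list the 20 required file names here (placeholders shown) ...
        "extract01.csv",
        "extract02.csv"
    };

    static bool AllFilesArrived(string watchFolder)
    {
        var present = new HashSet<string>(
            Directory.GetFiles(watchFolder).Select(Path.GetFileName),
            StringComparer.OrdinalIgnoreCase);

        return ExpectedFiles.All(name =>
            present.Contains(name) ||
            present.Contains(Path.ChangeExtension(name, ".zip")));
    }
}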
I have a DTS (not SSIS) package that hasn't been touched in years, in which I had to update a query. When I run the package by manually executing each step in the editor, everything works fine and it generates a file of a couple thousand records, as expected. When I hit the "Execute" button at the top of the editor to run the whole package, it doesn't error, but the file is generated with only 1 record.
All tasks inside the package are either transformation steps or SQL tasks. There aren't any ActiveX Script tasks. When I watch the package run the steps by itself, the execution follows the mapping correctly.
I'm at a loss on this one. Has anyone seen this issue before or have any idea where to start?
I just ran into a similar issue recently. While working with the senior DBA, we found that the server where the package ran did not have the right permissions to a directory on the network. The package ran fine on my box, but died on the production server. We needed to give the sqlservice account on the production box permission to write to the directory on the network.
You might also want to check any ActiveX Script step that changes the connection string or destination of Data Pump steps. I've had cases where these were different on the destination server where the DTS packages run.
After going through all of the lines of all of the stored procedures and straight SQL tasks used in the package, I located a SET ROWCOUNT 1 that was never reset. While I was manually executing each step separately, the ROWCOUNT setting would be reset automatically; however, when it ran as a complete package, the ROWCOUNT was never reset. Adding SET ROWCOUNT 0 at the end of the particular script resolved the issue.