I created an SSIS package that extracts two files from a .zip file, imports data from them and then attempts to delete the files that were extracted.
The package works, the data is imported, and all tasks report success. However, the File System Tasks that attempt to delete the files don't actually delete them, even though they report success.
I use the CozyRoc Zip Task to extract the files. When I remove the Zip Task, the File System Tasks actually do delete the files. I'm not certain that CozyRoc is causing the problem, but the existence of that task may be causing other issues with the package.
Can anyone help me figure out how to reliably delete the files?
Do I need to put in some sort of pause after the Data Flow Tasks to allow them to release whatever locks they might have on the files?
Is there a way to view the DOS commands that the File System tasks use at run time, to verify that they are actually attempting to delete the correct files?
Thank You,
Robbie
Control Flow:
Details:
Visual Studio 2019 v16.11.3
File Names are from Flat File Connection Managers (See image below).
Flat File Connection Managers use Expressions to set their connection strings.
The same connection managers are used to import the data, so I presume that they accurately refer to the correct files and their correct locations.
File System Task Editor for one of the delete tasks:
Related
I already created an SSIS package in SSMS and scheduled it with SQL Server Agent. It successfully creates my '.csv' file, but I want to overwrite that file every night when the scheduled job runs again.
In your SSIS package, in the Flat File Destination, make sure the box Overwrite data in the file is checked. When the package runs each night, the file will be overwritten with new data.
Use a File System Task in your Control Flow.
The File System task performs operations on files and directories in the file system. For example, by using the File System task, a package can create, move, or delete directories and files. You can also use the File System task to set attributes on files and directories. For example, the File System task can make files hidden or read-only.
I have an SSIS package that picks up a file from a directory and imports it into a database, pretty straightforward stuff. The only issue is that the input file is named based on the current date, e.g. \path\to\file\filename_010115.txt, where 010115 stands for January 1, 2015.
I was calling this package with a custom bat file that set the connection manager for a Flat File Source to the current date formatted filename and passed it into dtexec.exe directly; however, our environment demands that we use XML configuration files and not bat files. Is there a simple way to set the file source "prefix" in the xml and append the date-formatted filename before attempting to pick up the file?
Have you considered rearchitecting your approach? You're not going to be able to do this gracefully with a Configuration, be it XML, table, environment variable, or registry key. Even in the simplest case, a table, a process would need to update that table with the current date before your SSIS package could even start. A scheduling tool like SQL Agent can run SQL commands. If you're going the XML route, though, you're looking at a tiny little app or a PowerShell command to modify your configuration file. Again, that would have to modify the file every day, and before the SSIS package begins, because the package reads its configuration values once at startup and never consults the configuration source again.
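If you do go the XML route, that daily modification could be a tiny console app run as the first job step. Here is a minimal C# sketch, assuming a .dtsConfig where the connection string lives in a ConfiguredValue element; the config path, property path and file-name pattern below are illustrative assumptions, not your actual values:

    using System;
    using System.Linq;
    using System.Xml.Linq;

    class UpdateDtsConfig
    {
        static void Main()
        {
            // Location of the package configuration file (assumption).
            string configPath = @"C:\SSIS\Config\ImportDaily.dtsConfig";

            // Today's file, e.g. filename_010115.txt for January 1, 2015.
            string todaysFile = @"\\server\share\filename_" +
                                DateTime.Today.ToString("MMddyy") + ".txt";

            XDocument doc = XDocument.Load(configPath);

            // Find the entry that configures the flat file connection string.
            // The Path attribute shown here is an assumption; copy the real
            // one from your own .dtsConfig.
            XElement entry = doc.Descendants("Configuration")
                .First(c => (string)c.Attribute("Path") ==
                    @"\Package.Connections[FlatFileSource].Properties[ConnectionString]");

            entry.Element("ConfiguredValue").Value = todaysFile;
            doc.Save(configPath);
        }
    }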
You could use an Expression in SSIS to use the current date as part of your flat file source's connection string but in the event you need to process more than one day's file or reprocess yesterday's file, you're humped and either have to manually rename source files or change the system clock and nobody's going to do that.
A more canonical approach would be to use a Foreach (file) Loop Container. Point it at the source folder and find all the file(s) that match your pattern. I'm assuming here you move processed files out of the same folder. The Foreach Container finds all matching files and enumerates through the list popping the current one into whatever variable you've chosen.
See a working example on
https://stackoverflow.com/a/19957728/181965
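For what it's worth, the enumeration the Foreach (file) loop performs is roughly equivalent to the following sketch; the folder and file mask are assumptions, and in the real package the current path goes into your chosen variable and the data flow runs once per file:

    using System;
    using System.IO;

    class EnumerateSourceFiles
    {
        static void Main()
        {
            // Source folder and file mask are assumptions for illustration.
            string sourceFolder = @"\\server\share\incoming";
            string pattern = "filename_*.txt";

            // The Foreach (file) enumerator hands each matching path, one at
            // a time, to the variable your flat file connection manager reads.
            foreach (string file in Directory.GetFiles(sourceFolder, pattern))
            {
                Console.WriteLine("Processing " + file);
                // ... run the data flow against this file ...
            }
        }
    }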
I'm posting this rather odd issue here in the remote chance that someone has come across this before, or possibly just has an idea or two about what I could try or check next because I'm stumped.
Summary: SQL 2008 SSIS package tasks that attempt to create files with .zip extension fail with
"Access to the path is denied"
Detail: This first occurred in a test environment with a package that works fine in Dev (and Prod). The part that makes this problem odd is that it is all about the file extension, not security. I mention this now to curb replies about checking security (SSIS account, directory-level permissions, etc.): it's not that, 100%.
So, I've built an SSIS package as a proof of behavior, that takes 3 files (a.txt, b.txt, c.txt) and respectively for
(a) uses CozyRoc Zip to Create a Zip,
(b) uses a script task to create a .zip (using GZipStream - I know this creates a GZIP not a ZIP but bear with me...) and
(c) native SSIS File System Task copies the file from c.txt to c.zip (yes, creating a .zip file that is not really a zip file).
All three fail with the above message: the .ZIP files are created for (a) and (b) but remain at 0 length; for (c) there is just the error message.
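For reference, the script task in (b) compresses the file roughly like this; this is a minimal sketch rather than the exact code from the package, and the paths are placeholders. As noted, GZipStream produces a GZIP stream, not a true ZIP archive:

    using System.IO;
    using System.IO.Compression;

    class CreateGzipAsZip
    {
        static void Main()
        {
            // Source and destination paths are placeholders for illustration.
            string source = @"C:\Test\b.txt";
            string destination = @"C:\Test\b.zip";  // the extension that triggers the error

            using (FileStream input = File.OpenRead(source))
            using (FileStream output = File.Create(destination))
            using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
            {
                // Copy the uncompressed bytes through the GZip stream
                // (manual loop so it also works on .NET 3.5 script tasks).
                byte[] buffer = new byte[4096];
                int read;
                while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
                {
                    gzip.Write(buffer, 0, read);
                }
            }
        }
    }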
Now, I edit the SSIS package and change the extensions of the destinations (to .ZOP or .ZIP2 or .GZ or .ANYTHING), and all 3 work perfectly. And this is obviously how I know that it's the .ZIP extension not a "normal" security issue.
So I initially assumed this was a one-off on this test server because it was the only place it happened, but I've since found another box (build rehearsal) on which exactly the same problem exists. I've tried associating .ZIP with various different programs (Windows Explorer, WinZip, 7-Zip, WinRAR and "no program") and nothing works, and I've googled the problem to death with no luck yet.
I've tried creating .ZIP files with the various installed archive programs using their GUIs and they all work fine. Existing .ZIP files can be unzipped using CozyRoc. Existing .GZ (GZIP) files renamed to .ZIP can be unzipped using the script GZipStream decompress. And I can rename files to and from .ZIP using SSIS or Explorer/CMD. It's just SSIS (specifically SSIS) creating a file with extension .ZIP (specifically .ZIP) throws this error.
I'm starting to suspect it might have something to do with SSIS thinking that .ZIP is an archive "folder" not a ".ZIP File" but I don't know where to go with this idea, proving it or fixing it.
Any ideas at all? I'm at my wits' end!
Thanks in advance
P.S. The "obvious" answer of using .ZIP2 and renaming is not an option, there are (literally) hundreds of packages running in production that create .ZIP files and packages need to move from Test to Prod without modification. I really need a solution, not a workaround, in this instance if there is one.
This turned out to be a RedGate tool (HyperBac) having a file association with .ZIP extension files (amongst others). Hyperbac's monitoring of .ZIP files appears to have clashed with SSIS's attempt to write to the .ZIP file, as procmon reported shared file access violations, causing a spurious ACCESS DENIED error to be reported by the package.
Since use of the tool is necessary on our environments, I was able to solve the problem by deleting the .ZIP association using the GUI ("Hyperbac Configuration Manager" > "Extensions" > Ext=.ZIP, Delete)
I have some csv files. I want to write a SQL Server script that reads the files at a certain interval and inserts the records into a SQL Server database if they are not already there, ignoring any file that has already been read by the scheduler in a previous run. Each csv will contain one record only.
Like:
1.csv => John,2000,2012/12/12
2.csv => Tom,3000,2012/12/11
It will be great if someone can provide examples of script.
Thanks!
If I were you, I would create an SSIS package that uses the multi file input. This input lets you pull data from every file in a directory.
Here are the basic steps for your SSIS package.
1. Check whether there are any files in the "working" directory. If not, end the package.
2. Move every file from the "working" directory to a "staging" directory. Do this so that if additional files appear in the "working" directory while the package is running, you won't lose them.
3. Read all of the files in the "staging" directory, using a data flow with the multi file input.
4. Once the reading is complete, move all of the files to a "backup" directory. This of course assumes you want to keep them for some reason; you could just as easily delete them from the "staging" directory.
Once you have completed the package, schedule it with SQL Server Agent to run at whatever interval you are interested in.
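If you ever script the move in step 2 instead of using a File System Task inside a Foreach loop, it amounts to something like this sketch (folder names and file mask are assumptions):

    using System.IO;

    class MoveToStaging
    {
        static void Main()
        {
            // Folder names are assumptions for illustration.
            string working = @"C:\Import\working";
            string staging = @"C:\Import\staging";

            // Snapshot what exists right now; anything that arrives later
            // stays in "working" for the next run.
            foreach (string file in Directory.GetFiles(working, "*.csv"))
            {
                File.Move(file, Path.Combine(staging, Path.GetFileName(file)));
            }
        }
    }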
Background:
I have a folder that gets pumped with files continuously. My SSIS package needs to process the files and delete them. The SSIS package is scheduled to run once every minute. I pick up the files in ascending order of file creation time, build an array of files, and then process and delete them one at a time.
Problem:
If an instance of my package takes longer than one minute to run, the next instance of the SSIS package will pick up some of the files the previous instance has in its buffer. By the time the second instance of the package gets around to processing a file, it may already have been deleted by the first instance, creating an exception condition.
I was wondering whether there was a way to avoid the exception condition.
Thanks.
How are you scheduling the job? If you are using the SQL Server job scheduler, I'm under the impression it should not re-run a job that is already running; see this SO question: Will a SQL Server Job skip a scheduled run if it is already running?
Alternatively, rather than trying to move the file around, you could add a step to your job that tests whether it is already running. I've not done this myself, but it appears to be possible; have a read of this article: Detecting The State of a SQL Server Agent Job.
Can you check for the existence of the file before you delete it?
File.Exists(filepathandname)
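In a Script Task that check might look roughly like this (a sketch; the variable name comes from the snippet above):

    using System.IO;

    class GuardedDelete
    {
        // Minimal sketch: only process and delete the file if it still exists
        // when this instance reaches it. Note a small window remains between
        // the Exists check and the Delete.
        static void ProcessIfStillThere(string filepathandname)
        {
            if (File.Exists(filepathandname))
            {
                // ... import the file here ...
                File.Delete(filepathandname);
            }
        }
    }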
To make sure your packages are not messing with the same files, you could just create an empty file named like the data file but with another extension (like mydata.csv.being_processed) and make sure your Data Flow Task runs only on files that don't have such a marker file.
This acts as a lock.
Of course you could change the way you're scheduling your jobs, but often, when we encounter such an issue, it's because we have no leverage on those things :)
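A rough sketch of that per-file lock, with the marker extension and the error handling as illustrative choices; the package then only processes files for which TryClaim returns true:

    using System.IO;

    class PerFileLock
    {
        // Returns true if this run created the marker and therefore "owns" the file.
        static bool TryClaim(string dataFile)
        {
            string marker = dataFile + ".being_processed";
            try
            {
                // CreateNew fails if the marker already exists, so two package
                // instances can't both claim the same file.
                using (File.Open(marker, FileMode.CreateNew)) { }
                return true;
            }
            catch (IOException)
            {
                return false;   // another instance already claimed it
            }
        }

        // Remove the marker once the file has been imported and deleted.
        static void Release(string dataFile)
        {
            File.Delete(dataFile + ".being_processed");
        }
    }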
You can create a "lock file" to prevent parallel execution of packages. To protect yourself from the case where a package crashes and leaves the lock behind, use the lock file's creation date to emulate a lock timeout.
That is: at the beginning of the package, check for the existence of the lock file. If it doesn't exist OR it was created more than X hours ago, continue with the import; otherwise exit.
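A sketch of that check, where the lock-file path and timeout are whatever you choose; delete the lock file as the last step of the package so the next run can acquire it:

    using System;
    using System.IO;

    class PackageLock
    {
        // Returns true if this run may proceed; false if another run holds the lock.
        static bool AcquireLock(string lockFile, TimeSpan timeout)
        {
            if (File.Exists(lockFile))
            {
                // A recent lock means another instance is (probably) still running.
                if (DateTime.UtcNow - File.GetCreationTimeUtc(lockFile) < timeout)
                    return false;

                // An old lock is treated as left over from a crashed run.
                File.Delete(lockFile);
            }
            File.Create(lockFile).Dispose();
            return true;
        }
    }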
I have a similar situation. What you do is have your SSIS package read all files within a folder and write their names to a work file like 'process.txt'. This creates a list of the files that were valid at that point in time. If you have multiple packages, give each work file a name that includes the package name (e.g. 'process_<package name>.txt'). Each package then processes only the files named in its own process file, which prevents overlap.
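A sketch of building that work file, with illustrative folder and file names:

    using System.IO;

    class BuildWorkList
    {
        static void Main()
        {
            string folder = @"C:\Import\incoming";            // assumption
            string workFile = @"C:\Import\process_pkg1.txt";  // one list per package

            // Snapshot the files that exist right now; the package later reads
            // only the paths listed in its own work file.
            File.WriteAllLines(workFile, Directory.GetFiles(folder, "*.csv"));
        }
    }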