I have to scan a 120 GB network drive with over 100,000 folders, looking for .ini and .par files. My initial thought was to list all files from all directories and then throw out what I don't need.
I set up a foreach loop with *.* over the whole drive, and inside the loop an Execute SQL command that inserts the full file name that was found into a table.
I realize that writing to SQL for every record is a big performance issue, but I have been unable to write to an SSIS object variable instead. It would be better to write to an in-memory table and only push everything into the SQL database at once when the scan is finished.
All ideas are welcome: if there is a solution that writes to the SSIS object variable, good; if you have a better solution, even more welcome.
SSIS will only be able to get a list of files on the network that exist in shared folders. Given this, you can do the following in an SSIS package to get a list of all of the files with a specific extension. The following example is based on the .ini file type, but you can easily add a second process in the same package for the .par files, reusing the same two variables.
Create an object variable called FileList and a string variable called File.
Create a Script Task to gather the .ini files: they are read from all subfolders and saved into an array, and from there saved into the object variable. Make certain the FileList variable is listed in the ReadWriteVariables of the Script Task when setting it up.
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using System.IO;

namespace xxxxxx
{
    [Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {
        public void Main()
        {
            // Recursively gather every .ini file on the share into an array.
            string[] ini_files = Directory.GetFiles(@"\\servername\sharedfolder", "*.ini", SearchOption.AllDirectories);

            // Hand the whole array to the object variable in one assignment so the
            // Foreach Loop can enumerate it; FileList must be in ReadWriteVariables.
            Dts.Variables["User::FileList"].Value = ini_files;

            Dts.TaskResult = (int)ScriptResults.Success;
        }
    }
}
Create a Foreach Loop container that uses the FileList object variable, mapping each item saved in it to the File string variable as it is enumerated. Inside the container, just include an Execute SQL task or Data Flow Task to save the contents to a database table.
This is just one of many ways to approach this task, but it keeps the package modular while using C# as a fast method of gathering the files.
Based on your comment that the Script Task is not an option for you, here is one approach I can think of:
1) Create a batch file with the command "dir %1 /s /b /o:n > %2" to write the required list of file names into a text file, where %1 and %2 are arguments.
2) Add two different Execute Process Tasks to your package, with your batch file as the Executable for both. Set the Arguments value to "Z:\*.ini,C:\tempSSIS\iniList.txt" for one task and "Z:\*.par,C:\tempSSIS\parList.txt" for the other (assuming Z:\ is your network drive and the second argument is the file in which you want to store the list of file names).
3) Then add a Data Flow Task after each Execute Process Task to read the text files and insert the records into the same or different tables.
I have a data flow that transforms multiple flat files from a given folder using a Foreach Loop container, with a flat file as the output again. The problem is that every time I execute the job, only the last file that got transformed is stored in the destination file.
Is there a way in SSIS to create an individual transformed output file for each input instead of overwriting the same one over and over again?
For example, I have 5 flat files, test_1.txt, test_2.txt, test_3.txt, test_4.txt and test_5.txt, in a folder. After the job runs I can only see the data from the last file, test_5.txt, transformed into my destination file.
Here are the steps of a working example I tested.
Variables
I have 3 variables defined:
FileName - to be used in the foreach loop
DestinationDir - where the files are going
SourceDir - where the files I want to process are
Foreach Loop Setup
I have a foreach loop configured as:
Expression for "Directory" set to #[User::SourceDir]
Retrieve file name set to "Name and extension"
Then under the "Variable Mappings", map the FileName variable to the file name.
That means that as the foreach loop iterates over the files in the directory, it sets the "Name and extension" of the file it's on into the variable #[User::FileName].
Data Flow Task
Then I add a Data Flow Task inside the foreach loop.
Inside the DFT I have a simple Flat File Source to Flat File Destination; we'll just pass the contents of each file through to new files.
During initial development I'll manually pick one file to walk through setting up the source and destination. Then I come back, change the connection managers, and set an expression on the ConnectionString.
Connection Manager Expressions
SourceFile Connection Manager:
ConnectionString gets an expression as: #[User::SourceDir] + #[User::FileName]
DestinationFile Connection Manager:
ConnectionString gets an expression as: #[User::DestinationDir] + #[User::FileName]
Testing
I have 2 test files in my source directory and no files in my destination.
After I execute my package I get success and also get new files in my destination.
There are ways to do what you are asking in SSIS with variables and expressions, but there is an easier way to accomplish it using the command line.
Since you are just consolidating text files into one, you can use a command prompt to handle your issue better:
copy *.txt output.txt
I have about a dozen folders with up to 2,500 PDFs. I need to move 163 of the PDFs, identified by a T-SQL statement, into a "not to be sent" folder in SSIS. I already have the Foreach Loop container and File System Task. How can I search for and select only the files from my T-SQL statement to be moved?
Note: I already have the filenames that need to be moved in my T-SQL statement
In your foreach loop container, which I assume is enumerating through the files, put a script task in front of your file system task.
The script task will check the current file name against your T-SQL results, either by running the query, or checking it against a variable that contains the results.
Then it will set a Boolean variable to true or false depending on whether the file should be moved, and in the precedence constraint leading to the File System Task, you check the value of the Boolean variable.
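A minimal sketch of such a Script Task, assuming the query results were loaded into an object variable User::FileList by a prior Execute SQL Task (full result set), the Foreach Loop writes the current file into User::FileName, and a Boolean User::MoveFile is listed in ReadWriteVariables (all of these names are placeholders, not from the question):

using System;
using System.Data;
using System.Data.OleDb;
using Microsoft.SqlServer.Dts.Runtime;

public void Main()
{
    // Current file handed in by the Foreach Loop (file enumerator).
    string currentFile = Dts.Variables["User::FileName"].Value.ToString();

    // Shred the full result set (ADO recordset) stored in the object variable.
    var table = new DataTable();
    new OleDbDataAdapter().Fill(table, Dts.Variables["User::FileList"].Value);

    // Flag the file if its name appears in the first column of the query results.
    bool move = false;
    foreach (DataRow row in table.Rows)
    {
        if (string.Equals(row[0].ToString(), currentFile, StringComparison.OrdinalIgnoreCase))
        {
            move = true;
            break;
        }
    }

    Dts.Variables["User::MoveFile"].Value = move;
    Dts.TaskResult = (int)ScriptResults.Success;
}

The precedence constraint to the File System Task can then use an expression such as @[User::MoveFile] == true.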
I think you might be asking: how do I use the T-SQL statement and its results in SSIS?
Create a variable filenames of type Object (to hold the ADO result set)
Add an Execute SQL task
Add your T-SQL to it
Change Result Set to Full result set
Map the Result Set to filenames
Add a Foreach Loop
Set the enumerator to the Foreach ADO Enumerator
Set filenames as the ADO object source variable
Map a new String variable called filename to index 0
Add your file transfer task and use filename.
*** This assumes that filename has the full path.
How do I move multiple Excel files to different folders based on the file name in SSIS? That is, based on its file name, each file should move to the respective folder.
Have you tried this?
In this you can see that you have to create a foreach loop, a script task and a file system task to move the files to the destination folder.
How to move files to different folders, based on matching file name and folder name in SSIS
Using Foreach loop container
You have to add a Foreach Loop container to loop over the files in a specific directory.
Choose the following expression as the file name filter:
*takeme*
Map the filename to a variable
Add a Data Flow Task inside the foreach loop to transfer the files
Use the filename variable as the source
You can follow the detailed article at:
http://www.sqlis.com/sqlis/post/Looping-over-files-with-the-Foreach-Loop.aspx
If you want to add multiple filters, follow my answer at:
How to add multiple file extensions to the Files: input field in the Foreach loop container SSIS
Using a script task
Or you can achieve this using a Script Task with similar code (I used VB.NET):
Public Sub Main()
    For Each strFile As String In IO.Directory.GetFiles("C:\New Folder\", "*takeme*", IO.SearchOption.AllDirectories)
        Dim filename As String = IO.Path.GetFileName(strFile)
        IO.File.Copy(strFile, "D:\New Folder\" & filename)
    Next
    Dts.TaskResult = ScriptResults.Success
End Sub
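If the Script Task language is set to C# instead, a minimal equivalent sketch (using the same example folders as the VB version) would be:

public void Main()
{
    // Copy every file whose name contains "takeme" from the source tree to the target folder.
    foreach (string strFile in System.IO.Directory.GetFiles(@"C:\New Folder\", "*takeme*", System.IO.SearchOption.AllDirectories))
    {
        string filename = System.IO.Path.GetFileName(strFile);
        System.IO.File.Copy(strFile, System.IO.Path.Combine(@"D:\New Folder\", filename));
    }

    Dts.TaskResult = (int)ScriptResults.Success;
}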
I am trying to zip a folder in SSIS. There are 12 files in the source folder and I need to zip that folder. I can get the files to zip fine; my problem is the folders.
I have to use winzip to create the zipped packages.
Can anyone point me to a good tutorial. I haven't been able to implement any of the samples that I have found.
Thanks
By adding a Script Task, you can use the ZipFile class (see its reference documentation); you must add a reference to the System.IO.Compression.FileSystem assembly in the project (.NET Framework 4.5).
You need to provide the Script Task with the folder to be zipped and the name of the compressed file as ReadOnlyVariables (to be added in the ReadOnlyVariables field).
These two variables must be defined in the package's Variables tab (String type) and can be changed dynamically through a loop (e.g. a Foreach).
I use these two variables:
sFolderCompressed - the '.zip' file that you want to obtain, e.g. C:\folder1\result.zip
sFolderSource - the source folder containing the files affected, e.g. C:\folder1\folder2
The script is written in C#; choose Script Language: Microsoft Visual C#.
This is the code to add (the using directive goes at the top of the script, the rest in the Main method):
using System.IO.Compression;

public void Main()
{
    try
    {
        string zipPath = (string)Dts.Variables["User::sFolderCompressed"].Value;
        string startPath = (string)Dts.Variables["User::sFolderSource"].Value;

        // Compress the whole source folder into the destination .zip file.
        ZipFile.CreateFromDirectory(startPath, zipPath);

        Dts.TaskResult = (int)ScriptResults.Success;
    }
    catch (Exception objException)
    {
        // Log the exception
        Dts.TaskResult = (int)ScriptResults.Failure;
    }
}
I hope this can help.
Try using 7-Zip; it is free. Take a look at the 7-Zip command line user guide; it contains all the commands you need.
Use a Script Task or an Execute Process Task to achieve this (a rough Script Task sketch follows the link below). Also, there are other useful links:
https://www.dotnetperls.com/7-zip-examples
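If you go the Script Task route, a rough sketch along these lines would run 7-Zip against a folder; the 7-Zip install path, source folder, and archive name below are only example assumptions, not values from the question:

using System.Diagnostics;

public void Main()
{
    // Example values; replace with your own paths or read them from package variables.
    string sevenZip = @"C:\Program Files\7-Zip\7z.exe";
    string sourceFolder = @"C:\folder1\folder2";
    string zipFile = @"C:\folder1\result.zip";

    // "a" tells 7-Zip to add the folder and its contents to the archive.
    var psi = new ProcessStartInfo
    {
        FileName = sevenZip,
        Arguments = string.Format("a \"{0}\" \"{1}\"", zipFile, sourceFolder),
        UseShellExecute = false,
        CreateNoWindow = true
    };

    using (var process = Process.Start(psi))
    {
        process.WaitForExit();
        Dts.TaskResult = process.ExitCode == 0 ? (int)ScriptResults.Success : (int)ScriptResults.Failure;
    }
}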
UPDATE 1
You can follow this link for WinZip:
http://www.vbforums.com/showthread.php?202918-Well-WinZip-Command-Line-Folders-to-Zip-keep-folder-structure-
In the link above they suggested using this command:
wzzip "c:\Test.zip" "c:\myfolder" -exPR
Write this in a .bat file:
"C:\Program Files\WinZip\WINZIP64.EXE" -a "C:\Desktop\destination_folder\Sample.zip" "C:\Desktop\Sample"
In the Execute Process Task:
Specify the location of the .bat file under Execute Process Task --> Process --> Executable.
It works fine.
I'm working on creating a csv export from a SQL Server database and I've been familiar with a process for doing so that admittedly, I've never completely understood. The process involves creating a "template" file, which defines the columns and structure for the file export. Once the "template" file exists, you can use a Data Flow task to fill it and a File System Task to copy it to the final storage destination with whatever file name you'd like (frequently a date/time stamp).
Is there a reason that you can't simply create a file directly, without the intermediate "template" file? I've looked around for a bit and it seems like all the proposed solutions involve connecting to an existing file. I see that there is a "Create File" usage type for a "File" connection manager, but you can't use it in any File System Task. The only File System Task operations you can use relative to a file are "Copy", "Delete", "Move", "Rename", and "Set Attributes".
Is there a way to create a file at package run time and fill it?
The whole point of SSIS is to create a data flow with metadata so that the data can be manipulated. If you just want to go from the database directly to CSV, you are probably better off using bcp (the bulk copy program) from the command line. If you want to include it as part of an SSIS package, just add an Execute Process Task and add the command line to that. You can dynamically change the included columns or the output file by adding an expression to the task. You could also call bcp through T-SQL using an Execute SQL Task.
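As a rough illustration only, the kind of command line you might hand to the Execute Process Task could look like the following; the server, database, table, columns, and output path are placeholders, not details from the question:

bcp "SELECT Col1, Col2 FROM MyDatabase.dbo.MyTable" queryout "C:\Exports\output.csv" -c -t, -S MyServer -T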
One other option is to concatenate all your columns in your query, interspersed with comma literals, and output to a text file with just one very wide column.
For documentation on bcp look here