I use SSIS to read .txt input files and run my business logic over them, saving the results to a file whose name matches the current input file (the file name is stored dynamically in a variable).
When all the files are stored in the same folder, I have no problem accessing them, since I use the following expression for the flat file connection string in the data flow: "path" + @[User::inputFileName] + ".txt"
Now I have to process a folder with subfolders (I enabled "Traverse subfolders" in the Foreach Loop), and I have an issue with the flat file connection string because I cannot use a wildcard such as: "my path\\subfolder*" + @[User::inputFileName] + ".txt", where all the subfolders share the same name and only the last portion of the name changes.
How can I save the current subfolder name in a variable so that I can use it in the following way? "path\\" + @[User::currentSubFolder] + "\\" + @[User::inputFileName] + ".txt"
I was able to solve my issue, so I am posting my solution here in case someone else is in the same situation.
I used a script transformation block before my foreach loop. From it I can retrieve the current full path (used afterwards in the Flat File connection string) and the input file name without its extension, which is used as the name of the output file containing the results of the SSIS scripts.
To keep the values of interest I used two variables: one for the file name and one for the path.
Here is the script code:
Imports System.IO

Public Sub Main()
    ' Variable index 0 => FileName (written by the script)
    ' Variable index 1 => filePath (full path of the current file)
    Dim fullPath As String = Dts.Variables.Item(1).Value.ToString()
    ' Drop the folder part and the extension, whatever its length
    Dim fileName As String = Path.GetFileNameWithoutExtension(fullPath)
    Dts.Variables.Item(0).Value = fileName
    Dts.TaskResult = Dts.Results.Success
End Sub
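If the subfolder name itself is needed (for example to fill a variable like User::currentSubFolder from the question), it can be derived from the same full path. The following is a minimal C# sketch, not the original poster's code; the class name, method names and sample path are hypothetical, and inside a Script Task the path would come from the Foreach Loop variable instead.

```csharp
using System;
using System.IO;

public static class PathParts
{
    // Returns the name of the folder that directly contains the file.
    public static string SubFolderName(string fullPath)
    {
        string directory = Path.GetDirectoryName(fullPath);
        return Path.GetFileName(directory);
    }

    // Returns the file name without folder and without extension.
    public static string FileNameOnly(string fullPath)
    {
        return Path.GetFileNameWithoutExtension(fullPath);
    }

    public static void Main()
    {
        // Hypothetical sample path built with the platform's separator.
        string fullPath = Path.Combine("root", "subfolder_A", "input01.txt");
        Console.WriteLine(SubFolderName(fullPath)); // subfolder_A
        Console.WriteLine(FileNameOnly(fullPath));  // input01
    }
}
```

The two results could then be assigned to string variables and concatenated in the connection string expression.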
I'm working on an SSIS package where I need to read several different CSV files in order to insert their data into a SQL Server Database.
There will be roughly 500 csv files, all in the same folder. They will have an ordered naming pattern like:
tFile1.csv
tFile2.csv
tFile3.csv
tFile4.csv
tFile5.csv
etc
How can I program SSIS to automatically start with tFile1.csv, then automatically do tFile2.csv then tFile3.csv etc in order?
Try using a Foreach Loop Container.
If the order in which the files are processed is important, note that the loop processes them in file name order. Name order here means plain alphabetical order, so tFile10.csv would come before tFile2.csv. If the name order is not the correct processing order, it is probably easier to rename the files than to build a workaround for the processing order.
For example, if you want to process the files in creation date order, rename the files so that they are prefixed with their creation date in yyyymmdd format.
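The renaming idea can be sketched in C# as follows. This is only an illustration under assumptions: the class and method names are hypothetical, the underscore separator is my choice, and the folder passed to RenameAll would be your own source folder. Note that the .NET format string for this pattern is "yyyyMMdd" (lowercase mm would mean minutes).

```csharp
using System;
using System.Globalization;
using System.IO;

public static class DatePrefix
{
    // Builds the new name from the original name and the creation time.
    public static string PrefixedName(string fileName, DateTime created)
    {
        return created.ToString("yyyyMMdd", CultureInfo.InvariantCulture) + "_" + fileName;
    }

    // Renames every file in the folder. Run it once only:
    // a second run would add a second prefix.
    public static void RenameAll(string folder)
    {
        foreach (FileInfo fi in new DirectoryInfo(folder).GetFiles())
        {
            string target = Path.Combine(folder, PrefixedName(fi.Name, fi.CreationTime));
            fi.MoveTo(target);
        }
    }

    public static void Main()
    {
        // Pure naming logic, no files touched.
        Console.WriteLine(PrefixedName("tFile1.csv", new DateTime(2017, 4, 5))); // 20170405_tFile1.csv
    }
}
```

Because the prefix has a fixed width, alphabetical order of the new names matches creation date order.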
This can be done with a combination of a Script Task, which obtains and sorts the file names, and a Foreach Loop, which loads each file in order. I'm not sure what version you're using, but this worked on SSDT 2017 with no issues for me. Also note that the extension comparison in the script is case sensitive and that the extension must include the period (i.e. ".csv" in lowercase for your case). More details on this are below.
Create an object-type SSIS variable and, if necessary, string variables for the file location, prefix, and extension. Also create an empty string variable to use for the Flat File Connection Manager. If you're using an expression for the file location, make sure to add a \ after the final folder and an extra \ to escape it, for example "C:\\Your Folder\\File Source Folder\\"
If you haven't already, create a Flat File Connection Manager with an expression for the ConnectionString property. This will update on each iteration of the Foreach Loop. Add an expression that puts the source file location variable together with the current file name.
Then add a Script Task (the example uses C#) on the control flow, with the location, prefix, and extension variables in the ReadOnlyVariables field and the object variable in the ReadWriteVariables field. Don't forget to add the references from the using statements in the Script Task as well. More details on the code are in the example.
Add a Foreach Loop with the Foreach ADO Enumerator and the object variable as the ADO Object Source Variable.
On the Variable Mappings pane add the empty string variable for the current file name at Index 0.
Inside the Foreach Loop, create a Data Flow Task that loads the necessary destination object from the Flat File Connection Manager.
Flat File Connection Manager Expression:
@[User::FileLocation] + @[User::CurrentFile]
C# Script Task:
using System;
using System.IO;
using System.Data.OleDb;
using System.Data;
using System.Windows.Forms;
using System.Text.RegularExpressions;

public void Main()
{
    // Add these as ReadOnlyVariables in the Script Task
    string fileLocation = Dts.Variables["User::FileLocation"].Value.ToString();
    string filePrefix = Dts.Variables["User::FilePrefix"].Value.ToString();
    // Make sure to include the . in ".csv" for the extension
    string fileExt = Dts.Variables["User::FileExt"].Value.ToString();

    DataTable preSortDT = new DataTable();
    preSortDT.Columns.Add("FileName", typeof(string));
    preSortDT.Columns.Add("FileNumber", typeof(int));

    DirectoryInfo sourceDirectoryInfo = new DirectoryInfo(fileLocation);
    foreach (FileInfo fi in sourceDirectoryInfo.EnumerateFiles())
    {
        // The prefix check ignores case; the extension check is case sensitive
        if (fi.Name.ToLower().StartsWith(filePrefix.ToLower()) && fi.Extension == fileExt)
        {
            // Regex to get the last run of digits before the final . (i.e. before .csv)
            Match number = Regex.Match(Path.GetFileNameWithoutExtension(fi.Name), @"\d+$");
            // Skip files whose name does not end in a number instead of failing on conversion
            if (number.Success)
            {
                preSortDT.Rows.Add(fi.Name, Convert.ToInt32(number.Value));
            }
        }
    }

    // Create a DataView to sort the records
    DataView preSortDV = preSortDT.DefaultView;
    preSortDV.Sort = "FileNumber asc";

    // Create the final data table to hold the sorted records
    DataTable postSortDT = preSortDV.ToTable();
    DataSet postSortDS = new DataSet();
    postSortDS.Tables.Add(postSortDT);

    // Add the object variable as a ReadWriteVariable and populate it with the sorted data set
    Dts.Variables["User::FileNames"].Value = postSortDS;
    Dts.TaskResult = (int)ScriptResults.Success;
}
I have a folder containing multiple Excel files. The file names are almost the same, except that every file name ends with the month and year.
Example
Emp_04_2017.xlsx
Emp_05_2017.xlsx
...
I want to create an SSIS package that picks the current month's file and inserts it into the destination table.
One way would be to create SSIS variables to store the current month and year, and then use those to construct the name of the file in a third variable.
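The file name construction described above could look roughly like this in C#. It is a sketch only: the class and method names are hypothetical, and the Emp_MM_yyyy.xlsx pattern is taken from the example names in the question.

```csharp
using System;
using System.Globalization;

public static class CurrentMonthFile
{
    // Builds the expected file name for the month of the given date.
    public static string FileNameFor(DateTime date)
    {
        return "Emp_" + date.ToString("MM_yyyy", CultureInfo.InvariantCulture) + ".xlsx";
    }

    public static void Main()
    {
        Console.WriteLine(FileNameFor(new DateTime(2017, 4, 15))); // Emp_04_2017.xlsx
        // File name for the month in which the code runs.
        Console.WriteLine(FileNameFor(DateTime.Today));
    }
}
```

The resulting name can be appended to the folder path and used as the connection string of the Excel source, so no loop over the folder is required.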
Use a Foreach Loop Container.
The Foreach Loop Container picks up files from the FolderPath variable and returns the complete path (path + file name) in CompletePath. The loop iterates through all the files in the FolderPath location.
Foreach Loop Container: double-click it, then on the Collection page set the Directory expression to FolderPath and, under Enumerator configuration, set Files to *.xlsx.
Variable Mappings: map the variable CompletePath to Index 0.
EXPR_GetFirstOcrDash: this expression gets the position of the first underscore in the reversed path, i.e. the last underscore in the file name: @[User::FirstOcr] = FINDSTRING(REVERSE(@[User::CompletePath]), "_", 1).
EXPR_ExtractFileName: this expression gets the month from the file name, dropping a leading zero: @[User::FileMonth] = (REVERSE(SUBSTRING(REVERSE(@[User::CompletePath]), @[User::FirstOcr]+2, 1)) == "0" ?
REVERSE(SUBSTRING(REVERSE(@[User::CompletePath]), @[User::FirstOcr]+1, 1))
: REVERSE(SUBSTRING(REVERSE(@[User::CompletePath]), @[User::FirstOcr]+1, 2)))
EXPR_SetFileToProcess: this expression stores the file that was found for processing: @[User::FileToProcess] = @[User::CompletePath]
EXPR_StopProcessing: the loop checks each file in the folder in turn; once the first file whose month matches the current date is found, no further files are examined.
A better practice could be to use two directories, Source and Archive: once a file is processed, move it to the Archive directory using a File System Task.
Precedence Constraints are added on the green arrows.
After the Foreach Loop Container gets processed, you can use FileToProcess variable and use the file in the DataFlowTask.
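For comparison, the month extraction done by the expressions above can also be written in a Script Task. The following C# sketch is an assumption-laden alternative, not the answer's method: the names are hypothetical, and unlike the expressions it also compares the year.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class FileMonth
{
    // Returns true when the name ends in _MM_yyyy and both parts match the given date.
    public static bool IsForMonth(string path, DateTime date)
    {
        string name = Path.GetFileNameWithoutExtension(path);
        Match m = Regex.Match(name, "_([0-9]{2})_([0-9]{4})$");
        if (!m.Success)
        {
            return false;
        }
        int month = int.Parse(m.Groups[1].Value);
        int year = int.Parse(m.Groups[2].Value);
        return month == date.Month && year == date.Year;
    }

    public static void Main()
    {
        DateTime april = new DateTime(2017, 4, 20);
        Console.WriteLine(IsForMonth("Emp_04_2017.xlsx", april)); // True
        Console.WriteLine(IsForMonth("Emp_05_2017.xlsx", april)); // False
    }
}
```

Inside the loop, the result could be written to a Boolean variable and used in a precedence constraint to decide whether the current file is the one to process.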
I have a remote folder from which I pick up multiple files and loop through them with a Foreach Loop container. But I want to process the files in timestamp order, starting with the first file in that folder.
How do I do this in SSIS?
First you have to create 2 variables:
FolderPath (string) -- to store the folder you have to manipulate
dtFiles (Object) -- to store files from this folder
Add a script task and select FolderPath as ReadOnlyVariable and dtFiles as ReadWrite Variable
In the script write the following code
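Imports System.Data
Imports System.IO
Imports System.Linq

Public Sub Main()
    Dim strFolderPath As String = Dts.Variables.Item("FolderPath").Value.ToString()
    ' The Foreach ADO enumerator expects a DataSet or DataTable, so the sorted paths go into a table
    Dim dtSorted As New DataTable()
    dtSorted.Columns.Add("FullName", GetType(String))
    ' Sort all files (including those in subfolders) by creation time
    Dim sortedFiles = New DirectoryInfo(strFolderPath).GetFiles("*.*", SearchOption.AllDirectories).OrderBy(Function(x) x.CreationTime)
    For Each fi As FileInfo In sortedFiles
        dtSorted.Rows.Add(fi.FullName)
    Next
    Dim dsSorted As New DataSet()
    dsSorted.Tables.Add(dtSorted)
    Dts.Variables.Item("dtFiles").Value = dsSorted
    Dts.TaskResult = ScriptResults.Success
End Sub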
Connect your Script Task to the Foreach Loop Container.
Double-click the Foreach Loop Container, change the enumerator type to the Foreach ADO Enumerator, choose the variable dtFiles as the source (on the Collection tab), and set the enumeration mode to "Rows in the first table".
On the Variable Mappings tab (in the Foreach Loop Container), map index 0 to a new variable, e.g. FileName (you can use it to do your work).
Note: I sorted the files by CreationTime. You can also use the LastAccessTime or LastWriteTime properties.
Just assign a value to the FolderPath variable and execute the package.
I have a static table that contains file names like these:
xy_Jan10
yz_Feb11
xx_March14
by_Aug09
etc.
These names are static and are stored in a table. I am using a Foreach Loop container: first I read all the static names listed above into a System.Object variable, then the Foreach Loop iterates over them and saves each file name into a string variable called strFileName.
Inside the Foreach Loop I have a Script Task that checks whether the file exists, and this is where I have the problem. For each file name that arrives in the variable, I want to check whether that file exists. If it exists, I want to load it into my table; if it does not, I want to move on to the next file name in the list, and so on. I only want to load a file when the variable's file name matches a file on the network drive; if no match is found, I want to keep checking until I have gone through every name in my static list.
My issue is that the Script Task stops at the first name for which there is no match, instead of moving on to the next name in the list, so many files that do match are never loaded.
Please note that the files I am loading are SAS files. Here is my script task:
Public Sub Main()
    '
    ' Add your code here
    '
    Dim directory As DirectoryInfo = New DirectoryInfo("\\840KLM\DataMart\CPTT\Cannon")
    Dim file As FileInfo() = directory.GetFiles("*.sas7bdat")
    If file.Length > 0 Then
        Dts.Variables("User::var_IsFileExist").Value = True
    Else
        Dts.Variables("User::var_IsFileExist").Value = False
    End If
    Dts.TaskResult = ScriptResults.Success
End Sub
It looks like you need to wrap the script task inside a ForEach loop container. There's plenty of information about how to do this on the web, or even on Stack Overflow: How to loop through Excel files and load them into a database using SSIS package?
or
http://www.sqlis.com/sqlis/post/Looping-over-files-with-the-Foreach-Loop.aspx
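One possible shape for the per-file check inside the loop is sketched below in C#. This is an assumption about what the asker needs, not part of the linked answers: the class and method names are hypothetical, and the .sas7bdat extension is taken from the question's script. The idea is to test the specific name held in strFileName rather than whether any file exists in the folder.

```csharp
using System;
using System.IO;

public static class FileCheck
{
    // Returns true when folder contains a file named baseName + ".sas7bdat".
    public static bool Exists(string folder, string baseName)
    {
        return File.Exists(Path.Combine(folder, baseName + ".sas7bdat"));
    }

    public static void Main()
    {
        // Hypothetical folder; in the package this would be the network share.
        string folder = Path.GetTempPath();
        foreach (string name in new[] { "xy_Jan10", "yz_Feb11" })
        {
            // A missing file is reported, not treated as an error, so the loop continues.
            Console.WriteLine(name + ": " + Exists(folder, name));
        }
    }
}
```

In a Script Task, the result would be written to User::var_IsFileExist while the task itself still reports success. A precedence constraint with an expression such as @[User::var_IsFileExist] == TRUE can then skip the load for a missing file without stopping the loop.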
I have four files: xxxxxxCd999, xxxxCf999, xxxxC999, xxxxD999, ... I need to move these files to their respective folders based on the file name. For example, file xxxxxCd999 should be moved to folder Cd999, file xxxxCf999 should be moved to folder Cf999, file xxxC999 should be moved to folder C999, and so on.
How do I achieve this in SSIS?
I have used a Foreach Loop container, assigned some variables for the source path and destination path, and added a File System Task to use these variables, but I am lost now and have no idea how to proceed.
Kindly help me.
Try this:
The Foreach Loop will enumerate the source folder, and the path will be stored in a variable. In the Script Task, write code to get the folder name using a regular expression. The Script Task's result will be stored in another variable, which will be used in the File System Task.
The package design will be
Create 3 variables:
Name        DataType  Expression
FolderName  string
DestLoc     string    "D:\\" + @[User::FolderName]
LoopFiles   string
In the above expression for the DestLoc variable, change the path to match your location.
ForEach Loop configuration
Change the source folder location as per the need
Script Task: add the 2 variables (LoopFiles, which the script reads, and FolderName, which it writes) as below
You need to extract the folder name from the variable LoopFiles
Example
LoopFiles variable will have D:\ForLoop\SampleFolder1.txt at runtime
So in order to extract folder name from the above variable use regular expression
Open Edit Script and write the following code
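using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

List<string> filePatterns = null;

public void Main()
{
    // Folder names to look for inside the file name
    filePatterns = new List<string>();
    filePatterns.Add("Folder1");
    filePatterns.Add("Folder2");

    // File name of the current loop item, without folder and extension
    string fileName = Path.GetFileNameWithoutExtension(Dts.Variables["User::LoopFiles"].Value.ToString());
    // Match any of the folder names, e.g. "Folder1|Folder2"
    Match match = Regex.Match(fileName, string.Join("|", filePatterns.ToArray()));
    Dts.Variables["User::FolderName"].Value = match.Value;
    Dts.TaskResult = (int)ScriptResults.Success;
}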
In the above code, you extract the folder name and store it in the variable FolderName. If you have multiple folders, just add their names to the filePatterns collection.
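For the file names in the question (xxxxxxCd999, xxxxCf999 and so on), a single pattern could replace the list of folder names. The C# sketch below assumes that the folder part is always one uppercase letter, an optional lowercase letter, and then digits at the end of the name, and that the preceding characters never form such a sequence; the class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class FolderFromName
{
    // Assumed pattern: uppercase letter, optional lowercase letter, digits, end of name.
    private static readonly Regex Suffix = new Regex("[A-Z][a-z]?[0-9]+$");

    // Returns the folder name, or an empty string when the name does not match.
    public static string FolderNameFor(string path)
    {
        string name = Path.GetFileNameWithoutExtension(path);
        Match m = Suffix.Match(name);
        return m.Success ? m.Value : string.Empty;
    }

    public static void Main()
    {
        foreach (string file in new[] { "xxxxxxCd999", "xxxxCf999", "xxxxC999", "xxxxD999" })
        {
            // Prints lines such as: xxxxxxCd999 -> Cd999
            Console.WriteLine(file + " -> " + FolderNameFor(file));
        }
    }
}
```

An empty result is worth checking before the File System Task runs, since DestLoc would otherwise point at the root destination folder.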
File System Task Configuration