Filter file names in a folder that contain specific text in SSIS - sql-server

I am trying to move some files from one folder to another, and I need to check whether each file name contains certain text; only those files should be moved.
For example, I have the following files in a folder:
abcd_takeme_fdsljker.txt
abcd_file_fdsljker.txt
abcd_takeme_fdsljsdfker.txt
abcd_filetk_fdsljker.txt
abcd_takeme_fdsljssker.txt
From the above, I want to pick the files whose names contain the text "takeme".

Using a Foreach Loop container
Add a Foreach Loop container to loop over the files in a specific directory.
Use the following expression in the Files field:
*takeme*
On the Variable Mappings page, map the file name to a variable.
Add a task inside the Foreach Loop to transfer the files (for example, a File System Task with the Move file operation, or a Data Flow Task if you need to load the file contents).
Use the file name variable as the source.
You can follow the detailed article at:
http://www.sqlis.com/sqlis/post/Looping-over-files-with-the-Foreach-Loop.aspx
If you want to add multiple filters, follow my answer at:
How to add multiple file extensions to the Files: input field in the Foreach loop container SSIS
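The Files mask is a wildcard pattern in which * matches any run of characters. To check which of the sample names a mask like *takeme* selects, here is a small Python sketch outside SSIS, using the standard fnmatch module. Case-insensitive matching is an assumption made to mimic Windows file masks; matching_files is a hypothetical helper name.

```python
import fnmatch


def matching_files(names, pattern="*takeme*"):
    """Return the names matching a wildcard mask such as *takeme*.

    Both sides are lower-cased so the comparison is case-insensitive,
    approximating how Windows file masks behave.
    """
    return [n for n in names if fnmatch.fnmatchcase(n.lower(), pattern.lower())]


sample = [
    "abcd_takeme_fdsljker.txt",
    "abcd_file_fdsljker.txt",
    "abcd_takeme_fdsljsdfker.txt",
    "abcd_filetk_fdsljker.txt",
    "abcd_takeme_fdsljssker.txt",
]
print(matching_files(sample))
```

Only the three "takeme" names are selected; abcd_filetk_fdsljker.txt is skipped because "takeme" does not appear in it as a contiguous substring.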
Using a Script Task
Alternatively, you can achieve this using a Script Task with code similar to the following (I used VB.NET):
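Public Sub Main()
    ' Find every file whose name contains "takeme" (subfolders included)
    For Each strFile As String In IO.Directory.GetFiles("C:\New Folder\", "*takeme*", IO.SearchOption.AllDirectories)
        Dim filename As String = IO.Path.GetFileName(strFile)
        ' Move the file to the destination folder (use IO.File.Copy instead to keep the original)
        IO.File.Move(strFile, IO.Path.Combine("D:\New Folder\", filename))
    Next
    Dts.TaskResult = ScriptResults.Success
End Sub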
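The same "search recursively, then move" idea can be sketched in Python outside SSIS. This mirrors the SearchOption.AllDirectories behaviour above. It assumes the destination folder is not inside the source folder and does not handle name collisions (what happens when a same-named file already exists in the destination differs by OS). move_matching is a hypothetical helper name.

```python
import fnmatch
import os
import shutil


def move_matching(src_dir, dst_dir, pattern="*takeme*"):
    """Move files under src_dir (including subfolders) whose names match
    pattern into dst_dir, and return the sorted list of moved file names."""
    moved = []
    os.makedirs(dst_dir, exist_ok=True)
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            # Case-insensitive match, like a Windows file mask
            if fnmatch.fnmatchcase(name.lower(), pattern.lower()):
                shutil.move(os.path.join(root, name), os.path.join(dst_dir, name))
                moved.append(name)
    return sorted(moved)
```

Calling move_matching(r"C:\New Folder", r"D:\New Folder") would correspond to the VB.NET example. Because subfolder contents are flattened into one destination folder, files with identical names in different subfolders would clash.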

Related

SSIS Extract links from Excel cells to load into SQL

The problem:
I have an SSIS package that loops through 100+ Excel files and reads the data, then copies the contents over to a SQL Server Table. In these Excel files, this one column has hyperlinks. The column text itself says something like DSH-LN-4, but clicking on it in Excel opens up a folder that contains some images. How do I copy the underlying link in this column rather than the actual text in the cells?
What have I tried so far:
I haven't really tried anything because I found absolutely no resources on how to do this in SSIS. Manually adding a column to the Excel files is NOT possible, since there are hundreds of files. The only resource I found was in this SO Question, but it does not explain how to do this without manually editing the Excel files.
What I would like:
In my ForEach loop container, I have a data flow task that gets the Excel contents and shoves it into the SQL Table. The column that contains hyperlinks is called PhotoReference (since these hyperlinks open the folder that has the photos). I would like this PhotoReference column to copy over the underlying hyperlink of the cell and add that to the SQL column.
For instance, I want the PhotoReference column to contain this:
www.companyname.box.com/asjdfbgkjb134kjbsdafo2bm21n4bk
If I can manage to do this, my Power BI report running off of this underlying data could contain a clickable text that would open the image directly.
Any help would be appreciated.
UPDATE:
I was able to try two different methods to extract the hyperlinks from my column, but each has its own issues:
Method 1: I added a Script Task to my ForEach container and, as I loop through each Excel file, used the Hyperlinks collection from Microsoft.Office.Interop.Excel to get the hyperlink from my Excel column. BUT, I don't know what to do with it afterwards. I figured the only option is to overwrite the Excel column's content with my extracted hyperlink, but I would really rather not change my Excel files in any way.
Method 2: I added a Script Component object inside my data flow task in between my Excel source and SQL Destination. In this method, I could not get nearly as far because the Input0_ProcessInputRow method that is auto-generated has the argument Row of type Input0Buffer. I am not able to apply any Microsoft.Office.Interop.Excel properties to my Input0Buffer object. So I am stuck.
If you have the right to alter the Excel files, you can simply add a Script Task before the data flow task to replace the URL column value with the hyperlink.
In this answer, I will provide a step-by-step solution to this problem:
Creating Excel samples
First of all, I created some Excel files with the following columns:
First name (text)
Last name (text)
Age (number)
Photo (hyperlink)
The file content looks like the following:
Creating the SSIS package
First, add an Excel connection manager that links to one of the Excel files you need to import, and an OLE DB connection manager that connects to the SQL Server instance.
Add an SSIS variable of type String (named ExcelFilePath in this example) to store the Excel file path supplied by the Foreach enumerator.
Add a Foreach loop container and configure it to loop over the Excel files as mentioned in the images below:
Within the Foreach Loop container add a Script Task and a Data flow task as mentioned in the image below:
Now, open the data flow task, add an Excel source and an OLE DB destination, and configure the column mapping between them.
Open the Script Task configuration, and select the ExcelFilePath variable (created in step 2) as a readonly variable as mentioned in the image below:
Now, open the Script editor and in the solution explorer window, right-click on the references icon and click on "Add Reference..."
When the Add Reference dialog appears, click the COM tab and search for Excel, then select the Microsoft Excel Object Library from the results as shown in the following image:
Also, make sure to add Microsoft.CSharp.dll reference.
At the top of the script, add the following lines:
using Excel = Microsoft.Office.Interop.Excel;
using System.Runtime.InteropServices;
In the Main() function add the following lines:
Excel.Application excel = new Excel.Application();
// Path of the current file, supplied by the Foreach Loop enumerator
string originalPath = Dts.Variables["User::ExcelFilePath"].Value.ToString();
excel.Visible = false;
excel.DisplayAlerts = false;
Excel.Workbook workbook = excel.Workbooks.Open(originalPath);
Excel.Worksheet worksheet = (Excel.Worksheet)workbook.Worksheets[1];
Excel.Range usedRange = worksheet.UsedRange;
int intURLColidx = 0;

// Find the index of the header cell named "Photo" in the first row
for (int i = 1; i <= usedRange.Columns.Count; i++)
{
    if ((worksheet.Cells[1, i] as Excel.Range).Value != null &&
        (string)(worksheet.Cells[1, i] as Excel.Range).Value == "Photo")
    {
        intURLColidx = i;
        break;
    }
}

// Replace each hyperlinked cell's text with the hyperlink address
// (skipped entirely if no "Photo" column was found)
if (intURLColidx > 0)
{
    for (int i = 2; i <= usedRange.Rows.Count; i++)
    {
        Excel.Range cell = (Excel.Range)worksheet.Cells[i, intURLColidx];
        if (cell.Hyperlinks.Count > 0)
        {
            cell.Value2 = cell.Hyperlinks.Item[1].Address.ToString();
        }
    }
}

workbook.Save();
// Release the COM objects so no orphaned EXCEL.EXE process is left running
Marshal.FinalReleaseComObject(usedRange);
Marshal.FinalReleaseComObject(worksheet);
workbook.Close(Type.Missing, Type.Missing, Type.Missing);
Marshal.FinalReleaseComObject(workbook);
excel.Quit();
Marshal.FinalReleaseComObject(excel);
Dts.TaskResult = (int)ScriptResults.Success;
In the code above, we first search for the index of the column that contains the hyperlinks (in this example the column is named "Photo"). Then, for each row whose cell has a hyperlink, we replace the cell value with the hyperlink address.
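The two-pass logic described above (locate the header column, then swap each cell's display text for its link address) can be modelled without Excel. This Python sketch uses a hypothetical in-memory structure: each data cell is a (display_text, hyperlink_address) pair standing in for an Excel cell and its Hyperlinks collection. It illustrates the algorithm only; it is not interop code, and replace_photo_text is a hypothetical name.

```python
def replace_photo_text(rows, header_name="Photo"):
    """rows[0] is the header row (plain strings); each later row holds
    (display_text, hyperlink_address_or_None) tuples.

    Returns plain-text rows in which hyperlinked cells of the header_name
    column are replaced by their address, like writing Value2 in the script.
    """
    header = rows[0]
    if header_name not in header:
        # Same idea as the intURLColidx guard: leave the data untouched
        return [list(header)] + [[text for text, _ in row] for row in rows[1:]]
    col = header.index(header_name)
    out = [list(header)]
    for row in rows[1:]:
        texts = [text for text, _ in row]
        _text, address = row[col]
        if address:  # the cell has a hyperlink
            texts[col] = address
        out.append(texts)
    return out
```

Rows without a link keep their display text, which matches the Hyperlinks.Count > 0 check in the script.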
Finally, make sure to configure the Excel connection manager to read the file path from the created variable value (Step 2) using expressions:
Experiments
After running the package, if we open an Excel file, we will see that the cell value has been replaced with the URL:
And as shown in the image below, data are imported successfully to SQL Server:
References
Missing compiler required member 'microsoft.csharp.runtimebinder.binder.convert'
Extracting a URL from hyperlinked text in Excel cell
Excel interop prevent showing password dialog
What you will probably need is some hackery involving the Excel COM API or macros, although in general you should stay away from using the Office COM API in SSIS.
You could pre-process the Excel files to extract that value with non-standard operations in SSIS, such as a Script Component.
These are the steps you need to follow to import that data using the Script component:
Drag and drop a Script Component and select "Source" as the component type.
By default the script language is Microsoft Visual C# 2008 and I have done this sample with Microsoft Visual Basic 2008. Change this if you need to.
Define your output columns with the correct data type in "data type properties"
Edit the script. In the IDE, add a reference to:
Microsoft Excel 11.0 Object Library
(if that reference doesn't work, try the Microsoft Excel 5.0 Object Library)
Finally, write some code:
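Imports Microsoft.Office.Interop.Excel

' A source Script Component generates its rows in CreateNewOutputRows
Public Overrides Sub CreateNewOutputRows()
    ' Late-bound Excel instance; Excel must be installed on the machine running the package
    Dim oExcel As Object = CreateObject("Excel.Application")
    Dim FileName As String = Variables.FileName   ' FileName must be listed as a ReadOnlyVariable
    Dim oBook As Object = oExcel.Workbooks.Open(FileName)
    Dim oSheet As Object = oBook.Worksheets(1)
    Output0Buffer.AddRow()
    ' Change A1 to the cell that holds the hyperlink; Address is the output column defined earlier
    Output0Buffer.Address = oSheet.Range("A1").Hyperlinks(1).Address & "#" & oSheet.Range("A1").Hyperlinks(1).SubAddress
    oBook.Close(False)
    oExcel.Quit()
End Sub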
(Keep in mind that this code may not run as-is; it is for illustration only.)
You can see similar code in C# here:
C# Script in SSIS Script Task to convert Excel Column in "Text" Format to "General"
The only issue with the script method is that you need to have the Excel runtime installed.
More about the Script Component here:
https://www.tutorialgateway.org/ssis-script-component-as-transformation/
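If depending on an Excel installation is a problem, a different technique (not from the answers above) is to read the hyperlinks straight from the .xlsx package, which is a ZIP archive of XML parts. Each hyperlinked cell appears as a hyperlink element (with ref and r:id attributes) in the sheet part, and the URL itself is stored in the sheet's .rels part. This Python sketch assumes the first sheet is stored as xl/worksheets/sheet1.xml (typical, but not guaranteed) and only handles external links (internal links use a location attribute instead). It reads the workbook without modifying it. read_hyperlinks is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
import zipfile

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
REL_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def read_hyperlinks(xlsx_path, sheet_part="xl/worksheets/sheet1.xml"):
    """Return {cell_ref: url} for the external hyperlinks on one sheet.

    The file is only read, never written.
    """
    # e.g. xl/worksheets/sheet1.xml -> xl/worksheets/_rels/sheet1.xml.rels
    rels_part = sheet_part.replace("worksheets/", "worksheets/_rels/") + ".rels"
    with zipfile.ZipFile(xlsx_path) as zf:
        sheet = ET.fromstring(zf.read(sheet_part))
        targets = {}
        if rels_part in zf.namelist():
            rels = ET.fromstring(zf.read(rels_part))
            for rel in rels.findall(f"{{{PKG_REL_NS}}}Relationship"):
                targets[rel.get("Id")] = rel.get("Target")
    links = {}
    for link in sheet.iter(f"{{{MAIN_NS}}}hyperlink"):
        rel_id = link.get(f"{{{REL_NS}}}id")
        if rel_id in targets:
            links[link.get("ref")] = targets[rel_id]
    return links
```

The resulting dictionary, keyed by cell references such as D2, could then be joined to the rows read from the sheet before loading them into SQL Server.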

Managing links in Excel source workbooks when duplicating source files

I have three files:
Activefile - where my code is stored and run
Databasefile - where my raw data is housed (has lots of protection)
Copyofdatabasefile - is a copy without the protection
I have a macro that runs in Activefile to update the Databasefile Excel file. Later in the macro, I use the SaveAs method on Databasefile to create Copyofdatabasefile, and I remove some functionality so that people can access the data easily without going through some of the checks on the main Databasefile.
When I save Copyofdatabasefile, the links in Activefile are updated to point to the new Copyofdatabasefile. I don't want this to happen.
How can I adjust my Excel links/code to ensure that the links in my file aren't transferred across to Copyofdatabasefile?
My SaveAs call currently looks like this:
Databasefile.SaveAs filename:="\\somelocation\copyofdatabasefile.xlsx", FileFormat:=51, CreateBackup:=False
Using the Workbook.SaveCopyAs Method should work.
If your original file is xlsx use
Databasefile.SaveCopyAs Filename:="\\somelocation\copyofdatabasefile.xlsx"
Note that it saves in the same FileFormat as the original file only!
If you need to change the file format (e.g. from xlsm to xlsx), you need to save a copy in the original file format first, then reopen that copy with Workbooks.Open() and use .SaveAs to change the FileFormat. If your original file is xlsm, use:
Databasefile.SaveCopyAs Filename:="\\somelocation\copyofdatabasefile.xlsm" 'if original file was xlsm
Dim wb As Workbook
Set wb = Workbooks.Open("\\somelocation\copyofdatabasefile.xlsm")
wb.SaveAs filename:="\\somelocation\copyofdatabasefile.xlsx", FileFormat:=51, CreateBackup:=False
wb.Close False
Kill "\\somelocation\copyofdatabasefile.xlsm" 'delete old format

WildCards in SSIS Collection {not include} name xlsx

I have a process built in SSIS that loops through Excel files and imports data only from those whose names include Report.
My UserVariable used as the Expression is: *Report*.xlsx
and it works perfectly fine. Now I am trying to build a similar loop, but only for files that do NOT include Report in the file name.
Something like *<>Report*.xlsx
Is it possible?
Thanks for help!
Matt
In your loop, put a Script Task before your first task and connect the two with a precedence constraint. Double-click that constraint and set the Evaluation operation to Expression. Your expression would look like this:
FINDSTRING(@var, "Report", 1) == 0
where @var is the variable that holds the current file name from the loop.
Only files without "Report" inside will proceed to the next step.
This is based on this answer: SSIS Exclude certain files in Foreach Loop Container
Unfortunately, you cannot achieve this with the Foreach Loop file mask (a pattern like *[^...]*.xlsx is not supported), so you have to use a workaround:
Workarounds
First
Get the filtered list of files using a Script Task before entering the loop, then loop over that list using a Foreach Loop container (Foreach From Variable Enumerator).
Add an SSIS variable (e.g. User::FilesList) of type System.Object (scope: Package).
Add a Script Task before the Foreach Loop container and add User::FilesList as a ReadWriteVariable.
In the script, write the following code:
Imports System.Linq
Imports System.IO
Imports System.Collections.Generic
Public Sub Main()
    ' Collect the .xlsx files whose file name (not the full path) does not contain "Report"
    Dim lstFiles As New List(Of String)
    lstFiles.AddRange(Directory.GetFiles("C:\Temp", "*.xlsx", SearchOption.TopDirectoryOnly).Where(Function(x) Not Path.GetFileName(x).Contains("Report")))
    ' Store the list in the Object variable that the Foreach Loop enumerates
    Dts.Variables.Item("FilesList").Value = lstFiles
    Dts.TaskResult = ScriptResults.Success
End Sub
In the Foreach Loop container, choose 'Foreach From Variable Enumerator' as the enumerator type and select the FilesList variable as the source, then map the current item to a String variable on the Variable Mappings page.
ScreenShots
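The filtering step of this first workaround can be tried outside SSIS with a short Python sketch. It mirrors the VB.NET logic: top-level *.xlsx files whose file name (not full path) lacks "Report". The folder, keyword and the helper name files_without_keyword are assumptions for illustration. The keyword check is case-sensitive, like .NET String.Contains.

```python
import fnmatch
import os


def files_without_keyword(folder, keyword="Report", pattern="*.xlsx"):
    """Return full paths of top-level files in folder that match pattern
    and whose file name does not contain keyword."""
    result = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if (os.path.isfile(path)
                and fnmatch.fnmatchcase(name.lower(), pattern.lower())
                and keyword not in name):  # test the name only, not the folder path
            result.append(path)
    return result
```

Checking only the file name matters when the folder itself contains the keyword (for example a path like C:\Reports\...), where a whole-path check would exclude every file.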
Second
Inside the Foreach Loop, add an Expression Task that checks whether the file name contains the string "Report".
Add a variable of type System.Boolean (Name: ExcludeFile).
Inside the Foreach Loop container, add an Expression Task before the Data Flow Task that imports the Excel file.
Inside the Expression Task, write the following:
@[User::ExcludeFile] = (FINDSTRING(@[User::XlsxFile], "Report", 1) > 0)
Double-click the connector between the Expression Task and the Data Flow Task, set the evaluation operation to Expression, and write the following expression:
@[User::ExcludeFile] == FALSE
Note: You do not have to use an Expression Task for this check; a dummy Data Flow Task or a Script Task can also be used to check whether the file name contains the keyword you want to exclude.

How do I pick the most recently created folder using Foreach loop container in SSIS package?

I've got an interesting challenge with SSIS. Using a for-each file enumerator, I need to pick the subfolder which has been most recently created, and then iterate through each of the files.
Perhaps an example would explain better. The folders look something like this:
c:\data\2011-0703
c:\data\2011-0626
c:\data\2011-0619
How could you get a for each file enumerator to pick the most recent folder? This could be done either by looking at the creation date or by comparing the folder names.
I'm guessing it would be done with an expression in the enumerator, just can't work out how! Couldn't find anything on the net either.
Thanks
Here is one possible way to achieve this with the help of a Script Task. The following example shows how this can be done. The example was created in SSIS 2008 R2.
Step-by-step process:
Create three folders named 2011-0619, 2011-0626 and 2011-0703 in the folder path C:\temp\ as shown in screenshot #1. Make a note of the Date created value of each folder.
Place a few files in each of the folders as shown in screenshots #2 - #4.
On the SSIS package, create four variables as shown in screenshot #5. Set the variable RootFolder to C:\temp\ (in your case this will be c:\data). Set the variable FilePattern to *.*. The variable RecentFolder will be assigned the most recent folder path in the Script Task; to avoid design-time errors, initialize RecentFolder with a valid folder path. The variable FilePath will be assigned values as the files in the most recent folder are looped through.
On the SSIS package, place a Script Task. Replace the Main() method within the Script Task with the code given under the section Script task code (Get recent folder):. This script gets the list of folders in the root folder and loops through them, checking the creation date and time to pick the most recently created folder. The path of that folder is then stored in the variable RecentFolder.
On the SSIS package, place a Foreach Loop container and configure it as shown in screenshots #6 and #7.
Place a Script Task inside the Foreach Loop container. Replace the Main() method within the Script Task with the code given under the section Script task code (Display file names):. This script simply displays the names of the files within the most recently created folder.
Once all tasks are configured, the package should look as shown in screenshot #8.
Screenshots #9 - #11 show that the package displays the file names in the recently created folder 2011-0703.
Hope that helps.
Script task code (Get recent folder):
C# code that can be used only in SSIS 2008 and above.
public void Main()
{
    Variables varCollection = null;
    // Lock the variables this task uses: RootFolder is read, RecentFolder is written
    Dts.VariableDispenser.LockForRead("User::RootFolder");
    Dts.VariableDispenser.LockForWrite("User::RecentFolder");
    Dts.VariableDispenser.GetVariables(ref varCollection);

    string rootFolder = varCollection["User::RootFolder"].Value.ToString();
    DateTime previousFolderTime = DateTime.MinValue;
    string recentFolder = string.Empty;

    // Keep the subfolder with the latest creation time seen so far
    foreach (string subFolder in System.IO.Directory.GetDirectories(rootFolder))
    {
        DateTime currentFolderTime = System.IO.Directory.GetCreationTime(subFolder);
        if (previousFolderTime == DateTime.MinValue || previousFolderTime <= currentFolderTime)
        {
            previousFolderTime = currentFolderTime;
            recentFolder = subFolder;
        }
    }

    varCollection["User::RecentFolder"].Value = recentFolder;
    // Release the locks taken above
    varCollection.Unlock();
    Dts.TaskResult = (int)ScriptResults.Success;
}
Script task code (Display file names):
C# code that can be used only in SSIS 2008 and above.
public void Main()
{
    Variables varCollection = null;
    // FilePath holds the current file name supplied by the Foreach Loop
    Dts.VariableDispenser.LockForRead("User::FilePath");
    Dts.VariableDispenser.GetVariables(ref varCollection);
    MessageBox.Show(varCollection["User::FilePath"].Value.ToString(), "File Path");
    varCollection.Unlock();
    Dts.TaskResult = (int)ScriptResults.Success;
}
Screenshot #1:
Screenshot #2:
Screenshot #3:
Screenshot #4:
Screenshot #5:
Screenshot #6:
Screenshot #7:
Screenshot #8:
Screenshot #9:
Screenshot #10:
Screenshot #11:
Iterate through the folders. Save the name of the first one. Compare that saved value to the name of each subsequent folder. If the next folder is more recent, swap that name in and keep going. At the end, your saved value will be the name of the most recent folder (if you're comparing creation dates, you'll need to save both the folder name and the creation date).
You can then use the saved value as an argument to your second iteration loop.
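The "keep the best one seen so far" loop described above is essentially a max over the subfolders. Here is a Python sketch outside SSIS with two hypothetical helpers: one compares folder names (which works for names like 2011-0703 that sort chronologically as text), the other compares timestamps. st_ctime is the creation time on Windows but the metadata-change time on most Unix systems, so the time-based variant is only an approximation there.

```python
import os


def _subfolders(root):
    """Return DirEntry objects for the immediate subfolders of root."""
    with os.scandir(root) as entries:
        return [e for e in entries if e.is_dir()]


def most_recent_by_name(root):
    """Pick the subfolder whose name sorts last, or None if there is none."""
    folders = _subfolders(root)
    return max(folders, key=lambda e: e.name).path if folders else None


def most_recent_by_time(root):
    """Pick the subfolder with the latest st_ctime, or None if there is none."""
    folders = _subfolders(root)
    return max(folders, key=lambda e: e.stat().st_ctime).path if folders else None
```

The returned path would then play the role of RecentFolder, i.e. the folder that the second loop iterates over.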

Any way to import multiple (csv) files to an Access db

I have multiple CSV files with the same schema, and I want to import them in one step. One option is the Import Wizard, but it only imports one file at a time. Ideally, the solution should work in MS Access 2003. Thanks
The simplest solution is to open a command prompt, change to the directory where your files are, and type:
type *.csv > allfiles.txt
If you do this often, you can create a batch file that you can double-click from your desktop.
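One catch with type *.csv is that if each file starts with a header row, the header is repeated throughout allfiles.txt. This minimal Python sketch is an alternative to the command-prompt approach. It assumes every file has the same header as its first row and writes that header only once. combine_csv is a hypothetical helper name.

```python
import csv
import glob
import os


def combine_csv(folder, out_path):
    """Concatenate all *.csv files in folder into out_path, writing the
    shared header row once. Returns the number of data rows written."""
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))  # listed before out_path is created
    header = None
    rows_written = 0
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in paths:
            with open(path, newline="") as f:
                reader = csv.reader(f)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

As with the type command, giving the output a non-.csv extension (for example allfiles.txt) keeps it out of later *.csv runs. The combined file can then be imported once with the wizard.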
You can write a small program to do the import; see http://www.javaworld.com/javaworld/javaqa/2000-09/03-qa-0922-access.html for connecting to MS Access from Java via JDBC. Since the input files are CSV, this should not take long.
Other languages offer similar import options.
If all you want to do is drive the import with a list of files, you don't need a batch file. You can get the list of files using Dir():
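Dim strFolder As String
Dim strCSVFileName As String
strFolder = "C:\CSVFiles\"   ' hypothetical folder containing the CSV files
strCSVFileName = Dir(strFolder & "*.csv")
Do Until strCSVFileName = vbNullString
    ' Import each file into the same table; "ImportedData" is a hypothetical table name
    DoCmd.TransferText acImportDelim, , "ImportedData", strFolder & strCSVFileName, True
    strCSVFileName = Dir()
Loop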
Of course, this assumes you're doing the import from within Access, but given your tags, that's the logical inference of your question.
This is an old thread, but it turned up when I searched for the issue. Hopefully this code helps someone facing the same challenge. It builds on the example David-W-Fenton offers above.
I first imported one file using the wizard into a table named "bestTranscripts" and saved the import specification as "BestImport"; I then used those names in the TransferText command.
Function ImportFiles()
    On Error Resume Next   ' files that fail to import are skipped silently
    Dim sourceDirectoryName As String
    Dim sourceFileName As String
    sourceDirectoryName = "<path containing files>"
    sourceFileName = Dir(sourceDirectoryName & "\*.txt")
    Do Until sourceFileName = vbNullString
        ' Dir() returns only the file name, so prepend the folder for TransferText
        DoCmd.TransferText acImportDelim, "BestImport", "bestTranscripts", sourceDirectoryName & "\" & sourceFileName
        sourceFileName = Dir()
    Loop
End Function
