SSIS: import MAX(filename) from folder

I need to pick one .csv file from \\Share\Folder\, the one with the highest file name, for further import into SQL Server. The file name is alphanumeric, e.g. ABC_DE_FGHIJKL_MNO_PQRST_U-1234567.csv; the numeric part varies, and each time the package runs I need only the file with the highest number.
Constraints: I have no write access on that SQL Server, so I import into a ##Temp table, which makes it the least desirable place for file name processing (no Foreach loops on this server).
Ideally this would be a function- or expression-based variable (combined with a Script Task if needed) that I can pass into the connection manager. Any ideas are much appreciated.

Use a Script Task
Add a variable of type String named User::CsvFile
Add a Script Task to your package and add the variable you created as a ReadWriteVariable
In your Script Task, write the following code (VB.NET):
You have to import the System.Linq namespace at the top of the script
Imports System.Linq

Public Sub Main()
    Dim strDirectory As String = "C:\New Folder" ' Enter the directory
    ' Sort by file name and take the last entry, i.e. the highest name.
    ' This assumes the numeric part always has the same number of digits.
    Dim strFile As String = IO.Directory.GetFiles(strDirectory, "*.csv", IO.SearchOption.TopDirectoryOnly).OrderBy(Function(x) IO.Path.GetFileName(x), StringComparer.Ordinal).Last()
    Dts.Variables.Item("CsvFile").Value = strFile
    Dts.TaskResult = ScriptResults.Success
End Sub
Then use this variable in the Flat File Source (for example, in the ConnectionString expression of its Flat File connection manager)
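If the numeric part of the file name can have a varying number of digits, sorting by name alone is not enough, because "999" sorts after "1000" as text. The following is a minimal C# sketch of a numeric comparison, not a definitive implementation. The class name LatestCsvPicker and the method PickHighestNumbered are hypothetical, and it assumes the number is the run of digits immediately before ".csv".

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class LatestCsvPicker
{
    // Hypothetical helper: returns the path whose file name ends in the
    // largest number before ".csv", or null when no path matches.
    public static string PickHighestNumbered(IEnumerable<string> paths)
    {
        // Captures the trailing digits, e.g. "1234567" in "..._U-1234567.csv".
        var suffix = new Regex(@"(\d+)\.csv$", RegexOptions.IgnoreCase);

        return paths
            .Select(p => new { FullPath = p, Hit = suffix.Match(Path.GetFileName(p)) })
            .Where(x => x.Hit.Success)
            // Compare as numbers so that 1000 ranks above 999.
            .OrderBy(x => long.Parse(x.Hit.Groups[1].Value))
            .Select(x => x.FullPath)
            .LastOrDefault();
    }
}
```

Inside a Script Task, the result of LatestCsvPicker.PickHighestNumbered(Directory.GetFiles(folder, "*.csv")) could be assigned to User::CsvFile; a null result means no matching file was found and should be handled before the data flow runs.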

Related

Import 2 Excel Files via SSIS with different sheet names

So as the title suggests, I need to import 2 Excel (.xlsx) files from my local machine (c:\temp) into one SQL Server table. Each file contains only one sheet, but the sheet names differ. The column names and the number of columns are identical in both files.
If I select one specific excel file through SSIS via Excel Connection Manager, it extracts the data perfectly and inserts it into my destination SQL table.
The problem comes in when I add a ForEach Loop Container and want to loop through the c:\temp directory to read the 2 files. Somewhere I am missing a setting and keep getting various "connect to Excel" errors.
Please assist with the following:
I am unsure how to specify the Excel file path. Is the below correct? I used to select the exact file here when loading only 1 file:
Then it seems I need to create variables, so I did below:
Then I am not sure if I should add an expression to my ForEach loop and which mappings would be correct?
And lastly, I am not sure whether to put the filename or sheetname as variable below. I tried the filepath, but get the following error:
Please help as I am totally lost with this.
UPDATE
OK, I have now done the following:
Added a SheetName variable (which I think the Value is maybe incorrect). I am trying to tell it to only read the first sheet.
Then my Excel connection string looks like this:
Provider=Microsoft.ACE.OLEDB.12.0;Data Source=;Extended Properties="EXCEL 12.0 XML;HDR=NO";
My ForEach loop:
And my Excel source:
I get the following error:
[Book 2] Error: Opening a rowset for "Sheet1$" failed. Check that the object exists in the database.
It seems your biggest issue is getting the sheet name, which can vary, and the only way I know to do this is with a Script Task.
So inside your Foreach loop (which stores the path of the Excel file in a variable), add a Script Task before you enter the data flow.
First of all, work out your connection string (I use this site for help: https://www.connectionstrings.com/excel/)
Set SheetName as the ReadWriteVariable and FilePath as the ReadOnlyVariable
Code:
var cstr = string.Format(@"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=""Excel 12.0 Xml;HDR=YES"";"
    , Dts.Variables["FilePath"].Value.ToString()); // Enter your connection string for the OleDb connection
using (var conn = new System.Data.OleDb.OleDbConnection(cstr))
{
    conn.Open();
    var sheets = conn.GetOleDbSchemaTable(System.Data.OleDb.OleDbSchemaGuid.Tables, null);
    // Since there is only one sheet for sure.
    Dts.Variables["SheetName"].Value = sheets.Rows[0]["TABLE_NAME"].ToString();
}
Now you have the sheet name in a variable (including the $ that you need as well). Set up another variable called SQL and define it with the expression "Select * from [" + @[User::SheetName] + "]".
Now use the variable SQL in your Data Flow source.

Passing filename from SSIS script task

I'm attempting to create a SSIS package that loads a flat file into a SQL server table.
I've been able to piece the loading functionality together. I'm currently stuck on passing the filename if it's found from the script task back to a variable where I'd like to use it in the flat file connection string.
Public Sub Main()
'
Dim di As DirectoryInfo = New DirectoryInfo("\\winshare\iFile\Cors2\AAA\AAA Employee Incentive Source Data\")
Dim fi As FileInfo() = di.GetFiles("AAA Full PreReg Report*.csv")
If fi.Length > 0 Then
Dts.Variables("User::fileExists").Value = True
Dts.Variables("User::FileName").Value = fi.name
Else
Dts.Variables("User::fileExists").Value = False
End If
' Add your code here
'
Dts.TaskResult = ScriptResults.Success
End Sub
I'm seeking help with
Dts.Variables("User::FileName").Value = fi.name
Why won't this work?
Thanks
fi is an array of FileInfo objects, so it has no Name property of its own; you have to pick an element first. If you are looking to get the first file in the directory, you can use the following line of code:
Dts.Variables("User::FileName").Value = fi(0).Name
But if you are looking to loop over the files, I recommend using a Foreach Loop container to loop over them and store each file name in a variable:
SSIS - How to loop through files in folder and get path+file names and finally execute stored Procedure with parameter as Path + Filename
FAQ - How to loop through files in a specified folder, load one by one and move to archive folder using SSIS
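For the single-file case, a small C# sketch of the same lookup is shown here, written as a standalone helper rather than Script Task code. The class name FirstMatch and the method FileNameOrNull are hypothetical. Because the order of the array returned by GetFiles is not guaranteed, the sketch sorts by name before taking the first entry, which is an added assumption about what "first" should mean.

```csharp
using System;
using System.IO;
using System.Linq;

public static class FirstMatch
{
    // Hypothetical helper: returns the name of the first file (in ordinal
    // name order) that matches the pattern, or null when there is none.
    public static string FileNameOrNull(string folder, string pattern)
    {
        FileInfo[] files = new DirectoryInfo(folder).GetFiles(pattern);

        return files
            .Select(f => f.Name)
            .OrderBy(n => n, StringComparer.Ordinal)
            .FirstOrDefault();
    }
}
```

A null return maps to the fileExists = False branch of the original script, and a non-null return can be stored in User::FileName.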

SSIS Excel Import - Worksheet variable OR wildcard?

I have a SSIS data import package that uses a source Excel spreadsheet and then imports data into a SQL Server database table. I have been unsuccessful in automating this process because the Excel file's worksheet name is changed every day. So, I have had to manually change the worksheet name before running the import each day. As a caveat, there will never be any other worksheets.
Can I make a variable for the worksheet name?
Can I use a wildcard character rather than the worksheet name?
Would I be better off creating an Excel macro or similar to change the worksheet name before launching the import job?
I use the following Script Task (C#):
System.Data.OleDb.OleDbConnection objConn;
DataTable dt;
string connStr = ""; // Use the same connection string that you have in your package
objConn = new System.Data.OleDb.OleDbConnection(connStr);
objConn.Open();
dt = objConn.GetOleDbSchemaTable(System.Data.OleDb.OleDbSchemaGuid.Tables, null);
objConn.Close();
foreach (DataRow r in dt.Rows)
{
    // For some reason there is always a duplicate sheet with an underscore.
    string t = r["TABLE_NAME"].ToString();
    // Note: if more than one sheet exists, this will only capture the last one.
    if (t.Substring(t.Length - 1) != "_")
    {
        Dts.Variables["YourVariable"].Value = t;
    }
}
And then in SSIS, I add another variable to build my SQL, with the expression:
"Select * from [" + @[User::YourVariable] + "]"
Finally, set your Excel Source to use that SQL variable (data access mode "SQL command from variable").
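The filtering and SQL-building steps above can be sketched as a pair of small C# helpers that work on the TABLE_NAME values once they have been read from the schema table. This is a sketch under an assumption: that real worksheet entries end in "$" (or "$'" when the name is quoted), while the extra entries carry text after the "$". The names WorksheetNames, FirstWorksheet and BuildSelect are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WorksheetNames
{
    // Hypothetical helper: returns the first entry that looks like a real
    // worksheet, or null when none does.
    // Assumption: worksheets end in "$" or "$'"; other entries do not.
    public static string FirstWorksheet(IEnumerable<string> tableNames)
    {
        return tableNames.FirstOrDefault(n =>
            n.EndsWith("$", StringComparison.Ordinal) ||
            n.EndsWith("$'", StringComparison.Ordinal));
    }

    // Builds the query text for the Excel Source from a sheet name that
    // already contains its trailing "$".
    public static string BuildSelect(string sheetName)
    {
        return "SELECT * FROM [" + sheetName + "]";
    }
}
```

Unlike the loop above, this keeps the first matching sheet rather than the last, which makes no difference when the workbook has a single worksheet.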
This works perfectly for me with the same scenario, in case it helps you or someone else:
Two package-level string variables are required:
varDirectoryList - You will use this in the Foreach Loop's variable mapping
varWorkSheet - This will hold your changing worksheet name. Since you only have one, it's perfect.
Set up:
a. Add an SSIS Foreach Loop
b. Add an Excel Connection Manager (connect to the first workbook while you test; at the end, go to its properties and, under Expressions, set "Excel File Path" to your varDirectoryList). Set DelayValidation to True on it as well as on your Excel Source task. *This will help it go through each workbook in your folder
c. Inside your Foreach Loop add a Script Task (C#); title it "Get changing worksheet name into variable" or whatever you prefer.
d. Add a Data Flow Task with your Excel Source to SQL Table Destination.
In your Script Task add this code:
using System;
using System.Data;
using System.Data.OleDb;
using System.Windows.Forms;

public void Main()
{
    // store file name passed into Script Task
    string WorkbookFileName = Dts.Variables["User::varDirectoryList"].Value.ToString();
    // setup connection string
    string connStr = String.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"EXCEL 12.0;HDR=Yes;IMEX=1;\"", WorkbookFileName);
    // setup connection to Workbook
    using (var conn = new OleDbConnection(connStr))
    {
        try
        {
            // connect to Workbook
            conn.Open();
            // get Workbook schema
            using (DataTable worksheets = conn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null))
            {
                // in ReadWrite variable passed into Script Task, store third column in the first
                // row of the DataTable which contains the name of the first Worksheet
                Dts.Variables["User::varWorkSheet"].Value = worksheets.Rows[0][2].ToString();
                // Shows the first worksheet name of the Excel file. For testing purposes only;
                // comment this line out before deploying the package.
                MessageBox.Show(Dts.Variables["User::varWorkSheet"].Value.ToString());
            }
        }
        catch (Exception)
        {
            throw;
        }
    }
    Dts.TaskResult = (int)ScriptResults.Success;
}
Once this is set up and you run the package, you will get a message box showing the worksheet name of each workbook.
If you are using the Excel Source with a SQL command, you will need a third string variable, e.g. varExcelSQL, with an expression like "SELECT columns FROM [" + @[User::varWorkSheet] + "]", which will change dynamically to match each workbook. The name returned by the schema query normally already ends in $ (and is wrapped in single quotes when it contains spaces), so adjust the expression in varExcelSQL as needed.
If you are not using a SQL command and are loading straight from the table, go into Excel Source Properties --> AccessMode --> OpenRowset From Variable --> select varWorkSheet.
That should take care of it, as long as the column structures remain the same.
If you happen to get files with mixed data types in one column, you can use IMEX=1 in your connection string, which makes the driver read such columns as text (DT_WSTR) on import.
Hope this helps :-)
If you are using SSIS to import the sheet you could use a script task to find the name of the sheet and then change the name or whatever else you needed to do in order to make it fit the rest of your import. Here is an example of finding the sheet I found here
Dim excel As New Microsoft.Office.Interop.Excel.ApplicationClass
Dim wBook As Microsoft.Office.Interop.Excel.Workbook
Dim wSheet As Microsoft.Office.Interop.Excel.Worksheet
wBook = excel.Workbooks.Open
wSheet = wBook.ActiveSheet()
For Each wSheet In wBook.Sheets
MsgBox(wSheet.Name)
Next
On the MsgBox line is where you could change the name or report it back for another process

Make an updatable file to hold the connection string in ASP-Classic

I have an asp file which contains a connectionString to my database.
I'm wondering if there is a way to change that information before the user try to log on the system.
I named it _conn.asp and it contains the following asp code
dim conn
sub OpenConn()
Set Conn = Server.CreateObject("ADODB.Connection")
Conn.Open "Driver=driver_name; SERVER=server_name; uid=user_name; pwd=user_pwd; DATABASE=db_name;"
End sub
Sub CloseConn()
Conn.Close
Set Conn = Nothing
End sub
I want to be able to change the driver, server, uid, pwd and database information inside that asp file.
First I realized that an xml file would be the best choice but then I heard about a bunch of security problems involving putting a connectionString on a xml file.
If it is not possible to update the _conn.asp file, what would be the best practice to make an updatable file to hold the connectionString to a database in classic ASP?
Usually you would put the connection string into a separate file in a directory that is not listed in IIS. Say your ASP pages are in the Website\Scripts folder. Create a new folder, Website\Includes, and make sure it is not listed. Create a file, say DSN.inc, inside this folder.
Now, in your asp pages add this line:
<!-- #include FILE="../Includes/dsn.inc" -->
DSN.inc would contain the connection string (here is an example of Jet4.0 connection string)
set rs=Server.CreateObject("adodb.Recordset")
sDSN = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & Server.MapPath ("..\Databases\db.mdb") & "; "
In your ASP pages just re-use rs

How to delete SQL rows based on source file last modified date, using SSIS package?

I have created a functioning SSIS package which pulls rows from a flat file into a SQL table. I just need to be able to delete old rows in the table, once they are older than 10 days.
The only thing is, there is no date column and I'm wondering if there is a way to do this, using the DateLastModified property from the source file? I'm not sure if this can be done via a script task or something else?
Your advice would be appreciated. :-)
So I've tried to include the date of the source file by creating a FileDate variable, along with FilePath and SourceFolder variables. I've used the FileDate variable by adding a derived column, Date_Imported, with the expression @[User::FileDate]. The FilePath variable is assigned the location "d:\inputfiles\*.txt", as shown in the code below. The SourceFolder variable has been given the value "D:\InputFiles\".
However, I'm receiving an "Exception has been thrown by the target of an invocation.
System.MissingMemberException: Public member 'GetFiles' on type 'FileSystemObject' not found."
The following is the content of my script task to delete records older than 10 days; please disregard any commented out lines, as I've been trying different things...I appreciate any guidance you can give:
Public Sub Main()
' Add your code here
Dim FilePath As String
'Dim SourceFolder As String
Dim iMaxAge = 10
Dim oFSO = CreateObject("Scripting.FileSystemObject")
Dim myConnection As SqlConnection
Dim myCommand As SqlCommand
myConnection = New SqlConnection("server = localhost; uid=sa; pwd=; database=StampsProj")
FilePath = "d:\inputfiles\*.txt"
'SourceFolder = "d:\inputfiles"
'SourceFolder.ReadOnly = True
'To delete records, older than 10 days from AddUpIn table
'For Each oFile In oFSO.GetFolder(SourceFolder).Files
For Each oFile In oFSO.GetFiles(Dts.Variables("User::SourceFolder"))
Dim FileDate As Date = oFile.DateLastModified
If DateDiff("d", oFile.DateLastModified, Now) > iMaxAge Then
'If DateDiff("d", oFile.FileDate, Now) > iMaxAge Then
myCommand = New SqlCommand("Delete from AddUpIn", myConnection)
End If
Next
End Sub
It sounds like you need to do one of two things. Either add a datetime column to your import table and set its value to the date you run the import, or create a separate FileImport table that logs the file name and an identifier, then add that identifier to the import table so you can identify the rows to delete.
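On the exception itself: Scripting.FileSystemObject has no GetFiles method, which is what the MissingMemberException reports. In a .NET Script Task the file dates can be read with System.IO instead. The following is a minimal C# sketch, assuming the goal is to list the files whose last-modified date is more than a given number of days old; the class name StaleFiles and the method OlderThan are hypothetical, and the current time is passed in as a parameter so the logic can be checked in isolation.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class StaleFiles
{
    // Hypothetical helper: returns the names of files in the folder that
    // match the pattern and were last modified more than maxAgeDays
    // before the supplied "now".
    public static List<string> OlderThan(string folder, string pattern, int maxAgeDays, DateTime now)
    {
        DateTime cutoff = now.AddDays(-maxAgeDays);

        return new DirectoryInfo(folder)
            .GetFiles(pattern)
            .Where(f => f.LastWriteTime < cutoff)
            .Select(f => f.Name)
            .OrderBy(n => n, StringComparer.Ordinal)
            .ToList();
    }
}
```

Knowing which files are stale does not by itself identify their rows, so this still needs the date column or file identifier described above, and the DELETE statement needs a WHERE clause on that column rather than deleting the whole table.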
