Import 2 Excel Files via SSIS with different sheet names - sql-server

So, as the title suggests, I need to import 2 Excel (.xlsx) files from my local machine (c:\temp) into one SQL Server table. Each file contains only one sheet, but the sheet names differ. The column names and the number of columns are identical in both files.
If I select one specific Excel file in SSIS through the Excel Connection Manager, it extracts the data perfectly and inserts it into my destination SQL table.
The problem comes in when I add a ForEach Loop Container and want to loop through the c:\temp directory to read the 2 files. Somewhere I am missing a setting and keep getting various "connect to Excel" errors.
Please assist with the following:
I am unsure how to specify the Excel file path. Is the below correct? I used to select the exact file here when loading only 1 file:
Then it seems I need to create variables, so I did below:
Then I am not sure whether I should add an expression to my ForEach loop, and which mappings would be correct.
And lastly, I am not sure whether to put the file name or the sheet name in the variable below. I tried the file path, but get the following error:
Please help as I am totally lost with this.
UPDATE
OK, I have now done the following:
Added a SheetName variable (I think its Value may be incorrect). I am trying to tell it to read only the first sheet.
Then my Excel connection string looks like this:
Provider=Microsoft.ACE.OLEDB.12.0;Data Source=;Extended Properties="EXCEL 12.0 XML;HDR=NO";
My ForEach loop:
And my Excel source:
I get the following error:
[Book 2] Error: Opening a rowset for "Sheet1$" failed. Check that the object exists in the database.

It seems like your biggest issue is getting the sheet name, which can vary, and the only way I know how to do this is with a Script Task.
So, inside your Foreach loop (which stores the path to the Excel file in a variable), add a Script Task before the Data Flow Task.
First of all, work out your connection string (I use this site for help: https://www.connectionstrings.com/excel/).
Set SheetName as a ReadWriteVariable and FilePath as a ReadOnlyVariable.
Code:
var cstr = string.Format(@"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=""Excel 12.0 Xml;HDR=YES"";",
    Dts.Variables["FilePath"].Value.ToString()); // OLE DB connection string for the current workbook
using (var conn = new System.Data.OleDb.OleDbConnection(cstr))
{
    conn.Open();
    // one row per worksheet (and per named range) in the workbook
    var sheets = conn.GetOleDbSchemaTable(System.Data.OleDb.OleDbSchemaGuid.Tables, null);
    // Since there is only 1 sheet for sure, take the first row.
    Dts.Variables["SheetName"].Value = sheets.Rows[0]["TABLE_NAME"].ToString();
}
Dts.TaskResult = (int)ScriptResults.Success;
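Now you have the sheet name in a variable (it already includes the $ that the sheet reference needs). Set up another variable called SQL, set its EvaluateAsExpression property to True, and define its expression as "SELECT * FROM [" + @[User::SheetName] + "]".
Now use the SQL variable in your Data Flow source: in the Excel Source, set the data access mode to "SQL command from variable" and select User::SQL.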
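For anyone who wants to check the name-to-query logic outside SSIS, here is a minimal Python sketch of what the Script Task and the SQL expression do together. It assumes the TABLE_NAME values have already been read from the schema table into a list; first_sheet_query is a hypothetical helper, not an SSIS or OLE DB API.

```python
from __future__ import annotations


def first_sheet_query(table_names: list[str]) -> str:
    """Build the SELECT for the Excel Source from schema TABLE_NAME values.

    Worksheet entries end in "$" (or "$'" when the name is quoted because it
    contains spaces); anything else, such as a named range, is skipped.
    """
    sheets = [t for t in table_names if t.endswith("$") or t.endswith("$'")]
    if not sheets:
        raise ValueError("no worksheet found in the schema table")
    # Each workbook in this scenario has exactly one sheet, so take the first.
    return "SELECT * FROM [" + sheets[0] + "]"


print(first_sheet_query(["Book2$"]))  # SELECT * FROM [Book2$]
```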

Related

Passing filename from SSIS script task

I'm attempting to create an SSIS package that loads a flat file into a SQL Server table.
I've been able to piece the loading functionality together. I'm currently stuck on passing the filename, if a file is found, from the Script Task back to a variable that I'd like to use in the flat file connection string.
Public Sub Main()
'
Dim di As DirectoryInfo = New DirectoryInfo("\\winshare\iFile\Cors2\AAA\AAA Employee Incentive Source Data\")
Dim fi As FileInfo() = di.GetFiles("AAA Full PreReg Report*.csv")
If fi.Length > 0 Then
Dts.Variables("User::fileExists").Value = True
Dts.Variables("User::FileName").Value = fi.name
Else
Dts.Variables("User::fileExists").Value = False
End If
' Add your code here
'
Dts.TaskResult = ScriptResults.Success
End Sub
I'm seeking help with
Dts.Variables("User::FileName").Value = fi.name
Why won't this work?
Thanks
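fi is an array of FileInfo objects, so it has no Name property of its own; you need to pick one element from it. If you are looking to get the first file in the directory, you can use the following line of code:
Dts.Variables("User::FileName").Value = fi(0).Name
Use fi(0).FullName instead if the flat file connection string needs the full path rather than just the file name.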
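The same "a search returns a collection, so pick one item from it" idea, as a minimal Python sketch. first_matching_file is a hypothetical helper; unlike fi(0).Name it returns the full path (like FullName), and it sorts the matches so the choice is deterministic, which GetFiles does not promise.

```python
from __future__ import annotations

from pathlib import Path


def first_matching_file(directory: str, pattern: str) -> str | None:
    """Return the full path of the first file matching pattern, or None.

    glob() yields a collection of matches (like DirectoryInfo.GetFiles), so a
    single file has to be picked from it explicitly, as fi(0) does in VB.
    """
    matches = sorted(Path(directory).glob(pattern))
    return str(matches[0]) if matches else None
```

A None result plays the role of setting User::fileExists to False.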
But if you want to loop over files, I recommend using the Foreach Loop Container to loop over the files and store each file name in a variable:
SSIS - How to loop through files in folder and get path+file names and finally execute stored Procedure with parameter as Path + Filename
FAQ - How to loop through files in a specified folder, load one by one and move to archive folder using SSIS

SSIS: import MAX(filename) from folder

I need to pick one .csv file from \\Share\Folder\ with the max filename for further import to SQL. The file name is alphanumeric, e.g. ABC_DE_FGHIJKL_MNO_PQRST_U-1234567.csv, where the numeric part will vary, but I need only the max one each time the package runs.
Constraints: I have no write access on that SQL Server, I use a ##Temp table for the import, and this is the least desirable method for filename processing (no Foreach loops on this server).
Ideally it will be function/expr-based variable (combined with script task if needed) to pass into connection manager. Any ideas much appreciated.
Use a Script Task
Add a variable of type String: User::CsvFile
Add a Script Task to your project and add the variable you created as a ReadWriteVariable
In your Script Task, write the following code (VB.NET):
You have to add Imports System.Linq at the top of the script
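Public Sub Main()
    Dim strDirectory As String = "C:\New Folder" ' Enter the directory
    Dim strFile As String = String.Empty
    ' Order by path length first, so a number with more digits sorts higher,
    ' then alphabetically, so names of equal length are compared by their digits.
    ' Last() throws if the folder contains no .csv files.
    strFile = IO.Directory.GetFiles(strDirectory, "*.csv", IO.SearchOption.TopDirectoryOnly) _
        .OrderBy(Function(x) x.Length) _
        .ThenBy(Function(x) x) _
        .Last()
    Dts.Variables.Item("CsvFile").Value = strFile
    Dts.TaskResult = ScriptResults.Success
End Sub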
Then use this variable in an expression on the Flat File connection manager's ConnectionString property.
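If the numeric part can have a different number of digits, comparing it as a number is more robust than comparing strings (as text, "999" sorts after "1000"). This is a minimal Python sketch of that idea, assuming the number is always the digits between the last hyphen and .csv, as in the example name. max_numbered_csv and max_numbered_csv_in are hypothetical helpers.

```python
from __future__ import annotations

import re
from pathlib import Path

# Assumed naming convention: <prefix>-<digits>.csv,
# e.g. ABC_DE_FGHIJKL_MNO_PQRST_U-1234567.csv
_SUFFIX = re.compile(r"-(\d+)\.csv$", re.IGNORECASE)


def max_numbered_csv(names: list[str]) -> str | None:
    """Return the name whose trailing number is largest, or None."""
    numbered = []
    for name in names:
        m = _SUFFIX.search(name)
        if m:
            numbered.append((int(m.group(1)), name))
    if not numbered:
        return None
    return max(numbered)[1]


def max_numbered_csv_in(directory: str) -> str | None:
    """Apply max_numbered_csv to the .csv files in one folder (no subfolders)."""
    best = max_numbered_csv([p.name for p in Path(directory).glob("*.csv")])
    return str(Path(directory) / best) if best else None
```

Files that don't follow the pattern are ignored rather than compared.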

SSIS Excel Import - Worksheet variable OR wildcard?

I have a SSIS data import package that uses a source Excel spreadsheet and then imports data into a SQL Server database table. I have been unsuccessful in automating this process because the Excel file's worksheet name is changed every day. So, I have had to manually change the worksheet name before running the import each day. As a caveat, there will never be any other worksheets.
Can I make a variable for the worksheet name?
Can I use a wildcard character rather than the worksheet name?
Would I be better off creating an Excel macro or similar to change the worksheet name before launching the import job?
I use the following Script Task (C#):
System.Data.OleDb.OleDbConnection objConn;
DataTable dt;
string connStr = ""; // Use the same connection string that you have in your package
objConn = new System.Data.OleDb.OleDbConnection(connStr);
objConn.Open();
dt = objConn.GetOleDbSchemaTable(System.Data.OleDb.OleDbSchemaGuid.Tables, null);
objConn.Close();
foreach (DataRow r in dt.Rows)
{
    // For some reason there is always a duplicate sheet entry ending with an underscore; skip it.
    string t = r["TABLE_NAME"].ToString();
    // Note: if more than one sheet exists, this will only capture the last one.
    if (!t.EndsWith("_"))
    {
        Dts.Variables["YourVariable"].Value = t;
    }
}
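And then in SSIS, I add another variable to build my SQL: a new String variable with EvaluateAsExpression set to True and the expression "SELECT * FROM [" + @[User::YourVariable] + "]".
Finally, in the Excel Source, set the data access mode to "SQL command from variable" and select that SQL variable.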
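The loop's rule (skip entries ending in an underscore, and let a later sheet overwrite an earlier one) is easy to misread, so here it is as a small Python sketch. pick_sheet is a hypothetical name, and the input list stands in for the TABLE_NAME column of the schema table.

```python
from __future__ import annotations


def pick_sheet(table_names: list[str]) -> str | None:
    """Mirror the C# loop: skip names ending in "_" and keep the last remaining one."""
    chosen = None
    for name in table_names:
        if not name.endswith("_"):  # skip the duplicate entry ending in "_"
            chosen = name           # a later sheet overwrites an earlier one
    return chosen
```

With a single worksheet per file, "last one wins" and "first one wins" give the same answer.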
This works perfectly for me with the same scenario, in case it helps you or someone else:
Two package-level string variables are required:
varDirectoryList - You will use this inside SSIS for each loop variable mapping
varWorkSheet - This will hold your changing worksheet name. Since you only have 1, it's perfect.
Set up:
a. Add SSIS For Each Loop
b. Excel Connection Manager (connect to the first workbook while you test; at the end, go to the connection manager's Properties and, under Expressions, set ExcelFilePath to your varDirectoryList. Set DelayValidation to True on the connection manager and on the Data Flow Task that contains your Excel Source. This helps it go through each workbook in your folder.)
c. Inside your For Each Loop, add a C# Script Task and title it "Get changing worksheet name into variable" or whatever you prefer.
d. Data Flow Task with your Excel Source to SQL Table Destination.
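Roughly what the Foreach File Enumerator hands to varDirectoryList on each iteration, assuming it is configured with *.xlsx and "Fully qualified" file names. This Python sketch (workbooks_to_load is hypothetical) also skips ~$ lock files, which Excel creates while a workbook is open. That filter is my own addition, since a *.xlsx file spec might pick those files up too.

```python
from __future__ import annotations

from pathlib import Path


def workbooks_to_load(folder: str) -> list[str]:
    """Fully qualified paths of the .xlsx files in folder, in name order.

    Names starting with "~$" are the lock files Excel creates while a
    workbook is open; they are skipped because they are not real workbooks.
    """
    return [
        str(p)
        for p in sorted(Path(folder).glob("*.xlsx"))
        if not p.name.startswith("~$")
    ]
```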
In your Script Task, add this code:
using System.Data.OleDb; // at the top of ScriptMain.cs, next to the template's other using directives
public void Main()
{
    // store file name passed into Script Task
    string WorkbookFileName = Dts.Variables["User::varDirectoryList"].Value.ToString();
    // setup connection string
    string connStr = String.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"EXCEL 12.0;HDR=Yes;IMEX=1;\"", WorkbookFileName);
    // setup connection to Workbook; it is closed when the using block ends, even on error
    using (var conn = new OleDbConnection(connStr))
    {
        // connect to Workbook
        conn.Open();
        // get Workbook schema
        using (DataTable worksheets = conn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null))
        {
            // in the ReadWrite variable passed into the Script Task, store the TABLE_NAME
            // column of the first row (sheets are listed by name, not tab order;
            // with a single worksheet that makes no difference)
            Dts.Variables["User::varWorkSheet"].Value = worksheets.Rows[0]["TABLE_NAME"].ToString();
            // For testing only: shows the worksheet name found. Comment out before scheduling the package.
            MessageBox.Show(Dts.Variables["User::varWorkSheet"].Value.ToString());
        }
    }
    Dts.TaskResult = (int)ScriptResults.Success;
}
After you have this set up and run it, you will get a message box displaying the changing worksheet name for each workbook.
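If you are using an SQL command in the Excel Source, you will need a 3rd string variable, like varExcelSQL, with EvaluateAsExpression set to True and an expression like "SELECT columns FROM [" + @[User::varWorkSheet] + "]", which will dynamically change to match each workbook. The name stored by the Script Task normally already ends with $ (and is wrapped in single quotes when the sheet name contains spaces), so you may or may not need to add the $ or quotes yourself; change varExcelSQL as needed.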
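To avoid doubling up the $ or the quotes, the name returned by the Script Task can be normalised before it goes into the query. A minimal Python sketch; sheet_select is hypothetical, and it assumes the provider accepts a bracketed sheet name containing spaces without the single quotes, which is worth testing against your own files.

```python
def sheet_select(table_name: str, columns: str = "*") -> str:
    """Build the Excel Source SQL from a schema TABLE_NAME value.

    TABLE_NAME usually ends in "$" and is wrapped in single quotes when the
    sheet name contains spaces, e.g. "'My Sheet$'". Strip both, then add the
    "$" back exactly once inside the brackets.
    """
    name = table_name.strip("'")
    if name.endswith("$"):
        name = name[:-1]
    return f"SELECT {columns} FROM [{name}$]"
```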
If you are not using an SQL command and are just loading straight from the table, go into Excel Source Properties --> AccessMode --> OpenRowset From Variable --> select varWorkSheet.
That should take care of it, as long as the column structures remain the same.
If you get files with mixed data types in one column, you can use IMEX=1 in your connection string, which tells the driver to read those mixed-type columns as text (DT_WSTR) on import.
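Several connection strings in this thread use slightly different Extended Properties, so here is a small Python sketch that assembles one for an .xlsx file. ace_connection_string is a hypothetical helper. The provider name and the HDR/IMEX keywords are the ones used in the connection strings here, and an empty Data Source is rejected up front.

```python
def ace_connection_string(path: str, header: bool = True, imex: bool = True) -> str:
    """Build an ACE OLE DB connection string for an .xlsx workbook.

    HDR=YES treats the first row as column names; IMEX=1 asks the driver to
    read mixed-type columns as text.
    """
    if not path:
        # An empty Data Source cannot open anything, so fail early instead.
        raise ValueError("path to the workbook is required")
    props = "Excel 12.0 Xml;HDR=" + ("YES" if header else "NO")
    if imex:
        props += ";IMEX=1"  # read mixed-type columns as text
    return (
        "Provider=Microsoft.ACE.OLEDB.12.0;"
        f"Data Source={path};"
        f'Extended Properties="{props}";'
    )
```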
Hope this helps :-)
If you are using SSIS to import the sheet, you could use a Script Task to find the name of the sheet and then rename it, or do whatever else you need to make it fit the rest of your import. Here is an example of finding the sheet that I found here:
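' Requires a reference to Microsoft.Office.Interop.Excel
Dim excel As New Microsoft.Office.Interop.Excel.ApplicationClass
Dim wBook As Microsoft.Office.Interop.Excel.Workbook
Dim wSheet As Microsoft.Office.Interop.Excel.Worksheet
' filePath is a hypothetical placeholder for the workbook path (e.g. read from a package variable)
wBook = excel.Workbooks.Open(filePath)
For Each wSheet In wBook.Sheets
    MsgBox(wSheet.Name)
Next
' Close the workbook without saving and shut down the Excel instance
wBook.Close(False)
excel.Quit()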
The MsgBox line is where you could rename the sheet or report its name back to another process.

EXCEL SQL SELECT won't recognize field names in ThisWorkBook

I'm a self-taught Excel VBA and SQL user. I'm testing out some simple queries before I add complexity. I must be missing something blindingly obvious here...
I am using an ADO connection to run a SQL SELECT statement on a table in the activeworkbook (ThisWorkBook). The Excel Table is named "tbl_QDB" and is on worksheet "MyQDB". The table starts on cell A1, so there are no blank or populated cells above the Table HeaderRowRange.
I have set up an ADO connection to ThisWorkBook and this is working fine. Here's the code:
Sub ConnectionOpen2()
'### UNDER DEVELOPMENT
Dim sconnect As String
Const adUseClient = 3
Const adUseServer = 2
Const adLockOptimistic = 3
Const adOpenKeyset = 1
Const adOpenDynamic = 2
Const adOpenStatic = 3 ' used below, so it has to be declared too
'used to connect to this workbook for SQL runs
On Error GoTo err_OpenConnection2
Set cn2 = CreateObject("ADODB.Connection")
Set rec2 = CreateObject("ADODB.Recordset")
rec2.CursorLocation = adUseClient
rec2.CursorType = adOpenStatic
rec2.LockType = adLockOptimistic
datasource = ThisWorkbook.FullName
sconnect = "Provider=Microsoft.ACE.OLEDB.12.0;" & _
"Data Source=" & datasource & ";" & _
"Extended Properties=""Excel 12.0;HDR=YES;ReadOnly=False;Imex=0"";"
cn2.Open sconnect
'etc, etc...
End Sub
I can run this simplest basic SELECT query:
SQLSTR="SELECT * FROM [MYQDB$]"
rec2.open SQLSTR, cn2
This works and produces 10 records i.e. rec2.recordcount=10.
However, if I try this, it errors:
SQLSTR="SELECT QID_1 FROM [MYQDB$]"
QID_1 is a valid field in the table on worksheet "MyQDB".
It doesn't change the error if I enclose QID_1 in () or [] or ``
I can even replace the field name with a made up field e.g. DonaldDuck and I get the same error.
Why would the SELECT statement work if I use "*" but not if I use any of the field names in the table? This seems so basic that I feel I must have missed a simple but key point.
I would really appreciate it if someone could point out the mistake!
The SQL should work - if the field exists. Execute the Select * and dump the field list:
For i = 0 To rec2.Fields.Count - 1
Debug.Print rec2.Fields(i).Name
Next i
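The same check can be automated. Here is a minimal Python sketch, assuming you already have the list of field names that the loop above prints. headers_recognised is a hypothetical helper that flags the F1, F2, ... default names the provider falls back to when it doesn't pick up a header row.

```python
from __future__ import annotations

import re

_DEFAULT_NAME = re.compile(r"F\d+")


def headers_recognised(field_names: list[str]) -> bool:
    """False when there are no fields or every field has a default name (F1, F2, ...)."""
    return bool(field_names) and not all(_DEFAULT_NAME.fullmatch(n) for n in field_names)
```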
Thank you all for your comments.
That suggestion, @FunThomas, was an eye opener! The results were F1, F2, F3 etc., so the field names (or column names if you prefer) were not being recognised.
This would explain why, after days of trying to join this table with another in a closed, external workbook, it was not working. SQL error messages can be quite obscure, and they were not saying that the field name wasn't recognised.
I have now fixed that issue. Here's what I can tell / warn others:
I started this table with rows above the header. In 2 of those cells above, I recorded the last connection time and status to another workbook table. I had realised before that these extra rows, with data populated in ANY cell above the headers, were causing problems with SQL. Despite my data being in an Excel Table, the SQL "engine" for Excel looks at the sheet where the data is stored, i.e. [MYQDB$] (I am aware that you can specify a sheet and range, but you cannot use the actual table name as the range).
It is OK to have blank rows above the table headerrowrange. So, I deleted the cells containing the data above the table headerrowrange. Instead, I placed a Text Box there and used a formula to look at another sheet, where the last connection time and status were now stored, to supply the text for the text box.
I can now see that even this Text Box, which occupies no cell, causes a problem for Excel SQL.
Before posting my question here, I made a copy of the workbook and removed the text box and the rows above the table headerrowrange. I still got errors. I still got F1, F2, F3 etc. as field names (per @FunThomas's suggestion).
Only after deleting these rows and the text box and then resizing the table (actually, the same range as before) did the Excel SQL recognise the proper field names. I was then even able (just for curiosity) to insert a blank row above the table headerrowrange, and the SQL still worked.
It seems to me that Excel retained in memory the old table definition and only by removing all data above the table headerrowrange and then resizing the table did it refresh that. Perhaps I should be less lazy in future and call the sheetname and range (table address) in the sql: maybe that would ignore data in cells above the headerrowrange?
@PanagiotisKanavos: I was originally trying to compare two tables (actual Excel Tables, not just ranges, hence they have Field Names), one in ThisWorkBook and another in a closed Excel workbook. SQL is the best way to do this. Having failed to get a left join to work between these tables (and this Question might now have revealed why that wouldn't work!) I decided to bring the data from the external workbook into ThisWorkBook and compare there. Then I was going to find the differences, store in a recordset (hence SQL) and then INSERT INTO the external workbook.
Thanks for your help guys!

Using Excel as an ODBC database

I'd like to know how to create a database table in Excel so that it can be used with ODBC.
I want to use ODBC, and I have two options: either MS Access or Excel.
As you probably know, in order to indicate some MS Access file or Excel file as an ODBC source, you need to follow:
Administrative Tools -> Data Sources (ODBC) -> Choose User DSN -> Choose either 'Excel Files' or 'MS Access Database' from the list -> Press 'Configure' -> finally choose the file (MS Access or Excel) as ODBC source
Well, it works fine with MS Access, I can connect to the file and see all tables that I've created inside
But when it comes to Excel, although I can connect to the file, I can't see the table that I've created inside
I just used 'Table' in the 'Insert' tab, added some headers as column names, and gave the table a meaningful name. Is that the way to do it?
There are several ways you can reference "table" data in an Excel workbook:
An entire worksheet.
A named range of cells on a worksheet.
An unnamed range of cells on a worksheet.
They are explained in detail in the "Select Excel Data with Code" section of the Microsoft Knowledge Base article 257819.
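The three options map onto three forms of the bracketed name used in the FROM clause. Here is a minimal Python sketch. from_clause is a hypothetical helper that only builds the text, and the sheet, range and name values are examples.

```python
from __future__ import annotations


def from_clause(sheet: str | None = None, cell_range: str | None = None,
                defined_name: str | None = None) -> str:
    """Return the bracketed table reference for each way of addressing Excel data.

    - entire worksheet: sheet only          -> [Sheet1$]
    - unnamed range:    sheet + cell_range  -> [Sheet1$A1:C11]
    - named range:      defined_name only   -> [myTable]
    """
    if defined_name:
        return f"[{defined_name}]"
    if sheet is None:
        raise ValueError("either sheet or defined_name is required")
    return f"[{sheet}${cell_range or ''}]"
```

For example, "SELECT * FROM " + from_clause("Sheet1") gives the query used in the test script.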
The most straightforward way is to keep the data on a separate sheet, put column names in the first row (starting in cell A1), and then have the actual data start in row 2, like this
To test, I created a User DSN named "odbcFromExcel" that pointed to that workbook...
...and then ran the following VBScript to test the connection:
Option Explicit
Dim con, rst, rowCount
Set con = CreateObject("ADODB.Connection")
con.Open "DSN=odbcFromExcel;"
Set rst = CreateObject("ADODB.Recordset")
rst.Open "SELECT * FROM [Sheet1$]", con
rowCount = 0
Do While Not rst.EOF
rowCount = rowCount + 1
If rowCount = 1 Then
Wscript.Echo "Data row 1, rst(""LastName"").Value=""" & rst("LastName").Value & """"
End If
rst.MoveNext
Loop
Wscript.Echo rowCount & " data rows found."
rst.Close
Set rst = Nothing
con.Close
Set con = Nothing
The results were
C:\Users\Gord\Documents\__tmp>cscript /nologo excelTest.vbs
Data row 1, rst("LastName").Value="Thompson"
10 data rows found.
I hope that helps your Excel connection issue.
As a final comment I have to say that if you are doing something that takes "several seconds" to do in Excel but "takes around 20-25 min" to do in Access then I strongly suspect that you are using Access in a very inefficient way, but that's a topic for another question (if you care to pursue it).
EDIT
If you want to INSERT data into an Excel workbook then that is possible, but be aware that the default setting for an Excel ODBC connection is "Read Only" so you have to click the "Options>>" button and clear that checkbox:
Once that's done, the following code...
Option Explicit
Dim con
Set con = CreateObject("ADODB.Connection")
con.Open "DSN=odbcFromExcel;"
con.Execute "INSERT INTO [Sheet1$] (ID, LastName, FirstName) VALUES (11, 'Dumpty', 'Humpty')"
con.Close
Set con = Nothing
Wscript.Echo "Done."
...will indeed append a new row in the Excel sheet with the data provided.
However, that still doesn't address the problem of no "Tables" being available for selection when you point your "sniffer" app at an Excel ODBC DSN.
One thing you could try would be to create an Excel sheet with column headings in row 1, then select those entire columns and create an Excel "Defined Name". Then, see if your "sniffer" app recognizes that as a "table" name that you can select.
FWIW, I defined the name myTable as =Sheet1!$A:$C in my Excel workbook, and then my original code sort of worked when I used SELECT * FROM [myTable]:
C:\Users\Gord\Documents\__tmp>cscript /nologo excelTest.vbs
Data row 1, rst("LastName").Value="Thompson"
1048576 data rows found.
As you can see, it retrieved the first "record" correctly, but then it didn't recognize the end of the valid data and continued to read the ~1 million rows in the sheet.
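If you do end up with a whole-column defined name, one client-side workaround is to stop reading at the first completely empty row. Here is a minimal Python sketch. It assumes each row arrives as a sequence of values with empty cells as None or "", and that the real data has no blank rows in the middle. rows_until_blank is hypothetical.

```python
from __future__ import annotations

from typing import Iterable, Sequence


def rows_until_blank(rows: Iterable[Sequence[object]]) -> list[Sequence[object]]:
    """Collect rows until the first row whose values are all empty (None or "")."""
    kept = []
    for row in rows:
        if all(v is None or v == "" for v in row):
            break  # everything after this is assumed to be unused sheet rows
        kept.append(row)
    return kept
```

Defining the name over the exact data range, as suggested below, avoids the problem at the source.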
I doubt very much that I will be putting any more effort into this because I agree with the other comments that using Excel as an "ODBC database" is really not a very good idea.
I strongly suggest that you try to find out why your earlier attempts to use Access were so unsatisfactory. As I said before, it sounds to me like something was doing a really bad job at interacting with Access.
I had a similar problem with some data recently. The way I managed to get around it was to select the data as a range A1:XY12345, then use the Define Name tool to name the range. When you connect to the Excel workbook via ODBC, this named range will appear as a "table," while ranges that you actually defined (per Excel) as a table do not.
You just need to select as many columns as required, starting from the first row of your Excel file, and then give the selection a name in the Name Box to the left of the formula bar. Of course, you should give each column of the file a name too!
