I have an SSIS data import package that reads a source Excel spreadsheet and imports the data into a SQL Server database table. I have been unable to automate this process because the Excel file's worksheet name changes every day, so I have had to change the worksheet name manually before running the import each day. Note that the workbook will never contain any other worksheets.
Can I make a variable for the worksheet name?
Can I use a wildcard character rather than the worksheet name?
Would I be better off creating an Excel macro or similar to change the worksheet name before launching the import job?
I use the following script task (C#):
System.Data.OleDb.OleDbConnection objConn;
DataTable dt;
string connStr = ""; // Use the same connection string that you have in your package
objConn = new System.Data.OleDb.OleDbConnection(connStr);
objConn.Open();
dt = objConn.GetOleDbSchemaTable(System.Data.OleDb.OleDbSchemaGuid.Tables, null);
objConn.Close();
foreach (DataRow r in dt.Rows)
{
    // For some reason there is always a duplicate sheet with an underscore.
    string t = r["TABLE_NAME"].ToString();
    // Note: if more than one sheet exists, this will only capture the last one.
    if (!t.EndsWith("_"))
    {
        Dts.Variables["YourVariable"].Value = t;
    }
}
Then, in SSIS, I add another variable and build my SQL with an expression:
"Select * from [" + @[User::YourVariable] + "]"
Finally, set the data access mode of the Excel Source to "SQL command from variable" and pick that SQL variable.
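The filtering step in the loop above can be pulled into a small helper that is easy to test outside SSIS. This is only a sketch in C#: the class and method names are made up for illustration, and it assumes that worksheet entries in TABLE_NAME end in "$" (possibly wrapped in single quotes when the sheet name contains spaces), while the unwanted duplicates do not.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of SSIS or of the answer above.
public static class SheetNamePicker
{
    // Returns the first entry that looks like a worksheet, or null if none does.
    // Assumption: once wrapping single quotes are removed, a worksheet name
    // ends in "$"; entries such as "Data$_" do not and are skipped.
    public static string PickFirstWorksheet(IEnumerable<string> tableNames)
    {
        return tableNames
            .Where(name => !string.IsNullOrEmpty(name))
            .FirstOrDefault(name => name.Trim('\'').EndsWith("$", StringComparison.Ordinal));
    }
}
```

In the script task you would feed it the TABLE_NAME values read from dt.Rows and store the result in the package variable. Unlike the loop above, it keeps the first match rather than the last.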
This works perfectly for me with the same scenario, in case it helps you or someone else:
Two package-level string variables are required:
varDirectoryList - You will use this inside SSIS for each loop variable mapping
varWorkSheet - This will hold your changing worksheet name. Since you only have 1, it's perfect.
Set up:
a. Add an SSIS Foreach Loop container.
b. Excel Connection Manager (connect to the first workbook while you test; at the end, go to its properties and add an expression that sets ExcelFilePath to your varDirectoryList. Set DelayValidation to True on it as well as on your Excel Source task. This lets the package go through each workbook in your folder.)
c. Inside your Foreach Loop, add a C# Script Task; title it "Get changing worksheet name into variable" or whatever you prefer.
d. Data Flow Task with your Excel Source to SQL Table Destination.
In your Script Task add this code:
using System;
using System.Data;
using System.Data.OleDb;
using System.Windows.Forms; // only needed for the MessageBox used while testing

public void Main()
{
    // store file name passed into Script Task
    string WorkbookFileName = Dts.Variables["User::varDirectoryList"].Value.ToString();
    // set up connection string
    string connStr = String.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"EXCEL 12.0;HDR=Yes;IMEX=1;\"", WorkbookFileName);
    // set up connection to Workbook
    using (var conn = new OleDbConnection(connStr))
    {
        // connect to Workbook
        conn.Open();
        // get Workbook schema
        using (DataTable worksheets = conn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null))
        {
            // in ReadWrite variable passed into Script Task, store third column (TABLE_NAME) in the
            // first row of the DataTable, which contains the name of the first Worksheet
            Dts.Variables["User::varWorkSheet"].Value = worksheets.Rows[0][2].ToString();
            // For testing purposes: shows the first worksheet name of the Excel file.
            // Comment this line out once the package works.
            MessageBox.Show(Dts.Variables["User::varWorkSheet"].Value.ToString());
        }
    }
    Dts.TaskResult = (int)ScriptResults.Success;
}
After you have this set up and run it, you will get a message box displaying the changing worksheet name for each workbook.
If you are using the Excel Source SQL command, you will need a third string variable, e.g. varExcelSQL, with an expression like "SELECT columns FROM [" + @[User::varWorkSheet] + "]", which will change dynamically to match each workbook. The name returned by the script normally already ends in $, and depending on the sheet name you may or may not need single quotes around it; change varExcelSQL as needed.
If you are not using an Excel Source SQL command and are loading straight from the table, go into Excel Source Properties --> AccessMode --> OpenRowset From Variable --> select varWorkSheet.
That should take care of it, as long as the column structures remain the same.
If you happen to get files with mixed data types in one column, you can use IMEX=1 inside your connection string, which forces such columns to be imported as text (DT_WSTR).
Hope this helps :-)
If you are using SSIS to import the sheet, you could use a script task to find the name of the sheet and then change the name, or do whatever else you need in order to make it fit the rest of your import. Here is an example, which I found elsewhere, of finding the sheet:
Dim excel As New Microsoft.Office.Interop.Excel.ApplicationClass
Dim wBook As Microsoft.Office.Interop.Excel.Workbook
Dim wSheet As Microsoft.Office.Interop.Excel.Worksheet
' Workbooks.Open needs the path of the workbook as its argument
wBook = excel.Workbooks.Open
wSheet = wBook.ActiveSheet()
For Each wSheet In wBook.Sheets
    MsgBox(wSheet.Name)
Next
On the MsgBox line is where you could change the name or report it back for another process
So, as the title suggests, I need to import 2 Excel (.xlsx) files from my local machine (c:\temp) into one SQL Server table. Each file contains only one sheet, but the sheet names differ. The column names and the number of columns are identical in both files.
If I select one specific excel file through SSIS via Excel Connection Manager, it extracts the data perfectly and inserts it into my destination SQL table.
The problem comes in when I add a ForEach Loop Container and want to loop through the c:\temp directory to read the 2 files. Somewhere I am missing a setting and keep getting various "connect to Excel" errors.
Please assist with the following:
I am unsure how to specify the Excel file path. Is the below correct? I used to select the exact file here when loading only 1 file:
Then it seems I need to create variables, so I did below:
Then I am not sure if I should add an expression to my ForEach loop and which mappings would be correct?
And lastly, I am not sure whether to put the filename or sheetname as variable below. I tried the filepath, but get the following error:
Please help as I am totally lost with this.
UPDATE
OK, I have now done the following:
Added a SheetName variable (which I think the Value is maybe incorrect). I am trying to tell it to only read the first sheet.
Then my Excel connection string looks like this:
Provider=Microsoft.ACE.OLEDB.12.0;Data Source=;Extended Properties="EXCEL 12.0 XML;HDR=NO";
My ForEach loop:
And my Excel source:
I get the following error:
[Book 2] Error: Opening a rowset for "Sheet1$" failed. Check that the object exists in the database.
It seems like your biggest issue is in regards to getting the sheetname which can vary, and the only way I know how to do this is with a script task.
So inside your Foreach loop (which stores the path of the Excel file in a variable), add a script task before you enter the data flow.
First of all, start by knowing your connection string (I use this site for help: https://www.connectionstrings.com/excel/).
Set SheetName as the ReadWrite variable and FilePath as the ReadOnly variable of the script task.
Code:
var cstr = string.Format(@"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=""Excel 12.0 Xml;HDR=YES"";"
    , Dts.Variables["FilePath"].Value.ToString()); // Enter your connection string for the OleDb connection
using (var conn = new System.Data.OleDb.OleDbConnection(cstr))
{
    conn.Open();
    var sheets = conn.GetOleDbSchemaTable(System.Data.OleDb.OleDbSchemaGuid.Tables, null);
    // Since there is only 1 for sure.
    Dts.Variables["SheetName"].Value = sheets.Rows[0]["TABLE_NAME"].ToString();
}
Now you have the SheetName in a variable (this will have the $ in the sheetname that you need as well), set up another variable called SQL and define it as "Select * from [" + SheetName + "]".
Now use the variable SQL in your DataFlow Source.
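If you would rather build the SQL text inside the script task than in an SSIS expression, a small helper along these lines could do it. It is a sketch only: the class and method names are invented for illustration, and it assumes the input is always a worksheet name (never a named range), so a missing "$" is appended and wrapping single quotes are dropped.

```csharp
using System;

// Hypothetical helper for illustration; not an SSIS API.
public static class ExcelSql
{
    // Wraps a TABLE_NAME value in square brackets for an Excel Source SQL command.
    public static string BuildSelectAll(string tableName)
    {
        if (string.IsNullOrWhiteSpace(tableName))
            throw new ArgumentException("Sheet name is empty.", "tableName");

        // Assumption: names containing spaces may arrive wrapped in single quotes.
        string name = tableName.Trim().Trim('\'');

        // Assumption: the caller always means a worksheet, so make sure the
        // trailing "$" is present exactly once.
        if (!name.EndsWith("$", StringComparison.Ordinal))
            name += "$";

        return "SELECT * FROM [" + name + "]";
    }
}
```

The result can be assigned to the SQL variable in the same script task, provided that variable is listed as a ReadWrite variable.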
I need to pick one .csv file from \\\Share\Folder\ with max filename for further import to SQL. File name is alphanumerical, e.g. ABC_DE_FGHIJKL_MNO_PQRST_U-1234567.csv, where numerical part will vary, but I need the only max one each time the package runs.
Constraints: no write access on that SQL server, I use ##Temp table for import and this is the least desirable method for filename processing (no for each loops on this server).
Ideally it will be function/expr-based variable (combined with script task if needed) to pass into connection manager. Any ideas much appreciated.
Use a Script Task
Add a variable of type String User::CsvFile
Add a script task to your project and add your created variable as a ReadWriteVariable
In Your Script task write the following code (VB.NET):
You have to Import System.Linq Library
Public Sub Main()
    Dim strDirectory As String = "C:\New Folder" ' Enter the directory
    Dim strFile As String = String.Empty
    ' Longest name last. OrderBy is a stable sort, so names of equal length keep the
    ' order returned by GetFiles (which is not guaranteed to be sorted).
    strFile = IO.Directory.GetFiles(strDirectory, "*.csv", IO.SearchOption.TopDirectoryOnly).OrderBy(Function(x) x.Length).Last
    Dts.Variables.Item("CsvFile").Value = strFile
    Dts.TaskResult = ScriptResults.Success
End Sub
Then use this variable from Flat File Source
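If the ordering has to follow the numeric part of the name explicitly, one option is to parse it. The following C# sketch assumes, based on the example file name in the question, that the number is whatever follows the last "-" before the extension; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper for illustration.
public static class LatestCsvPicker
{
    // Returns the number after the last '-' in the file name (extension removed),
    // or -1 when the name has no numeric suffix.
    public static long NumericSuffix(string path)
    {
        string stem = Path.GetFileNameWithoutExtension(path);
        int dash = stem.LastIndexOf('-');
        long value;
        if (dash >= 0 && long.TryParse(stem.Substring(dash + 1), out value))
            return value;
        return -1;
    }

    // Returns the path with the largest numeric suffix, or null for an empty list.
    public static string PickMax(IEnumerable<string> paths)
    {
        return paths.OrderBy(p => NumericSuffix(p)).LastOrDefault();
    }
}
```

In a script task, the input would be the result of Directory.GetFiles(folder, "*.csv"), and the return value would go into User::CsvFile.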
I am working with SSIS and I need to load multiple files with the following (Yellos) format to SQL using SSIS
The problem, as you can see, is that the files have a horrible format: I should only process records where column A is populated (e.g. ignoring rows 14 to X), and I need to insert the value in D1 into the Date column.
Any suggestions?
Regards!
Let's divide this problem into 3 sub-problems:
Get the date value from D1
Start Reading from Row number 4
Ignore all Rows where Column1 is NULL
Solution
1. Get the date value from D1
Create 2 SSIS variables: @[User::FilePath] (of type string), which contains the Excel file path, and @[User::FileDate] (of type string), which we will use to store the date value
Add a Script Task and choose Visual Basic as the script language
Select @[User::FilePath] as a ReadOnly variable and @[User::FileDate] as a ReadWrite variable
Open the Script Editor and use the following code to retrieve the date value and store it in @[User::FileDate]
This will search for the sheet named Refunds, extract the date value from it, and store this value in @[User::FileDate]
' m_strExcelPath, m_strExcelConnectionString, m_dtschemaTable and BuildConnectionString()
' are class-level members that are not shown here.
Public Sub Main()
    m_strExcelPath = Dts.Variables.Item("FilePath").Value.ToString
    Dim strSheetname As String = String.Empty
    Dim strDate As String = String.Empty
    m_strExcelConnectionString = Me.BuildConnectionString()
    Try
        Using OleDBCon As New OleDbConnection(m_strExcelConnectionString)
            If OleDBCon.State <> ConnectionState.Open Then
                OleDBCon.Open()
            End If
            'Get all worksheets
            m_dtschemaTable = OleDBCon.GetOleDbSchemaTable(OleDbSchemaGuid.Tables,
                                                           New Object() {Nothing, Nothing, Nothing, "TABLE"})
            'Loop over the worksheets to find the right one (the workbook may contain temporary or deleted sheets)
            For Each schRow As DataRow In m_dtschemaTable.Rows
                strSheetname = schRow("TABLE_NAME").ToString
                If Not strSheetname.EndsWith("_") AndAlso strSheetname.EndsWith("$") Then
                    If Not strSheetname.ToLower.Contains("refunds") Then Continue For
                    Using cmd As New OleDbCommand("SELECT * FROM [" & strSheetname & "A1:D1]", OleDBCon)
                        Dim dtTable As New DataTable("Table1")
                        cmd.CommandType = CommandType.Text
                        Using daGetDataFromSheet As New OleDbDataAdapter(cmd)
                            daGetDataFromSheet.Fill(dtTable)
                            'Get the value from column 4 (index 3, because the index is zero-based)
                            strDate = dtTable.Rows(0).Item(3).ToString
                        End Using
                    End Using
                    'When the first correct sheet is found there is no need to check the others
                    Exit For
                End If
            Next
            OleDBCon.Close()
        End Using
    Catch ex As Exception
        Throw New Exception(ex.Message, ex)
    End Try
    Dts.Variables.Item("FileDate").Value = strDate
    Dts.TaskResult = ScriptResults.Success
End Sub
In the DataFlow Task add a Derived Column Transformation, add a derived column with the following expression
@[User::FileDate]
2. Start Reading from Row Number 4
As before, we assume that the Excel file path is stored in @[User::FilePath].
First open the Excel Connection Manager and uncheck the box First row has column names
In the DataFlow Task, double click on the excel source
Set the source to SQL Command
Use the following command: SELECT * FROM [Refunds$A4:D] , so it will start reading from the row number 4
The column names will be F1 ... F4. In the Excel source you can go to the Columns tab and give the columns aliases, so that they are shown with their aliases in the data flow task
3. Ignore all Rows Where Column1 is NULL
Add a conditional split after the Excel Source
Split the Flow based on the following expression
ISNULL([F1]) == False
(This applies if you didn't give F1 an alias; otherwise use the alias in the expression.)
Finally, remember that you must add a derived column (as we said in the first sub-problem) that contains the date value
Over the last few days, I was asked to move a company program over from an Access back-end, to SQL Server.
There are 2 copies of the program: the live data version on the server, and a local version on my PC's C: drive, so that if I make a mistake, it doesn't affect the live data.
So, I managed to migrate the Access database, tables and data over to SQL Server 2008, and the local version of the program now works.
The easiest way, or so I'm informed, to now do the same to the live version of the program, is to write an imports program, which wipes all of the data from each table in the SQL Server database, and then copies over the data from the live Access database. However, I've never done this before, so I'm not really even sure where to begin.
Could anybody point me in the right direction on how to begin or do this, so that I only have to change the connection path in the program, rather than go through the whole process again?
PS, I work in vb.net, so that's the language I would need any responses in!
Thanks.
Usually one uses the SQL Server Import and Export Wizard for this.
It's a separate tool that is installed with SQL Server Management Studio (SSMS).
ANSWER
Step 1;
I added a new path to the ini file for the database to read. This connected to the live database. Once this connection is open in the project, proceed to step 2.
Step 2;
Create a new class, where the imports and exports will happen.
Step 3;
Put a button, or some sort of control in the program to initiate the import/export. For example, I had a button which, when clicked, asked the user to confirm that they wanted to import a new database and overwrite the existing one. If yes, call the function which does this, in the newly made imports class.
Step 4;
Now that you know how to get this set up, the code would be something like
Public Function importdatabase(/connections go in here/)
Declare transaction
Create sql variable
Try
Begin the transaction here
sql to delete the data from one table
sql to select all data from database that is being imported
For loop to iterate over each record in the database table
Declare a variable for each field in the database
variable1 = ("fieldname1")
variable2 = ("fieldname2")
sql statement to insert the new values
call to the function which runs the sql query
Next
commit transaction
Catch ex As Exception
Throw
End Try
Step 5; Repeat the delete/insert process for each database table
Below this, I have other functions.
One function created a new datatable, this is referenced as
For each dr as datarow in /functionname(parameters).Rows
Next one is to execute the sql statement (not required, any command to execute it will do)
Next one is used for parameterising my SQL query
The rest are to replace null values in the database with empty strings, set dates, etc
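As an illustration of the "parameterising my SQL query" helper mentioned above, here is one possible shape for it. It is a sketch under assumptions, written in C# to match the script examples elsewhere on this page (it translates directly to VB.NET); the class name, method name and the @p0, @p1, ... parameter naming are all invented, not taken from the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration.
public static class InsertBuilder
{
    // Builds "INSERT INTO [table] ([c0], [c1]) VALUES (@p0, @p1)" so that the
    // values can be supplied as command parameters instead of being
    // concatenated into the SQL text.
    public static string Build(string table, IList<string> columns)
    {
        if (columns == null || columns.Count == 0)
            throw new ArgumentException("At least one column is required.", "columns");

        string columnList = string.Join(", ", columns.Select(c => "[" + c + "]"));
        string parameterList = string.Join(", ", columns.Select((c, i) => "@p" + i));

        return "INSERT INTO [" + table + "] (" + columnList + ") VALUES (" + parameterList + ")";
    }
}
```

Inside the loop over the source rows, each field value would then be added to the command as the parameter with the matching index before the statement is executed in the transaction.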
You can use the following class to import table(s) in access to sql server.
You need:
- The connection strings of the source (including the Access file name) and of the target.
- The source table and the target table (if the target is null, it is the same as the source table).
Imports System.Data.OleDb
Imports System.Data.SqlClient

Class ImportHelper
'modify connectionstring as needed
Public Property SourceConnectionString() As String
Get
Return m_SourceConnectionString
End Get
Set
m_SourceConnectionString = Value
End Set
End Property
Private m_SourceConnectionString As String
Public Property DestinationConnectionString() As String
Get
Return m_DestinationConnectionString
End Get
Set
m_DestinationConnectionString = Value
End Set
End Property
Private m_DestinationConnectionString As String
Public Sub New(sourceConnectionString__1 As String, destinationConnectionString__2 As String)
SourceConnectionString = sourceConnectionString__1
DestinationConnectionString = destinationConnectionString__2
End Sub
Public Sub Import(sourceTable As String, Optional targetTable As String = Nothing)
Using sourceConnection = New OleDbConnection(SourceConnectionString)
If String.IsNullOrEmpty(targetTable) Then
targetTable = sourceTable
End If
sourceConnection.Open()
' Perform an initial count on the source table.
Dim commandRowCount = New OleDbCommand("SELECT COUNT(*) FROM " & sourceTable, sourceConnection)
Dim countStart As Long = Convert.ToInt64(commandRowCount.ExecuteScalar())
Console.WriteLine("Source Table [{0}] has {1} rows", sourceTable, countStart)
' Get data from the source table
Dim commandSourceData = New OleDbCommand(Convert.ToString("SELECT * FROM ") & sourceTable, sourceConnection)
Dim reader = commandSourceData.ExecuteReader()
'---------------
Using destinationConnection As New SqlConnection(DestinationConnectionString)
destinationConnection.Open()
Using bulkCopy As New SqlBulkCopy(destinationConnection)
bulkCopy.DestinationTableName = targetTable
Try
' Write from the source to the destination.
bulkCopy.WriteToServer(reader)
Console.WriteLine("Success importing " & sourceTable)
Catch ex As Exception
Console.WriteLine(ex.Message)
Finally
reader.Close()
End Try
'using
End Using
'using
End Using
End Using
'using
End Sub
End Class
How to use:
Private Sub Test()
'modify connectionstring as needed
'Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\mydatabase.mdb;User Id=admin;Password=; //access 97..2000
Dim SourceConnectionString As String = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\temp\database1.accdb;Persist Security Info=False;"
Dim DestinationConnectionString As String = "Data Source=xxxx;Initial Catalog=test;user=xxx;password=xxx;"
Dim helper As New ImportHelper(SourceConnectionString, DestinationConnectionString)
helper.Import("table1", "test1")
End Sub
I'd like to know, how to create a database table in Excel, so that it may be used with ODBC
I want to use ODBC, and I have two options, either MS Access or Excel,
As you probably know, in order to indicate some MS Access file or Excel file as an ODBC source, you need to follow:
Administrative Tools -> Data Sources (ODBC) -> Choose User DSN -> Choose either 'Excel Files' or 'MS Access Database' from the list -> Press 'Configure' -> finally choose the file (MS Access or Excel) as ODBC source
Well, it works fine with MS Access, I can connect to the file and see all tables that I've created inside
But when it comes to Excel, although I can connect to the file, I can't see the table that I've created inside
I just used 'Table' in 'Insert' tab, added some headers as column names, and gave the table
a meaningful name. Is that the way to do it?
There are several ways you can reference "table" data in an Excel workbook:
An entire worksheet.
A named range of cells on a worksheet.
An unnamed range of cells on a worksheet.
They are explained in detail in the "Select Excel Data with Code" section of the Microsoft Knowledge Base article 257819.
The most straightforward way is to keep the data on a separate sheet, put column names in the first row (starting in cell A1), and then have the actual data start in row 2, like this
To test, I created a User DSN named "odbcFromExcel" that pointed to that workbook...
...and then ran the following VBScript to test the connection:
Option Explicit
Dim con, rst, rowCount
Set con = CreateObject("ADODB.Connection")
con.Open "DSN=odbcFromExcel;"
Set rst = CreateObject("ADODB.Recordset")
rst.Open "SELECT * FROM [Sheet1$]", con
rowCount = 0
Do While Not rst.EOF
rowCount = rowCount + 1
If rowCount = 1 Then
Wscript.Echo "Data row 1, rst(""LastName"").Value=""" & rst("LastName").Value & """"
End If
rst.MoveNext
Loop
Wscript.Echo rowCount & " data rows found."
rst.Close
Set rst = Nothing
con.Close
Set con = Nothing
The results were
C:\Users\Gord\Documents\__tmp>cscript /nologo excelTest.vbs
Data row 1, rst("LastName").Value="Thompson"
10 data rows found.
I hope that helps your Excel connection issue.
As a final comment I have to say that if you are doing something that takes "several seconds" to do in Excel but "takes around 20-25 min" to do in Access then I strongly suspect that you are using Access in a very inefficient way, but that's a topic for another question (if you care to pursue it).
EDIT
If you want to INSERT data into an Excel workbook then that is possible, but be aware that the default setting for an Excel ODBC connection is "Read Only" so you have to click the "Options>>" button and clear that checkbox:
Once that's done, the following code...
Option Explicit
Dim con
Set con = CreateObject("ADODB.Connection")
con.Open "DSN=odbcFromExcel;"
con.Execute "INSERT INTO [Sheet1$] (ID, LastName, FirstName) VALUES (11, 'Dumpty', 'Humpty')"
con.Close
Set con = Nothing
Wscript.Echo "Done."
...will indeed append a new row in the Excel sheet with the data provided.
However, that still doesn't address the problem of no "Tables" being available for selection when you point your "sniffer" app at an Excel ODBC DSN.
One thing you could try would be to create an Excel sheet with column headings in row 1, then select those entire columns and create an Excel "Defined Name". Then, see if your "sniffer" app recognizes that as a "table" name that you can select.
FWIW, I defined the name myTable as =Sheet1!$A:$C in my Excel workbook, and then my original code sort of worked when I used SELECT * FROM [myTable]:
C:\Users\Gord\Documents\__tmp>cscript /nologo excelTest.vbs
Data row 1, rst("LastName").Value="Thompson"
1048576 data rows found.
As you can see, it retrieved the first "record" correctly, but then it didn't recognize the end of the valid data and continued to read the ~1 million rows in the sheet.
I doubt very much that I will be putting any more effort into this because I agree with the other comments that using Excel as an "ODBC database" is really not a very good idea.
I strongly suggest that you try to find out why your earlier attempts to use Access were so unsatisfactory. As I said before, it sounds to me like something was doing a really bad job at interacting with Access.
I had a similar problem with some data recently. The way I managed to get around it was to select the data as a range A1:XY12345, then use the Define Name tool to name the range. When you connect to the Excel workbook via ODBC, this named range will appear as a "table," while ranges that you actually defined (per Excel) as a table, do not.
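To avoid the whole-column problem described earlier (over a million rows returned), the defined name should refer to a bounded range. The following C# sketch only builds the "refers to" text for such a name; the class name, method name and parameters are hypothetical, and it assumes the data starts in row 1 with a header row.

```csharp
using System;

// Hypothetical helper for illustration.
public static class NamedRangeReference
{
    // Builds an absolute reference such as ='Sheet1'!$A$1:$C$11.
    public static string Build(string sheet, string firstColumn, string lastColumn, int lastRow)
    {
        if (lastRow < 1)
            throw new ArgumentOutOfRangeException("lastRow");

        // Wrapping the sheet name in single quotes is always accepted by Excel;
        // an apostrophe inside the name has to be doubled.
        string quotedSheet = "'" + sheet.Replace("'", "''") + "'";

        return "=" + quotedSheet + "!$" + firstColumn + "$1:$" + lastColumn + "$" + lastRow;
    }
}
```

For a sheet with a header row and 10 data rows in columns A to C, the last row would be 11.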
You just need to select as many columns as required, starting from the first row of your Excel file, and then give the selection a name in the Name Box to the left of the formula bar. Of course, you give a name to each column of the file too!