Excel to SQL (SSIS) - Importing more than 1 file, every file has more than 1 sheet and the data from Excel starts on the 3rd row - sql-server

Excel to SQL (SSIS): I need to import more than one file; every file has more than one sheet, and the data in Excel starts on the 3rd row.
What is the best way to build this?
I know how to do each part separately, but putting them together got me into a pickle.
Please help me, as I haven't found any videos or sites covering this.
Just to clarify:
The tables (in Excel) have the same design (each in a different sheet).
Some Excel files have 4 sheets; some have only 3.
Many thanks,
Eyal

Assuming that all of the Excel files to be imported are located in the same folder, you will first create a For-Each loop in your control flow. Here you will create a user variable that will be assigned the full path and file name of the Excel file being read (you'll need to define the .xls or .xlsx extension in the loop in order to limit it to reading only Excel files). The following link shows how to set up the first part.
How to read data from multiple Excel files with SQL Server Integration Services
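As a rough illustration of what the outer loop's file filter does, the Python sketch below keeps only .xls and .xlsx paths; each surviving path is what the loop would assign to its user variable on one iteration. The folder, file names, and the variable name in the comment are hypothetical. A wildcard such as *.xls* would also match .xlsm files, which is why the sketch compares exact extensions.

```python
from pathlib import PureWindowsPath

# Extensions the outer loop should accept (assumption: only .xls and .xlsx).
EXCEL_EXTENSIONS = {".xls", ".xlsx"}


def excel_files(paths):
    """Return the paths of Excel workbooks, sorted, ignoring extension case.

    Each returned path is what the outer Foreach loop would assign to its
    user variable (hypothetical name: User::ExcelFilePath) on one iteration.
    """
    return sorted(
        p for p in paths
        if PureWindowsPath(p).suffix.lower() in EXCEL_EXTENSIONS
    )


if __name__ == "__main__":
    folder = [
        r"C:\Import\Sales_Jan.xlsx",
        r"C:\Import\Sales_Feb.XLS",
        r"C:\Import\readme.txt",
        r"C:\Import\Sales_Mar.xlsm",  # skipped: macro-enabled workbook
    ]
    for path in excel_files(folder):
        print(path)
```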
Within this loop, you will then create another For-Each loop that iterates over all of the worksheets in the Excel file currently being read. The following article shows how to read the rows and columns from each worksheet into the database table.
Use SSIS to import all of the worksheets from an Excel file
The outer loop picks up the Excel file and the inner loop reads each worksheet, regardless of how many there are. The key is that the format of each worksheet must be the same. Also, in the Excel source of the data flow task, you can define the row of each worksheet from which to begin reading. The process continues until all of the Excel files have been read.
For good tracking and auditing, it is a good idea to include counters in the automated process that track how many files were read and how many worksheets were read from each. I also like to first import all of the records into staging tables, where any issues can be fixed and the data cleaned efficiently using SQL, before populating the final production tables.
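To make the outer/inner loop structure and the row-3 start concrete, here is a small Python model of what the package does per worksheet. It is a sketch under assumptions, not the linked articles' code: it assumes worksheet names arrive the way the OLE DB schema rowset typically reports them (for example Sheet1$ or 'My Sheet$'), that every table fits in columns A to Z, and that the ACE/Jet provider accepts an open-ended range such as [Sheet1$A3:Z] in the Excel source's SQL command (if it does not in your environment, give an explicit last row). If row 3 holds the column headers, keep "First row has column names" enabled; if row 3 is already data, turn it off. The counters correspond to the auditing suggestion above.

```python
# Nested-loop sketch: outer loop = workbooks, inner loop = worksheets.
# Assumptions (hypothetical): sheet names arrive as the OLE DB schema lists
# them ("Sheet1$", "'My Sheet$'", named ranges, *_FilterDatabase entries),
# every table spans columns A..Z, and each table starts on row 3.

FIRST_ROW = 3
LAST_COLUMN = "Z"


def worksheet_names(schema_table_names):
    """Keep real worksheets (names ending in '$'); drop named ranges etc."""
    sheets = []
    for name in schema_table_names:
        bare = name.strip("'")  # names containing spaces come back quoted
        if bare.endswith("$"):
            sheets.append(bare)
    return sheets


def sheet_query(sheet):
    """SQL command for the Excel source that starts reading at FIRST_ROW."""
    return f"SELECT * FROM [{sheet}A{FIRST_ROW}:{LAST_COLUMN}]"


def plan_import(workbooks):
    """workbooks: dict mapping file path -> schema table names in that file.

    Returns (queries, counters): one (path, query) pair per worksheet, plus
    audit counters for the number of files and worksheets per file.
    """
    queries = []
    sheets_per_file = {}
    for path, table_names in workbooks.items():   # outer Foreach loop
        sheets = worksheet_names(table_names)
        for sheet in sheets:                      # inner Foreach loop
            queries.append((path, sheet_query(sheet)))
        sheets_per_file[path] = len(sheets)
    counters = {"files": len(workbooks), "sheets_per_file": sheets_per_file}
    return queries, counters
```

For a workbook that lists Sheet1$, Sheet2$ and 'My Sheet$', the plan contains queries such as SELECT * FROM [My Sheet$A3:Z]; a workbook with 4 sheets simply yields 4 queries, which is why the inner loop does not care how many sheets each file has.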
Hope this all helps.

Related

With SSIS, how do you export SQL results to multiple CSV files?

In my SSIS package, I have an Execute SQL Task that is supposed to return up to one hundred million (100,000,000) rows.
I would like to export these results to multiple CSV files, where each file has a maximum of 500,000 rows. So if the SQL task generates 100,000,000 results, I would like to produce 200 csv files with 500,000 records in each.
What are the best SSIS tasks that can automatically partition the results into many exported CSV files?
I am currently developing a script task but find that it's not very performant. I am a bit new to SSIS so I am not familiar with all the different tasks available, and I'm wondering if maybe there's another one that can do it much more efficiently.
Any recommendations?
Static approach
First, add a data flow task.
In the data flow task, add the following:
A source (for example an ADO NET Source) that contains the query to retrieve the data.
A Conditional Split: every condition you add results in a blue output arrow. You need to connect every arrow to a destination.
An Excel destination or a flat file destination, depending on whether you want Excel files or CSV files. For CSV files, you'll need to set up a flat file connection.
In the Conditional Split you can add multiple conditions to split out your data, plus a default output.
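The static approach needs one condition per output. Assuming the source query adds a row number column (hypothetical name RowNum, for example from ROW_NUMBER()), the conditions are simple ranges; the Python sketch below generates them as SSIS expression strings. For 100,000,000 rows at 500,000 per file that is 200 conditions, each needing its own destination, which is why the dynamic approach below scales better.

```python
# Generates one Conditional Split condition per output file.
# Assumes the source query adds a row number column (hypothetical name RowNum),
# e.g. ROW_NUMBER() OVER (ORDER BY <key>) AS RowNum.

ROWS_PER_FILE = 500_000


def split_conditions(total_rows, rows_per_file=ROWS_PER_FILE, column="RowNum"):
    """Return SSIS expression strings covering rows 1..total_rows in chunks."""
    conditions = []
    for start in range(0, total_rows, rows_per_file):
        end = min(start + rows_per_file, total_rows)
        conditions.append(f"{column} > {start} && {column} <= {end}")
    return conditions


if __name__ == "__main__":
    conds = split_conditions(100_000_000)
    print(len(conds))  # 200 outputs/destinations to wire up
    print(conds[0])    # RowNum > 0 && RowNum <= 500000
```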
Dynamic approach
Use an Execute SQL Task to retrieve the variables that drive a For Loop (BatchSize, Start, End).
Add a For Loop (or Foreach Loop) container.
Add a data flow task inside the loop and pass in the parameters from the loop.
(You can pass parameters/expressions to subprocesses in the data flow using the Expressions property.)
Fetch the data with a source in the data flow task, based on the parameters from the loop.
Write to a destination (Excel/CSV) with a dynamic name based on the parameters of the loop.
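A plain-Python model of the loop may help when wiring the expressions. It mirrors a For Loop container whose InitExpression, EvalExpression and AssignExpression advance @Start by @BatchSize until @End; each iteration produces the source query and the output file name that you would set through expressions (for example on the source's SQL command and on the flat file connection manager's ConnectionString). The table dbo.ExportSource, the RowNum column, and the C:\Export folder are hypothetical.

```python
# Models the For Loop: InitExpression @Start = 0, EvalExpression @Start < @End,
# AssignExpression @Start = @Start + @BatchSize. Table, column and folder
# names below are hypothetical.


def batches(end, batch_size):
    """Yield (start, stop) row ranges, one per For Loop iteration."""
    start = 0                                  # InitExpression
    while start < end:                         # EvalExpression
        yield start, min(start + batch_size, end)
        start += batch_size                    # AssignExpression


def source_query(start, stop):
    """Query the data flow source would run for one iteration."""
    return ("SELECT * FROM dbo.ExportSource "
            f"WHERE RowNum > {start} AND RowNum <= {stop}")


def output_file(start, batch_size, folder=r"C:\Export"):
    """Dynamic destination file name for one iteration (001, 002, ...)."""
    part = start // batch_size + 1
    return f"{folder}\\export_{part:03d}.csv"


if __name__ == "__main__":
    for start, stop in list(batches(100_000_000, 500_000))[:2]:
        print(output_file(start, 500_000), source_query(start, stop))
```

Filtering on a row number range keeps each iteration's query independent, so a failed batch can be rerun on its own; the trade-off is that RowNum must be stable across iterations (a deterministic ORDER BY, or a real key column).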

Finding the column names from source assistant in SSIS

I am creating an SSIS package in which I have to move data from Excel to a table in SQL Server. The Excel file is the source (added with the Source Assistant) in the data flow task.
The number of columns in the Excel file won't change, but the column names will. So I have to find all the column names in the Excel file before inserting data.
Could you please help me on this?
Solution overview
In the Excel connection, do not treat the first row as column names, and use SQL command as the data access mode.
Alias the output column names so they match your destination.
Add a script task before the data flow task that imports the data.
Use the script task to open the Excel file and get the worksheet name and the header row.
Build the query and store it in a variable.
In the data flow task that follows, use the query stored above as the source (note that you have to set the Delay Validation property to True).
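The "build the query" step can be sketched in Python as follows. This is an illustration under assumptions, not the linked answer's code: it assumes the connection uses HDR=NO (so the provider exposes the columns as F1, F2, ...), that the header is on row 1 and the data starts on row 2, and that the script task has already read the header row; the destination column names are hypothetical. Because the column count is fixed, the header row is only used to check that count, and the aliases keep the output column names stable for the data flow. The resulting string is what you would store in the variable used by the Excel source's "SQL command from variable" data access mode.

```python
def column_letter(n):
    """Convert a 1-based column number to Excel letters (1 -> A, 27 -> AA)."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def build_source_query(sheet, header_row, destination_columns, first_data_row=2):
    """Build the SQL command the script task would store in a variable.

    Assumes the connection has HDR=NO, so the provider names the columns
    F1, F2, ...; the range starts at first_data_row to skip the header row.
    """
    if len(header_row) != len(destination_columns):
        raise ValueError(
            f"expected {len(destination_columns)} columns, found {len(header_row)}")
    last = column_letter(len(destination_columns))
    # Alias F1..Fn to the (hypothetical) destination column names.
    select_list = ", ".join(
        f"F{i} AS [{name}]" for i, name in enumerate(destination_columns, start=1))
    return f"SELECT {select_list} FROM [{sheet}A{first_data_row}:{last}]"
```

For example, a sheet whose header row reads Cust No, Name, Amt produces SELECT F1 AS [CustomerId], F2 AS [CustomerName], F3 AS [Amount] FROM [Sheet1$A2:C] when the destination columns are CustomerId, CustomerName and Amount.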
Detailed Solution
You can follow my answer at "Importing excel files having variable headers"; it solves a very similar case.

Excel Source SSIS

I have an SSIS package with an Excel Source that reads an Excel table. I am currently using the Table or View data access mode, and it is literally reading every row in the worksheet: 1,048,576, which is the maximum.
The source worksheet has an Excel table on it named PSA_DATA. Why isn't this table in the Table or View drop down? There is an option for the worksheet followed by _FilterDatabase but this fails when I run the package even though it pulls the correct data when I press Preview. Wouldn't this make more sense than using the SQL Command and SELECT * FROM [fact_PSA$Ax:Bx]? The whole reason we use Named Ranges and Tables in Excel is because they are dynamic! Now I have to hard code the range in every time with rows numbers?
What am I missing here? Is there an easier way I am missing? I just want to move an Excel table into a SQL table! Why doesn't the most ubiquitous piece of software in the world easily talk to the second most ubiquitous piece of software in the world!?!?!
If the sheet name is not shown in the Table or View combo box, using an SQL command is not a bad idea.
But when using an SQL command to read from Excel, it is not necessary to specify a range: OLE DB takes the used range by default. Just use the following command:
SELECT * FROM [fact_PSA$]
Workaround
You can try reading your Excel file from a script task or a script component; you can follow one of these links to achieve this:
https://social.msdn.microsoft.com/Forums/sqlserver/en-US/2d45f180-9fd0-4224-a298-cb99e2b2100a/how-to-read-the-contents-of-excel-file-through-ssis-script-task-without-the-headers?forum=sqlintegrationservices
https://msdn.microsoft.com/en-us/library/ms403358.aspx
http://billfellows.blogspot.com/2013/04/ssis-excel-source-via-script.html
Side note: there are many articles you can follow to import data from Excel into SQL Server using SSIS:
http://www.sqlshack.com/using-ssis-packages-import-ms-excel-data-database/
https://www.mssqltips.com/sqlservertip/2770/importing-data-from-excel-using-ssis--part-1/
https://www.simple-talk.com/sql/ssis/moving-data-from-excel-to-sql-server-10-steps-to-follow/
https://www.simple-talk.com/sql/ssis/importing-excel-data-into-sql-server-via-ssis-questions-you-were-too-shy-to-ask/
I appreciate the links to work-arounds, but I didn't really get an answer to my question. Why can't we reference an EXCEL TABLE (not a worksheet) from the SSIS Excel Source???
I ended up using the SQL Command data access mode with this query:
SELECT * FROM [fact_PSA$A:W]
WHERE fact_PSA_ID IS NOT NULL
Somehow, using SQL stopped it from reading every possible row in the worksheet, even though the range provided, "A:W", covers every row. I guess the WHERE fact_PSA_ID IS NOT NULL clause limits the rows read before they reach the SSIS source.

Exporting to Excel leaves one empty row after title (but only if it exports one column!)

I have the following problem. I'm exporting from SQL Server to an Excel 2003 file (it has to be Excel 2003) through SSIS. The package first creates a sheet through an Execute SQL Task and then populates it with a Data Flow. The Excel connection specifies that the first row has the column names.
The problem I have is that when the sheet only has one column, SSIS starts writing not in row 2, but row 3.
This is the SQL script that creates the sheet:
CREATE TABLE `Sheet1` (`Column` LongText)
And the script that populates it:
SELECT socialSecNum FROM Users
If I add a dummy column named "." and fill it with blanks in the Data Flow, it doesn't skip that row and starts writing in row 2.
The SQL Task script that creates the sheet in this case is:
CREATE TABLE `Sheet1` (`Column` LongText, `.` LongText)
It's the same SQL script that fills the Excel file in both cases. The output doesn't change, so there isn't a NULL value being inserted randomly at the beginning.
What is going on? How do I avoid it? I can't have that "." column name there.
EDIT: Also note that it's not that the Excel files are dirty and SSIS leaves row 2 empty because it thinks that row is in use; the same file doesn't skip a row if I add a second column in the script.
EDIT2: I was asked to remove the pictures, sorry.
I was finally able to replicate your issue of a blank row, and your fix with the extra column. In the end, the only way I could avoid the blank row (without the extra column) was to export with the file actually open. Yeah, that's right: I got no blank row when the file was open in Excel while the SSIS package wrote to it, which obviously is not a solution, or not a good one anyway.
CRAZY....
In all of the testing I did (a lot), I got some inconsistent results using your SQL Task to create the table. If a worksheet with the same name already existed, sometimes it would overwrite what was there, but most of the time I would get a new worksheet with an extra 1 in its name. So when you create Sheet1 and it already exists, your table is created as Sheet11. Because you are deleting the workbook altogether, you probably aren't seeing any of that weird behavior.
A quick search on the internet showed that this is a common issue with the 97-2003 format. So, things you can do/try:
Switch to CSV but give the file an .xls extension. It will still open in Excel (with no formatting, etc.), but the user may get a warning when opening the file.
Add the column, then add another SQL task to drop it after you populate it. I wasn't successful with this, but I don't do this in Excel, so I may just not know the right command.
Add another SQL task to delete the null rows. Again, I wasn't successful with this, but I don't write queries against Excel very often.
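If you go with the first option, the export itself is simple. The Python sketch below writes the single column from the question (header "Column", matching the CREATE TABLE statement) as CSV to a path of your choosing, which could end in .xls. The file path and sample values are hypothetical; in practice the rows would come from SELECT socialSecNum FROM Users. Since the content is not a real Excel 97-2003 workbook, expect the warning mentioned above, and note that Excel may treat numeric-looking values as numbers when it opens the file (dropping leading zeros, for instance).

```python
import csv
import io


def write_single_column_csv(stream, header, values):
    """Write a header plus one value per row as CSV to an open text stream."""
    writer = csv.writer(stream, lineterminator="\r\n")
    writer.writerow([header])
    for value in values:
        writer.writerow([value])


def export(path, header, values):
    """Save the CSV under any name, e.g. a hypothetical users.xls."""
    # newline="" lets the csv module control the line endings itself
    with open(path, "w", newline="", encoding="utf-8") as f:
        write_single_column_csv(f, header, values)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_single_column_csv(buffer, "Column", ["111223333", "444556666"])
    print(repr(buffer.getvalue()))
```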

Extracting excel files with SSIS

I am trying to create a ForEachLoop container that extracts the Excel files within a source folder.
I have created an Execute SQL Task within a ForEachLoop container that stores the full paths of my Excel files in a SQL Server table,
and now I can't figure out how to make it go through that list and extract each file into an OLE DB destination table.
PS: the Excel files have different types of data; the columns change almost from one file to another (28 files).
Can you please help me? Thank you in advance.
It won't work within a Foreach loop, because the destination for each spreadsheet has to be a table that matches the incoming columns. If it were 25 different spreadsheets with the same column types and number of columns, you could insert all the rows into one table, but it sounds like you need to create a separate data flow for each one. You can then combine the source -> transform -> OLE DB destination chains into one data flow (where they can run in parallel), with those three steps for each spreadsheet.
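Before building a data flow per file, it can be worth checking whether some of the files actually share a layout. The Python sketch below groups files by their header row: files in the same group could share one destination table (and one Foreach loop), while every distinct group still needs its own data flow, as described above. It assumes you have already read each file's header row (the input dict and file names are hypothetical) and it compares names case- and whitespace-insensitively, which is a design choice rather than a requirement.

```python
from collections import defaultdict


def group_by_layout(file_columns):
    """Group files that share the same column layout.

    file_columns: dict mapping file path -> list of column names (hypothetical
    input, e.g. each workbook's header row). Returns a dict mapping a column
    tuple -> list of files; each group needs its own destination table.
    """
    groups = defaultdict(list)
    for path, columns in file_columns.items():
        # Normalize names so "Id " and "id" count as the same column.
        layout = tuple(name.strip().lower() for name in columns)
        groups[layout].append(path)
    return dict(groups)
```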
