How to get information from Excel files using SSIS

I am new to SSIS and am trying to understand how to do the following:
I have a folder (TestFolder) that has multiple folders within it (SubFolder1, SubFolder2, etc.). In each subfolder there are multiple Excel files that have various names but end in a date (formatted as YYYYMM). In each Excel workbook there is a tab named AccessRates, and this is the data I want to store in the table in SQL Server.
Okay, so the question: how do I set up my SSIS control flow to handle such a task? I have built a Data Flow Task that handles the data conversion, error handling and ultimate placement in the server table, but I cannot figure out the control flow. I believe I need a Foreach Loop Container, but I can't figure out how to set it up, along with the variables.
Any help or direction would be greatly appreciated!
JP

Solution guidelines
You should follow these steps:
1) Use a Foreach Loop Container and enumerate over files.
2) Set the top folder (TestFolder) and select Traverse subfolders.
3) Set the file mask to something like [the start of all files]*.xlsx
4) Retrieve the fully qualified file name and map it to a variable.
5) Inside the Foreach loop, drop a Data Flow Task.
6) Make an Excel connection to any one of the files.
7) Go to the properties of the connection (F4).
8) Set an expression on the connection that maps its ConnectionString property to the variable from step 4 (a sketch follows this list).
9) Set DelayValidation to True.
10) Do your data flow.
This should be it.
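For step 8, the ConnectionString expression might look like the following (a sketch: the variable name User::FilePath and the ACE 12.0 provider are assumptions, so match them to your own variable and installed driver):

    "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + @[User::FilePath]
        + ";Extended Properties=\"Excel 12.0 XML;HDR=YES\";"

Alternatively, map the connection's ExcelFilePath property to @[User::FilePath], which avoids rebuilding the whole string.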
Step-by-step tutorials
There are many articles that describe the whole process step by step; you can refer to them if you need more details:
How to read data from multiple Excel files with SQL Server Integration Services
Loop Through Excel Files in SSIS
Loop through Excel Files and Tables by Using a Foreach Loop Container

Have a look at this link; it shows you how to set up an environment variable to store the sheet name you want to get the data from, and then how to use that variable in an Excel source.
Hope this helps!

Related

Excel to SQL (SSIS) - Importing more than 1 file, every file has more than 1 sheet and the data from Excel starts from the 3rd row

How would you build this the best way?
I know how to do each one separately, but together I've gotten into a pickle.
Please help me, as I haven't found videos or sites covering this.
Just to clarify -
The tables (in Excel) have the same design (each in a different sheet).
Some Excel files have 4 sheets, some have only 3.
Many thanks,
Eyal
Assuming that all of the Excel files to be imported are located in the same folder, you will first create a Foreach loop in your control flow. Here you will create a user variable that will be assigned the full path and file name of the Excel file being read (you'll need to specify the .xls or .xlsx extension in the loop's file mask in order to limit it to reading only Excel files). The following link shows how to set up the first part.
How to read data from multiple Excel files with SQL Server Integration Services
Within this loop you will then create another Foreach loop that iterates over all of the worksheets in the Excel file currently being read. Apply the following link to perform the task of reading the rows and columns of each worksheet into the database table.
Use SSIS to import all of the worksheets from an Excel file
The outer loop will pick up the Excel file and the inner loop will read each worksheet, regardless of the number of sheets. The key is that the format of each worksheet must be the same. Also, using the Excel source in the data flow, you can define at which row of each worksheet to begin reading. The process will continue until all of the Excel files have been read.
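One common way to drive the inner loop is the Foreach ADO.NET Schema Rowset Enumerator; a Script Task can also build the worksheet list. A rough sketch (the variable names User::FilePath and User::SheetList and the ACE 12.0 provider are assumptions):

    // Inside a Script Task's Main(): collect the worksheet names from the
    // current Excel file into an Object variable, then iterate that variable
    // in the inner loop with the Foreach From Variable Enumerator.
    // Needs using System.Collections.Generic and System.Data.OleDb.
    string file = Dts.Variables["User::FilePath"].Value.ToString();
    string connStr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + file +
                     ";Extended Properties=\"Excel 12.0 XML;HDR=YES\";";
    var sheets = new List<string>();
    using (var conn = new OleDbConnection(connStr))
    {
        conn.Open();
        var schema = conn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
        foreach (System.Data.DataRow row in schema.Rows)
        {
            string name = row["TABLE_NAME"].ToString();
            if (name.EndsWith("$") || name.EndsWith("$'"))  // worksheets end in $
                sheets.Add(name);
        }
    }
    Dts.Variables["User::SheetList"].Value = sheets;
    Dts.TaskResult = (int)ScriptResults.Success;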
For good tracking and auditing purposes, it is a good idea to include counters in the automated process to track the number of files, and worksheets per file, that were read. I also like to first import all of the records into staging tables, where any issues can be found and cleaned up efficiently using SQL before populating the results into the final production tables.
Hope this all helps.

Finding the column names from source assistant in SSIS

I am creating an SSIS package in which I have to move data from Excel to a table in SQL Server. The Excel file is the source (added via the Source Assistant) in the data flow task.
The number of columns in the Excel file won't change, but the column names will. So I have to find all the column names in the Excel file before inserting the data.
Could you please help me on this?
Solution overview
Uncheck "First row has column names" in the Excel connection, and use "SQL command" as the data access mode.
Alias the column names in the SQL command so the output columns match your destination.
Add a Script Task before the Data Flow Task that imports the data.
Use the Script Task to open the Excel file and get the worksheet name and the header row.
Build the query and store it in a variable.
In the Data Flow Task, use the query stored above as the source (note that you have to set the Delay Validation property to True).
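A minimal Script Task sketch for getting the worksheet name and building the query (the variable names User::FilePath and User::SourceQuery, the ACE provider, and the placeholder destination aliases Col1..ColN are all assumptions to adapt):

    // Inside the Script Task's Main().
    // Needs using System.Collections.Generic and System.Data.OleDb.
    string file = Dts.Variables["User::FilePath"].Value.ToString();
    string connStr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + file +
                     ";Extended Properties=\"Excel 12.0 XML;HDR=NO;IMEX=1\";";
    using (var conn = new OleDbConnection(connStr))
    {
        conn.Open();
        // First worksheet name, e.g. "Sheet1$"
        var schema = conn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
        string sheet = schema.Rows[0]["TABLE_NAME"].ToString();

        // With HDR=NO the provider names the columns F1, F2, ..., so alias
        // them to the fixed names the destination expects (Col1..ColN here).
        var cmd = new OleDbCommand("SELECT TOP 1 * FROM [" + sheet + "]", conn);
        var aliases = new List<string>();
        using (var reader = cmd.ExecuteReader())
        {
            if (reader.Read())
                for (int i = 0; i < reader.FieldCount; i++)
                    aliases.Add("F" + (i + 1) + " AS [Col" + (i + 1) + "]");
        }
        // Note: with HDR=NO the changing header row comes through as data;
        // filter it out in the WHERE clause or in the data flow.
        Dts.Variables["User::SourceQuery"].Value =
            "SELECT " + string.Join(", ", aliases) + " FROM [" + sheet + "]";
    }
    Dts.TaskResult = (int)ScriptResults.Success;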
Detailed Solution
You can follow my answer at Importing excel files having variable headers; it solves a very similar case.

Moving files based on a source path found in a table using SSIS

I've chased my tail for a full 12 hours. Haven't found the right solution.
I'm locked into using SSIS. I have a SQL Server table with full paths and filenames already concatenated. Examples:
\\MydevServer1\C$\ABC\App_Data\Sample.pdf
\\MydevServer2\E$\Garth\App_Data\Morefiles.txt
\\MydevServer3\D$\Paths\App_Data\MySS.xlsx
etc.
I need to read each row of the table, get the path and filename and move that file to a new static destination directory.
The rows in the table will remain unchanged. I only use it as a source to locate the file to be moved.
I've tried:
1) Feeding a result set from an OLE DB Source to a Recordset Destination, then into an Object variable that feeds a Foreach Loop Container holding a File System Task. (Very problematic.)
2) Sending the table rows to a .csv file and reading each line of the csv file using a Foreach Loop Container holding a File System Task.
3) Reading directly from the table rows using a Foreach Loop Container holding a File System Task. (Preferred.)
and many other scenarios.
I have viewed a hundred examples online, but most of them involve loading a table, or sending results to flat files, or moving files from one folder to another based on extension type, etc. I haven't found anything on configuring a File System Task to read a table-supplied path and move the file based on that table value as the source.
I'm rambling. :-)
Any insight or help will be appreciated. I'm not new to SSIS, but I sure feel like it right now.
Create two string variables to store the source and destination paths.
Use an Execute SQL Task to populate a Full result set (a variable with the Object data type).
Use a Foreach Loop Container with the ADO enumerator to go through each row of the recordset and set those two variables.
Inside the Foreach Loop Container, use a File System Task. You need to set IsSourcePathVariable = True and IsDestinationPathVariable = True, set the SourceVariable / DestinationVariable properties to the two path variables, and set the operation (Copy file, Move file, etc.).
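If you would rather do the move in code, a Script Task inside the same loop works too. A minimal sketch (the variable names User::SourcePath and User::DestinationFolder are assumptions):

    // Inside a Script Task's Main(): move the current file to the static
    // destination directory. Needs using System.IO.
    string source = Dts.Variables["User::SourcePath"].Value.ToString();
    string destFolder = Dts.Variables["User::DestinationFolder"].Value.ToString();
    string dest = Path.Combine(destFolder, Path.GetFileName(source));
    if (File.Exists(dest))
        File.Delete(dest);   // overwrite; remove this line if you'd rather fail
    File.Move(source, dest);
    Dts.TaskResult = (int)ScriptResults.Success;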
It appears I've been tail-chasing due to the error "Source is empty".
This was caused by a blank first row in my recordset. I was searching for a fix to the "Object variable is empty" issue, when in reality the issue was that the Object variable couldn't find data right off the bat.
Insert shameful smug here.
Thanks to Anton for the help.

Use SSIS to import multiple .csv files that each have unique columns

I keep running into issues creating a SSIS project that does the following:
inspects a folder for .csv files -> for each csv file -> insert into [db].[each csv file's name]
Each csv and its corresponding table in the database have their own unique columns.
I've tried the Foreach Loop found in many write-ups, but the issue comes down to the Flat File connection: it seems to expect each csv file to have the same columns as the file before it, and errors out when not presented with those column names.
Is anyone aware of a workaround for this?
Every flat file format would have to have its own connection, because the connection is what tells SSIS how to interpret the data set contained within the file. If it didn't exist, it would be the same as telling SQL Server you want data out of a database without specifying a table or its columns.
I guess the thing you have to consider is: how are you going to tell a data flow task which column in a source component maps to which column in a destination component? Will it always be the same column name? Without a connection manager there is no way to map the columns unless you do it dynamically.
There are still a few ways you can do what you want, and you just need to search around, because I know there are answers on this subject:
You could create a Script Task and do the import in .Net (a sketch follows this list).
You could use an Execute SQL Task with BULK INSERT or OPENROWSET into a temporary staging table, and then use dynamic SQL to map and import into the final table.
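A minimal sketch of the Script Task option, assuming one destination table per file named after the file, CSV headers that match the table's column names, and variables User::FilePath and User::DbConnStr (all of these are assumptions, and the CSV parsing is naive: it ignores quoted fields):

    // Inside a Script Task's Main(): load one CSV into the table that shares
    // its name, letting SqlBulkCopy map columns by header name.
    // Needs using System.Data, System.Data.SqlClient and System.IO.
    string csvPath = Dts.Variables["User::FilePath"].Value.ToString();
    string table = Path.GetFileNameWithoutExtension(csvPath);

    var dt = new DataTable();
    string[] lines = File.ReadAllLines(csvPath);
    foreach (string h in lines[0].Split(',')) dt.Columns.Add(h.Trim());
    for (int i = 1; i < lines.Length; i++)
        dt.Rows.Add(lines[i].Split(','));        // naive split, no quoted fields

    string dbConnStr = Dts.Variables["User::DbConnStr"].Value.ToString();
    using (var conn = new SqlConnection(dbConnStr))
    using (var bulk = new SqlBulkCopy(conn))
    {
        conn.Open();
        bulk.DestinationTableName = "[dbo].[" + table + "]";
        foreach (DataColumn c in dt.Columns)     // map by name, not position
            bulk.ColumnMappings.Add(c.ColumnName, c.ColumnName);
        bulk.WriteToServer(dt);
    }
    Dts.TaskResult = (int)ScriptResults.Success;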
Try keeping a mapping table with the columns below:
FileLocation
FileName
TableName
Add all the details to the table.
Create user variables for all of the column names and one for the result set.
Read the data from the table using an Execute SQL Task and keep it in the single result-set (Object) variable.
In the Foreach Loop Container's variable mappings, map all the columns to the user variables.
Create two connection managers, one for Excel and the other for the csv file.
Pass the CSV file connection string as @[User::FileLocation] + @[User::FileName]
Inside the Foreach Loop Container, use the Bulk Insert Task and assign the source and destination connections, as well as the table name from the User::TableName variable.
If you need any details, please post and I will try to help.
You could look into BimlScript, which dynamically creates and executes packages based on available metadata.
I've got 2 options for you here:
1) A Script Task, to dynamically create the table structures in SQL Server (a sketch follows).
2) Within a Foreach Loop Container, use an Execute SQL Task with an OPENROWSET clause.
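A rough sketch of option 1, assuming each target table is named after its file, every column can start life as NVARCHAR(255), and the variables User::FilePath and User::DbConnStr exist (all of these are assumptions to adapt):

    // Inside a Script Task's Main(): create a staging table whose columns
    // mirror the CSV header. Needs using System.IO, System.Linq and
    // System.Data.SqlClient.
    string csvPath = Dts.Variables["User::FilePath"].Value.ToString();
    string table = Path.GetFileNameWithoutExtension(csvPath);
    string[] headers = File.ReadLines(csvPath).First().Split(',');
    string cols = string.Join(", ",
        headers.Select(h => "[" + h.Trim() + "] NVARCHAR(255)"));
    string ddl = "IF OBJECT_ID(N'dbo.[" + table + "]') IS NULL " +
                 "CREATE TABLE dbo.[" + table + "] (" + cols + ");";
    using (var conn = new SqlConnection(
               Dts.Variables["User::DbConnStr"].Value.ToString()))
    {
        conn.Open();
        new SqlCommand(ddl, conn).ExecuteNonQuery();
    }
    Dts.TaskResult = (int)ScriptResults.Success;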

How can I tell which file failed in a SQL Server SSIS MULTIFLATFILE connection when loading into a table from the files?

I am loading 30 files at a time using a MULTIFLATFILE connection from SSIS into a raw table. These files are constantly coming in, and a job loops through to execute the package and pick them up. Some of the files can have bad data every once in a while. I am having trouble when one of the 30 files is bad: I have no way to know which of the files actually failed, so that I can move it into a Suspect folder, roll the other 29 files back into my Input folder and try again. I can't find any info on this on the net, or in Microsoft's documentation. Any help appreciated. Thanks.
I would rebuild your Connection using the FLATFILE type, then use it within a Foreach Loop Container that sets a Variable to the file name. In the Connection definition, you can use an Expression to set the ConnectionString property to your Variable value.
Then you can use that Variable value to take whatever action you need when your "bad" condition occurs.
Configure the FileNameColumnName property of the Flat File Source component. That will add a column containing the file name.
