I'm working on an SSIS package that will be used to import data from an Excel file into SQL Server. My current struggle is figuring out how to make the SSIS package bring in exactly one Excel file without knowing its name beforehand. I have a directory that will contain between 0 and n Excel files at the same time. I want to pull in only the file with the oldest creation time. Is this possible?
I'm using Visual Studio 2015 to build the SSIS package. My DB is in SQL Server 2016.
To create a dynamic file connection:
Create a new Variable (Name Example: 'SourceFile') of datatype String.
In a 'For Each Loop Container', map that variable under the 'Variable Mappings' tab and set the 'Enumerator Configuration' to the correct folder and file extension.
The 'For Each Loop Container' will read the file from the location and assign the name of the file to the variable.
In the Expressions properties of your file connection, set the ConnectionString property to @[User::SourceFile].
This should make your file source dynamic. It will pick up the file no matter what it is named, but the format of the file will have to be consistent.
Using just the stock SSIS tasks, I am not aware of a way to use the files' creation dates to pick the oldest file. But if the file name contains the creation date, you could substring the date out of the @[User::SourceFile] variable on each execution of the 'For Each Loop Container' and track it in another variable to determine which file is oldest.
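Alternatively, if a Script Task is acceptable, you can skip the loop entirely and pick the oldest file in code before the Data Flow runs. A minimal sketch (C#), assuming package variables User::SourceFolder and User::SourceFile exist (both names are placeholders) and User::SourceFile is listed under ReadWriteVariables:

// Find the Excel file with the oldest creation time and hand it to the package.
using System.IO;
using System.Linq;

public void Main()
{
    string folder = (string)Dts.Variables["User::SourceFolder"].Value;

    // The folder may contain 0..n files; FirstOrDefault() copes with the empty case.
    FileInfo oldest = new DirectoryInfo(folder)
        .GetFiles("*.xlsx")
        .OrderBy(f => f.CreationTime)
        .FirstOrDefault();

    if (oldest != null)
        Dts.Variables["User::SourceFile"].Value = oldest.FullName;

    Dts.TaskResult = (int)ScriptResults.Success;
}

You could then put an expression such as @[User::SourceFile] != "" on the precedence constraint so the Data Flow simply does not run when the folder is empty.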
I'm building an SSIS package (using VS 2017) to load a bunch of CSV files from a specific folder. It works great using the Foreach File loop. The Data Flow task has a Flat File Source and an OLE DB Destination. I want to be able to save the file name in the same table, along with the data from the CSV file. How can I do this?
Thanks a lot for your time and help!
One other method, if you want to load the entire file path of the flat files:
Right click on Flat File Source in Data Flow tab
Click "Show Advanced Editor"
Click "Component Properties"
Under Custom Properties you will find "FileNameColumnName".
If you give it a name (e.g. FlatFileName), a column with that name will appear among the source output columns in the mapping, containing the file name value (full path with file name). It can be mapped to any available column in the destination, or trimmed with a Derived Column transformation to get only the file name.
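For that trimming, a Derived Column expression along these lines should work (an untested sketch; it assumes the column is named FlatFileName, and "\\" is an escaped backslash in SSIS expression syntax):

RIGHT([FlatFileName], FINDSTRING(REVERSE([FlatFileName]), "\\", 1) - 1)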
In my development, I have mostly stored the entire path, which helped me with tracking.
One other method for anyone who comes across this question: instead of using a Foreach Loop, you can take a simpler route.
Right-click in the Connection Managers pane.
Click on "New Connection"
Select "MULTIFLATFILE" connection type and click Add.
In the connection manager editor, enter the location using a wildcard * (e.g. \\ABC\XYZ\file_*.txt) to pick up all the flat files in that folder.
This automatically loops through all the flat files in that folder.
Hope this helps!
The ForEach File Enumerator can capture the file name and assign it to a variable, which can then be routed to the connection string variable of a connection manager for dynamic loading. This same methodology can be applied to write the file name to a database table.
In the data flow, add a Derived Column transformation, add a new column called 'FileName' (or whatever), and set its expression to the variable that the Foreach File Enumerator is populating with the file name (the expression is just the variable itself, e.g. @[User::SourceFile]).
I have a SQL table that stores a file name and an SSIS package name. Whenever a file gets dropped into a directory, the corresponding SSIS package gets triggered by referring to the mapping table.
If I store the file name as, say, a*.csv in the database and the corresponding SSIS package as sample-ssis.dtsx, will I be able to trigger the same package for any CSV file starting with "a"? Can someone please help me with this?
Sure. You can read the file name into a variable and use a Script Task to loop through your mapping table and see whether any of the filename-with-wildcard entries match the file name in the variable.
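A rough sketch of that matching logic in a Script Task (C#). The mapping table (dbo.PackageMapping with columns FilePattern and PackageName), the connection string source, and the variable names are all assumptions chosen to illustrate the idea, not a definitive implementation:

// Match the incoming file name against wildcard patterns from a mapping table.
using System.Data.SqlClient;
using System.Text.RegularExpressions;

string fileName = (string)Dts.Variables["User::FileName"].Value;
string connString = (string)Dts.Variables["User::DbConnectionString"].Value; // placeholder

using (var conn = new SqlConnection(connString))
using (var cmd = new SqlCommand("SELECT FilePattern, PackageName FROM dbo.PackageMapping", conn))
{
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            // Turn a wildcard like a*.csv into the regex ^a.*\.csv$
            string regex = "^" + Regex.Escape(reader.GetString(0))
                .Replace(@"\*", ".*").Replace(@"\?", ".") + "$";

            if (Regex.IsMatch(fileName, regex, RegexOptions.IgnoreCase))
            {
                Dts.Variables["User::PackageToRun"].Value = reader.GetString(1);
                break;
            }
        }
    }
}

With the matched package name in a variable, an Execute Package Task can then run the right package.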
I'm a little new to SSIS, and I need to import some flat files into SQL tables with the same structure.
(Assume the tables already exist with the same structure, and each table name matches its flat file name.)
I thought I would create a generic package (SQL Server 2014) to import all those files by looping through a folder.
I created a Data Flow Task inside a Foreach Loop Container; in the data flow I dropped a Flat File Source and an ADO.NET Destination.
I have set the file source to a variable so that every time it loops through, it gets the new file. Similarly, I set the ADO.NET destination's table name to a variable so that each time it selects a different table according to the file name.
Since the source and destination column names are the same, I assumed it would map the columns automatically.
But it wouldn't let me run the package without a mapping, so I added a column on the source, selected a table, and mapped it.
When I ran the package, I assumed it would automatically re-map everything.
It ran for the first file, but the second file failed with mapping errors.
Can someone let me know whether this is achievable with some kind of dynamic mapping, or in any other way?
Any help would be much appreciated.
Thanks,
Ned
I am trying to merge a number of files: about 40,000 Excel files, all in exactly the same format (columns etc.).
I have tried running a merge command through CMD, which merged them together up to a point, but I am unable to open the resulting CSV file due to its size.
What I am trying to find out is the best process for merging such a large number of files, and then the process for loading them into SQL Server.
Are there any tools, or something that may need to be customised and built?
I don't know of a tool for that, but my first idea is this, assuming you are experienced with Transact-SQL:
Open a command shell, change to the folder where your Excel files are stored, and enter the following command: dir *.xlsx /b > source.txt
This will create a text file named "source.txt", which contains the names (and only the names) of all your Excel files.
Import this file into a SQL Server table, e.g. called "sourcefiles" (BULK INSERT or the flat file import wizard both work).
Create a new stored procedure containing a cursor. The cursor should read your "sourcefiles" table in a loop, row by row, and store the name of the current Excel file in a variable, e.g. called @FileName.
In this loop, run a statement like the one below for each Excel file. Two caveats: OPENROWSET does not accept a variable for the file path, so the statement has to be built as dynamic SQL, and SELECT ... INTO only works for the first file (it creates the table); switch to INSERT INTO ... SELECT for the remaining ones.
DECLARE @sql nvarchar(max) =
    N'SELECT * INTO dbo.YourDatabaseTable
      FROM OPENROWSET(''Microsoft.ACE.OLEDB.12.0'',
          ''Excel 12.0 Xml;HDR=YES;Database=' + @FileName + N''',
          ''SELECT * FROM [YourWorkSheet$]'')';
EXEC sp_executesql @sql;
Let the cursor read the next row.
Replace "YourDatabaseTable" and "YourWorkSheet" with your own table and worksheet names.
@FileName must contain the full path to the Excel file.
You may have to install the Microsoft.ACE.OLEDB.12.0 provider before executing the SQL command, and note that 'Ad Hoc Distributed Queries' has to be enabled on the server for OPENROWSET to run.
Hope this helps you think about your further steps.
Michael
Edit: have a look at this website for possible errors.
How do I use SSIS to iterate over the image files in a directory and, using each file name, run a query to insert the image into SQL Server?
I realise that with a Foreach File Enumerator I can loop over the files and get the file name into a variable. How do I use this variable to run a query that finds the record for that file name in my table and then imports the image from disk into my SQL Server image-type column?
Once I have the file in my database, I will delete the file from disk.
If I'm understanding the problem correctly, you would like to sweep all the files in some location into SQL Server using SSIS?
Data Flow Task
Your data flow task will be responsible for the actual import of files into the database. Your approach would be the same as outlined in "Import varbinary data" and the pretty-picture version at "insert XML file in SQL via SSIS".
Your source will be a Script Transformation Component operating as a source component. Its job will be to add all the file names into the data flow. Change the filter in the second link to *.png (or whatever your filter is) and it should work.
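As a sketch, the CreateNewOutputRows method of that source component might look like this (C#; the folder path, the *.png filter, and the output column name FileName are assumptions, so configure them to match your package):

// Script Component configured as a source: emit one row per file on disk.
// Assumes Output 0 has a single column named FileName (DT_WSTR).
using System.IO;

public override void CreateNewOutputRows()
{
    foreach (string path in Directory.EnumerateFiles(@"\\server\images", "*.png"))
    {
        Output0Buffer.AddRow();
        Output0Buffer.FileName = path;
    }
}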
Use the Import Column component on the generated file names. This adds the file pointer into the data flow so the contents can be imported into the database. You will need to ensure the column's data type is DT_IMAGE; even if you're using varbinary(max)/varchar(max)/nvarchar(max) in the table, it's all DT_IMAGE within the context of the pipeline's metadata.
Route all of that data into your target table and you will have imported your file data.
File cleanup
At this point, you have imported all this data and now you want to remove the files from disk. Assuming you stored the file name in the database along with the image bits, I'd use an Execute SQL Task to retrieve the list of file names. Change the output type from None to Full Result Set and store that into a variable of type Object.
Connect a Foreach Loop Container (using the Foreach ADO Enumerator) to the output of the Execute SQL Task; here you'll want to "shred" the results. Google that term and you'll find a variety of blog posts and previous SO questions on how to do it. The end result is that on each iteration a file name is pulled from the recordset object and assigned to a local variable.
Inside the Foreach Loop, use a File System Task with the Delete File operation on the file referenced by that variable.